Resume parsing with Machine learning - NLP with Python OCR and Spacy

In this machine learning resume parser example, we extract text from resumes with OCR and use the popular spaCy NLP Python library for entity recognition and text classification.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What will you learn

Understanding the Problem Statement
Natural Language Processing
Generic machine learning framework
Understanding OCR
Named Entity Recognition
Converting JSON to spaCy Format
spaCy NER
Understanding Annotations & Entities in spaCy
spaCy Custom Model Building
Understanding Parameters behind spaCy Model
Extracting text from PDF
Incremental spaCy Model Building
Understanding Tika OCR process
Interpreting the results
Extracting entities out of new resumes
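
As an illustration of the "Converting JSON to spaCy Format" step, here is a minimal sketch. The project's actual JSON schema is not shown on this page, so the record layout below (a "content" string plus an "annotation" list whose "points" carry inclusive start/end character offsets) is an assumption, and the function names are hypothetical. The output is the common `(text, {"entities": [(start, end, label)]})` shape with exclusive end offsets.

```python
import json


def convert_record(record):
    """Turn one annotated record into a (text, {"entities": [...]}) pair.

    Assumed (hypothetical) input layout:
      {"content": "...", "annotation": [
          {"label": ["Name"], "points": [{"start": 0, "end": 7}]}]}
    where "end" is an inclusive character offset.
    """
    text = record["content"]
    entities = []
    for ann in record.get("annotation") or []:
        labels = ann["label"]
        if isinstance(labels, str):
            labels = [labels]
        for point in ann["points"]:
            start = point["start"]
            end = point["end"] + 1  # inclusive end -> exclusive end
            for label in labels:
                entities.append((start, end, label))
    return text, {"entities": sorted(entities)}


def convert_json_lines(lines):
    """Convert an iterable of JSON strings (one record per line)."""
    return [convert_record(json.loads(line)) for line in lines if line.strip()]


sample = json.dumps({
    "content": "Jane Doe, Data Scientist",
    "annotation": [
        {"label": ["Designation"], "points": [{"start": 10, "end": 23}]},
        {"label": ["Name"], "points": [{"start": 0, "end": 7}]},
    ],
})
train_data = convert_json_lines([sample])
text, annotations = train_data[0]
for start, end, label in annotations["entities"]:
    print(label, "->", text[start:end])
```

If the real dataset already stores exclusive end offsets, the `+ 1` adjustment should be dropped; checking a few sliced spans against the text, as in the loop above, is a quick way to find out.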

Project Description

Recruiters and HR teams in companies have a tough time scanning thousands of qualified resumes. Either they need many people to do this or they miss out on qualified candidates. This is a waste of time, money and productivity for the company.

To solve this, our resume parser application can take in millions of resumes, parse the needed fields and categorise them. The parser uses OCR to extract text from the resumes and the popular Python library spaCy for entity recognition and text classification. First we train the model on resumes annotated with the fields listed below; the application can then pick out the values of those fields from new resumes.

The dataset of resumes has the following fields:

  • Location
  • Designation
  • Name
  • Years of Experience
  • College
  • Degree
  • Graduation Year
  • Companies worked at
  • Email address
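
The train-then-extract flow described above can be sketched roughly as follows. This is not the course's code: it assumes spaCy v3 is installed, uses a single made-up sentence as training data, and the helper names (`annotate`, `train_ner`) are hypothetical. Labels are taken from the field list above. With so little data the predictions are not meaningful; the point is only the shape of the training loop.

```python
import random

import spacy
from spacy.training import Example


def annotate(text, phrases):
    """Build (text, {"entities": [...]}) by locating each phrase in the text."""
    entities = []
    for phrase, label in phrases:
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"{phrase!r} not found in text")
        entities.append((start, start + len(phrase), label))
    return text, {"entities": entities}


# Toy, invented example; a real run would use the annotated resume dataset.
TRAIN_DATA = [
    annotate(
        "Jane Doe worked as a Data Scientist at Acme Corp in Pune.",
        [
            ("Jane Doe", "Name"),
            ("Data Scientist", "Designation"),
            ("Acme Corp", "Companies worked at"),
            ("Pune", "Location"),
        ],
    )
]


def train_ner(train_data, n_iter=20, seed=0):
    """Train a blank English pipeline with only an NER component."""
    random.seed(seed)
    nlp = spacy.blank("en")
    ner = nlp.add_pipe("ner")
    for _, ann in train_data:
        for _, _, label in ann["entities"]:
            ner.add_label(label)
    examples = [
        Example.from_dict(nlp.make_doc(text), ann) for text, ann in train_data
    ]
    optimizer = nlp.initialize(lambda: examples)
    for _ in range(n_iter):
        random.shuffle(examples)
        nlp.update(examples, sgd=optimizer, drop=0.2)
    return nlp


nlp = train_ner(TRAIN_DATA)
doc = nlp("John Smith worked as a Data Engineer at Initech in Delhi.")
for ent in doc.ents:
    print(ent.label_, "->", ent.text)
```

Starting from `spacy.blank("en")` keeps the sketch independent of any downloaded pretrained model; whether the project itself starts from a blank or a pretrained pipeline is not stated here.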

Curriculum For This Mini Project

Python Package installer - pip requirements
Jupyter vs Microsoft Visual Studio
Introduction to the Resume Parsing Problem Statement
Data Sourcing Format
Understanding Named Entity Recognition
spaCy NER
spaCy Data Input
Data Format
Metrics Solution Approach
Machine Learning Framework To Organise Your Project
Converting Data To spaCy Format
Model Check Data
spaCy Model Part 1
spaCy Model Part 2
Running Engine File
Summary Predictions
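
For the metrics and prediction-summary steps listed above, one common way to score an entity extractor is exact-match precision, recall and F1 over `(start, end, label)` spans. The page does not say which metrics the project uses, so the following is only a sketch under that assumption, with a hypothetical function name and invented spans.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented spans for one resume: the second prediction has the wrong label.
gold = [(0, 8, "Name"), (10, 24, "Designation"), (30, 34, "Location")]
predicted = [(0, 8, "Name"), (10, 24, "Degree")]
precision, recall, f1 = span_prf(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching is strict: a span with the right text but the wrong label, or with an offset that is off by one character, counts as both a false positive and a false negative.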