Resume parsing with Machine learning - NLP with Python OCR and Spacy

Resume parsing with Machine learning - NLP with Python OCR and Spacy

In this machine learning resume parser example we use the popular Spacy NLP python library for OCR and text classification.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your Linkedin/Github profiles.

Customer Love

Read All Reviews

James Peebles

Data Analytics Leader, IQVIA

This is one of the best of investments you can make with regards to career progression and growth in technological knowledge. I was pointed in this direction by a mentor in the IT world who I highly... Read More

Camille St. Omer

Artificial Intelligence Researcher, Quora 'Most Viewed Writer in 'Data Mining'

I came to the platform with no experience and now I am knowledgeable in Machine Learning with Python. No easy thing I must say, the sessions are challenging and go to the depths. I looked at graduate... Read More

What will you learn

Understanding the Problem Statement
Natural Language Processing
Generic Machine learning framework
Understanding OCR
Natural Entity Recognition
Converting JSON to Spacy Format
Spacy NER
Understanding Annotations & Entities in Spacy
Spacy Custom Model Building
Understanding Parameters behind Spacy Model
Extracting text from PDF
Incremental Spacy Model Building
Understanding TIKA OCR process
Interpreting the results
Extracting entities out of new resumes

Project Description

Recruiters and HR teams in companies have a tough time scanning thousands of qualified resumes. Either they need many people to do this or they miss out on qualified candidates. This is a waste of time, money and productivity for the company.

To solve this, our resume parser application can take in millions of resumes, parse the needed fields and categorise them. This resume parser uses the popular python library - Spacy for OCR and text classifications. First we train our model with these fields, then the application can pick out the values of these fields from new resumes being input.

The dataset of resumes has the following fields:

  • Location
  • Designation
  • Name
  • Years of Experience
  • College
  • Degree
  • Graduation Year
  • Companies worked at
  • Email address

Similar Projects

Build a machine learning model that will predict which jobs users will apply to given their past applications, demographics and work history.

Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.

This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

Curriculum For This Mini Project

Python Package installer - pip requirements
Jupyter vs Microsoft Visual Studio
Introduction to the Resume Parsing Problem Statement
Data Sourcing Format
Understanding Natural Entity Recognition
Spacy Ner
Spacy Data Input
Data Format
Metrics Solution Approach
Machine Leaning Framework To Organise Your Project
Converting Data To Spacy Format
Model Check Data
Spacy Model Part 1
Spacy Model Part 2
Running Engine File
Summary Predictions