Resume Parsing with Machine Learning - NLP with Python, OCR and spaCy

In this machine learning resume parser example, we extract text from resumes with Tika and use the popular spaCy NLP library for Python to recognise and label the fields in that text.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What will you learn

Understanding the Problem Statement
Natural Language Processing
Generic Machine learning framework
Understanding OCR
Named Entity Recognition
Converting JSON to spaCy Format
spaCy NER
Understanding Annotations & Entities in spaCy
spaCy Custom Model Building
Understanding Parameters behind the spaCy Model
Extracting text from PDF
Incremental spaCy Model Building
Understanding the Tika OCR process
Interpreting the results
Extracting entities out of new resumes
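
The "Converting JSON to spaCy Format" step can be sketched as follows. This is a minimal sketch, not the project's confirmed code: it assumes the annotations arrive as one JSON object per line, each with a `content` string and an `annotation` list whose items carry a `label` list and `points` with inclusive `start`/`end` character offsets. That layout and the function name `json_to_spacy` are assumptions made for illustration.

```python
import json


def json_to_spacy(lines):
    """Convert JSON-lines annotations to (text, {"entities": [...]}) tuples."""
    training_data = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        text = record["content"]
        entities = []
        for ann in record.get("annotation") or []:
            labels = ann["label"]
            if isinstance(labels, str):
                labels = [labels]
            for point in ann["points"]:
                # Assumption: the input end offset is inclusive, while
                # spaCy-style spans use an exclusive end, hence the +1.
                start, end = point["start"], point["end"] + 1
                for label in labels:
                    entities.append((start, end, label))
        training_data.append((text, {"entities": sorted(entities)}))
    return training_data


# Hypothetical sample record, made up for this sketch.
sample = json.dumps({
    "content": "Jane Doe, Data Engineer",
    "annotation": [
        {"label": ["Name"], "points": [{"start": 0, "end": 7}]},
        {"label": ["Designation"], "points": [{"start": 10, "end": 22}]},
    ],
})
converted = json_to_spacy([sample])
```

Each resulting tuple pairs the raw resume text with character spans, which is the shape spaCy's NER training examples are built from.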

Project Description

Recruiters and HR teams in companies have a tough time scanning thousands of qualified resumes. Either they need many people to do this or they miss out on qualified candidates. This is a waste of time, money and productivity for the company.

To solve this, our resume parser application takes in resumes in bulk, extracts the needed fields and categorises them. The parser uses the popular Python library spaCy for named entity recognition, applied to text that has first been extracted from the resume files (the project covers PDF text extraction and the Tika OCR process). First we train the model on resumes annotated with the fields listed below; the application can then pick out the values of these fields from new resumes.
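
The train-then-extract flow described above might look roughly like this. It is a sketch under stated assumptions, not the project's actual engine: it assumes spaCy v3 and the `tika` Python package (which needs a Java runtime), and the function names `extract_text`, `train_ner`, `collect_fields` and `parse_resume` are hypothetical. The third-party imports are placed inside the functions so the pure-Python helper can be used on its own.

```python
import random
from collections import defaultdict


def extract_text(path):
    """Extract plain text from a resume file (e.g. a PDF) with Tika."""
    from tika import parser  # assumes the `tika` package and Java are installed
    parsed = parser.from_file(path)
    return (parsed.get("content") or "").strip()


def train_ner(train_data, n_iter=10, seed=0):
    """Train a blank English NER model on (text, {"entities": [...]}) tuples."""
    import spacy  # assumes spaCy v3
    from spacy.training import Example

    random.seed(seed)
    nlp = spacy.blank("en")
    ner = nlp.add_pipe("ner")
    for _, annotations in train_data:
        for _, _, label in annotations["entities"]:
            ner.add_label(label)

    examples = [Example.from_dict(nlp.make_doc(text), ann)
                for text, ann in train_data]
    optimizer = nlp.initialize(lambda: examples)
    for _ in range(n_iter):
        random.shuffle(examples)
        losses = {}
        # One batch per pass keeps the sketch short; larger datasets
        # would normally be split into minibatches.
        nlp.update(examples, sgd=optimizer, drop=0.2, losses=losses)
    return nlp


def collect_fields(labelled_spans):
    """Group (label, value) pairs into {label: [unique values]}."""
    fields = defaultdict(list)
    for label, value in labelled_spans:
        value = value.strip()
        if value and value not in fields[label]:
            fields[label].append(value)
    return dict(fields)


def parse_resume(nlp, text):
    """Run the trained model on new resume text and collect its fields."""
    doc = nlp(text)
    return collect_fields((ent.label_, ent.text) for ent in doc.ents)
```

A typical call sequence would be `nlp = train_ner(train_data)` followed by `parse_resume(nlp, extract_text("resume.pdf"))`, where `resume.pdf` is a placeholder path.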

The dataset of resumes has the following fields:

  • Location
  • Designation
  • Name
  • Years of Experience
  • College
  • Degree
  • Graduation Year
  • Companies worked at
  • Email address
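
To make the fields above concrete, here is a sketch of one annotated training record. The resume text and values are invented for illustration, and the helper `annotate` is hypothetical; it locates each value in the text so the character offsets do not have to be counted by hand, and it rejects overlapping spans, which spaCy's NER does not accept.

```python
def annotate(text, field_values):
    """Build (text, {"entities": [...]}) from the first occurrence of each value."""
    entities = []
    for label, value in field_values.items():
        start = text.find(value)
        if start == -1:
            raise ValueError(f"{value!r} not found in text")
        entities.append((start, start + len(value), label))
    entities.sort()
    # Entity spans used for NER training must not overlap.
    for (_, end1, label1), (start2, _, label2) in zip(entities, entities[1:]):
        if start2 < end1:
            raise ValueError(f"{label1!r} overlaps {label2!r}")
    return text, {"entities": entities}


# Made-up resume text and field values, for illustration only.
resume_text = (
    "Jane Doe\n"
    "Data Engineer, 6 years of experience\n"
    "Acme Corp, Pune\n"
    "B.Tech, Example College, 2015\n"
    "jane.doe@example.com"
)
fields = {
    "Name": "Jane Doe",
    "Designation": "Data Engineer",
    "Years of Experience": "6 years",
    "Companies worked at": "Acme Corp",
    "Location": "Pune",
    "Degree": "B.Tech",
    "College": "Example College",
    "Graduation Year": "2015",
    "Email address": "jane.doe@example.com",
}
record = annotate(resume_text, fields)
```

Using the first occurrence of a value is a simplification; real resumes often repeat a company or college name, in which case each mention would need its own span.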

Curriculum For This Mini Project

Python Package installer - pip requirements
Jupyter vs Microsoft Visual Studio
Introduction to the Resume Parsing Problem Statement
Data Sourcing Format
Understanding Named Entity Recognition
spaCy NER
spaCy Data Input
Data Format
Metrics Solution Approach
Machine Learning Framework To Organise Your Project
Converting Data To spaCy Format
Model Check Data
spaCy Model Part 1
spaCy Model Part 2
Running Engine File
Summary Predictions
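
The curriculum does not say which metrics the project uses. One common way to check an entity extraction model, shown here purely as an assumption, is entity-level precision, recall and F1 with exact matching of `(start, end, label)` spans. The function name `entity_prf` and the sample spans are hypothetical.

```python
def entity_prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up spans: the model finds the name but mislabels the designation
# and misses the location.
gold = [(0, 8, "Name"), (10, 23, "Designation"), (30, 34, "Location")]
predicted = [(0, 8, "Name"), (10, 23, "Degree")]
precision, recall, f1 = entity_prf(gold, predicted)
```

Exact matching is strict: a span that is off by one character counts as both a false positive and a false negative, so partial-overlap scoring may be preferable for fields such as company names.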