Resume Parsing with Machine Learning - NLP with Python, OCR and spaCy

In this machine learning resume parser example, we extract text from resumes with OCR and use the popular spaCy NLP Python library to recognise the entities in that text.
Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love

SUBHABRATA BISWAS

Lead Consultant, ITC Infotech

The project orientation is very much unique and it helps to understand the real time scenarios most of the industries are dealing with. And there is no limit, one can go through as many projects...

Mohamed Yusef Ahmed

Software Developer at Taske

Recently I became interested in Hadoop as I think its a great platform for storing and analyzing large structured and unstructured data sets. The experts did a great job not only explaining the...

What will you learn

Understanding the Problem Statement
Natural Language Processing
Generic Machine learning framework
Understanding OCR
Named Entity Recognition
Converting JSON to spaCy Format
spaCy NER
Understanding Annotations & Entities in spaCy
spaCy Custom Model Building
Understanding the Parameters behind the spaCy Model
Extracting text from PDF
Incremental spaCy Model Building
Understanding the Tika OCR process
Interpreting the results
Extracting entities out of new resumes
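
The "Converting JSON to spaCy Format" step might look roughly like the sketch below. The page does not show the annotation file, so the input layout here is an assumption: one JSON object per line, with a `content` string and an `annotation` list whose items hold a `label` list and `points` with inclusive `start`/`end` character offsets. The function name `json_lines_to_spacy` is also illustrative. The output is the `(text, {"entities": [(start, end, label)]})` tuple form commonly used for spaCy NER training data, where end offsets are exclusive.

```python
import json


def json_lines_to_spacy(lines):
    """Convert annotated JSON lines into (text, {"entities": [...]}) tuples.

    Assumed (hypothetical) input layout per line:
      {"content": "...", "annotation": [
          {"label": ["Name"], "points": [{"start": 0, "end": 7}]}]}
    where "end" is an inclusive character offset.
    """
    training_data = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        text = record["content"]
        entities = []
        for item in record.get("annotation") or []:
            labels = item["label"]
            if not isinstance(labels, list):
                labels = [labels]
            for point in item["points"]:
                for label in labels:
                    # Convert the inclusive end offset to an exclusive one.
                    entities.append((point["start"], point["end"] + 1, label))
        training_data.append((text, {"entities": sorted(entities)}))
    return training_data


# Tiny made-up example, not taken from the project's dataset.
sample = json.dumps({
    "content": "Jane Doe, Data Analyst",
    "annotation": [
        {"label": ["Name"], "points": [{"start": 0, "end": 7}]},
        {"label": ["Designation"], "points": [{"start": 10, "end": 21}]},
    ],
})
converted = json_lines_to_spacy([sample])
```

Slicing the text with each converted span (for example `text[0:8]`) is a quick way to check that the offsets line up with the intended entity before training.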

Project Description

Recruiters and HR teams have a tough time screening thousands of resumes. Either they assign many people to the task or they miss out on qualified candidates, which costs the company time, money and productivity.

To solve this, our resume parser application can take in millions of resumes, parse the needed fields and categorise them. The parser uses the popular Python library spaCy to recognise entities in text extracted from the resumes with OCR. First we train the model on resumes annotated with the fields listed below; the application can then pick out the values of these fields from new resumes.

The dataset of resumes has the following fields:

  • Location
  • Designation
  • Name
  • Years of Experience
  • College
  • Degree
  • Graduation Year
  • Companies worked at
  • Email address
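
Once a trained model has labelled spans in a new resume, the predictions can be collected into one record per resume, keyed by the fields above. The following is a minimal sketch rather than the project's code: `FIELDS`, `build_record`, and the `(label, text)` pair input are illustrative assumptions about how the predictions might be represented.

```python
FIELDS = [
    "Location", "Designation", "Name", "Years of Experience", "College",
    "Degree", "Graduation Year", "Companies worked at", "Email address",
]


def build_record(predictions):
    """Group (label, text) predictions into a dict with one list per field.

    Labels outside FIELDS are ignored; repeated values are kept once,
    in order of first appearance.
    """
    record = {field: [] for field in FIELDS}
    for label, text in predictions:
        value = " ".join(text.split())  # collapse whitespace and line breaks
        if label in record and value and value not in record[label]:
            record[label].append(value)
    return record


# Made-up predictions for illustration only.
predictions = [
    ("Name", "Jane Doe"),
    ("Companies worked at", "Acme  Corp"),
    ("Companies worked at", "Acme Corp"),
    ("Skills", "Python"),
]
record = build_record(predictions)
```

Keeping a list per field, instead of a single value, leaves room for resumes that mention several companies, degrees or locations.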

Similar Projects

This data science in Python project predicts whether a loan should be given to an applicant. We predict whether the customer is eligible for a loan based on several factors such as credit score and past history.

Use the RACE dataset to extract a dominant topic from each document and perform LDA topic modeling in Python.

In this human activity recognition project, we use multiclass classification machine learning techniques to analyse a fitness dataset from a smartphone tracker.

Curriculum For This Mini Project

Python Package installer - pip requirements
Jupyter vs Microsoft Visual Studio
Introduction to the Resume Parsing Problem Statement
Data Sourcing Format
Understanding Named Entity Recognition
spaCy NER
spaCy Data Input
Data Format
Metrics Solution Approach
Machine Learning Framework To Organise Your Project
Converting Data To spaCy Format
Model Check Data
spaCy Model Part 1
spaCy Model Part 2
Running Engine File
Summary Predictions