What is Named Entity Recognition in NLP

This recipe explains what Named Entity Recognition (NER) is in NLP.

Recipe Objective

What is Named Entity Recognition?

Named Entity Recognition (NER) is an information extraction technique that uses natural language processing to automatically identify named entities in a chunk of text and classify them into predetermined categories, for example people, organizations, email addresses, locations, monetary values, and so on. Let's understand this with an example:


Jon is from Canada; he works at Apple.

In the sentence above, the words Jon, Canada, and Apple fall into the following categories:

Person - Jon

Location - Canada

Organization - Apple
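For reference, NLTK's ne_chunk function (used later in this recipe) classifies entities into a small fixed set of labels. A quick lookup table of the most common ones (the descriptions are paraphrased, so treat them as a rough guide):

```python
# Common entity labels produced by NLTK's ne_chunk (descriptions are a rough guide)
NE_LABELS = {
    "PERSON": "people, e.g. Jon",
    "ORGANIZATION": "companies, agencies, institutions, e.g. Apple",
    "GPE": "geo-political entities: countries, cities, states, e.g. Canada",
    "LOCATION": "non-GPE locations, e.g. mountain ranges, bodies of water",
    "FACILITY": "buildings, airports, highways, bridges",
}

for label, description in NE_LABELS.items():
    print(f"{label}: {description}")
```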

Some of the practical applications of NER are:

Scanning news articles for the people, organizations and locations reported.

Quickly retrieving geographical locations talked about in Twitter posts.

In human resources, it can speed up the hiring process by summarizing applicants' CVs, and improve internal workflows by categorizing employee complaints and questions.

In customer support, it can improve response times by categorizing user requests, complaints, and questions and filtering them by priority keywords. And many more.

Step 1 - Import the necessary libraries

import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

# Download the resources this recipe needs (skip if already present)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

Step 2 - Take a sample text

My_text = '''Thomas Alva Edison (February 11, 1847 – October 18, 1931) was an American inventor and businessman who has been described as America's greatest inventor.[1][2][3] He developed many devices in fields such as electric power generation, mass communication, sound recording, and motion pictures.[4] These inventions, which include the phonograph, the motion picture camera, and early versions of the electric light bulb, have had a widespread impact on the modern industrialized world.[5] He was one of the first inventors to apply the principles of organized science and teamwork to the process of invention, working with many researchers and employees. He established the first industrial research laboratory.[6]'''

We have taken a sample paragraph about Thomas Alva Edison from Wikipedia for our reference.

Step 3 - Tokenize the text into words using word_tokenize

tokenized_text = nltk.word_tokenize(My_text)
print(tokenized_text)

['Thomas', 'Alva', 'Edison', '(', 'February', '11', ',', '1847', '–', 'October', '18', ',', '1931', ')', 'was', 'an', 'American', 'inventor', 'and', 'businessman', 'who', 'has', 'been', 'described', 'as', 'America', "'s", 'greatest', 'inventor', '.', '[', '1', ']', '[', '2', ']', '[', '3', ']', 'He', 'developed', 'many', 'devices', 'in', 'fields', 'such', 'as', 'electric', 'power', 'generation', ',', 'mass', 'communication', ',', 'sound', 'recording', ',', 'and', 'motion', 'pictures', '.', '[', '4', ']', 'These', 'inventions', ',', 'which', 'include', 'the', 'phonograph', ',', 'the', 'motion', 'picture', 'camera', ',', 'and', 'early', 'versions', 'of', 'the', 'electric', 'light', 'bulb', ',', 'have', 'had', 'a', 'widespread', 'impact', 'on', 'the', 'modern', 'industrialized', 'world', '.', '[', '5', ']', 'He', 'was', 'one', 'of', 'the', 'first', 'inventors', 'to', 'apply', 'the', 'principles', 'of', 'organized', 'science', 'and', 'teamwork', 'to', 'the', 'process', 'of', 'invention', ',', 'working', 'with', 'many', 'researchers', 'and', 'employees', '.', 'He', 'established', 'the', 'first', 'industrial', 'research', 'laboratory', '.', '[', '6', ']']

From the output above we can see that the text has been tokenized into individual words, with punctuation marks kept as separate tokens.
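As a quick, self-contained illustration of this step (assuming only that NLTK is installed): the rule-based TreebankWordTokenizer, which word_tokenize builds on, can be called directly without any downloaded models, unlike word_tokenize itself, which also requires the punkt data.

```python
from nltk.tokenize import TreebankWordTokenizer

# Rule-based Treebank-style tokenization: punctuation becomes its own token
tokens = TreebankWordTokenizer().tokenize("Jon is from Canada, he works at Apple.")
print(tokens)
# ['Jon', 'is', 'from', 'Canada', ',', 'he', 'works', 'at', 'Apple', '.']
```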

Step 4 - Apply part-of-speech (POS) tagging to the tokenized text

tagged_text = nltk.pos_tag(tokenized_text)
print(tagged_text)

[('Thomas', 'NNP'), ('Alva', 'NNP'), ('Edison', 'NNP'), ('(', '('), ('February', 'NNP'), ('11', 'CD'), (',', ','), ('1847', 'CD'), ('–', 'NNP'), ('October', 'NNP'), ('18', 'CD'), (',', ','), ('1931', 'CD'), (')', ')'), ('was', 'VBD'), ('an', 'DT'), ('American', 'JJ'), ('inventor', 'NN'), ('and', 'CC'), ('businessman', 'NN'), ('who', 'WP'), ('has', 'VBZ'), ('been', 'VBN'), ('described', 'VBN'), ('as', 'IN'), ('America', 'NNP'), ("'s", 'POS'), ('greatest', 'JJS'), ('inventor', 'NN'), ('.', '.'), ('[', 'CC'), ('1', 'CD'), (']', 'JJ'), ('[', '$'), ('2', 'CD'), (']', 'NNP'), ('[', 'VBD'), ('3', 'CD'), (']', 'NN'), ('He', 'PRP'), ('developed', 'VBD'), ('many', 'JJ'), ('devices', 'NNS'), ('in', 'IN'), ('fields', 'NNS'), ('such', 'JJ'), ('as', 'IN'), ('electric', 'JJ'), ('power', 'NN'), ('generation', 'NN'), (',', ','), ('mass', 'NN'), ('communication', 'NN'), (',', ','), ('sound', 'NN'), ('recording', 'NN'), (',', ','), ('and', 'CC'), ('motion', 'NN'), ('pictures', 'NNS'), ('.', '.'), ('[', '$'), ('4', 'CD'), (']', 'NNP'), ('These', 'DT'), ('inventions', 'NNS'), (',', ','), ('which', 'WDT'), ('include', 'VBP'), ('the', 'DT'), ('phonograph', 'NN'), (',', ','), ('the', 'DT'), ('motion', 'NN'), ('picture', 'NN'), ('camera', 'NN'), (',', ','), ('and', 'CC'), ('early', 'JJ'), ('versions', 'NNS'), ('of', 'IN'), ('the', 'DT'), ('electric', 'JJ'), ('light', 'NN'), ('bulb', 'NN'), (',', ','), ('have', 'VBP'), ('had', 'VBN'), ('a', 'DT'), ('widespread', 'JJ'), ('impact', 'NN'), ('on', 'IN'), ('the', 'DT'), ('modern', 'JJ'), ('industrialized', 'VBN'), ('world', 'NN'), ('.', '.'), ('[', 'CC'), ('5', 'CD'), (']', 'NN'), ('He', 'PRP'), ('was', 'VBD'), ('one', 'CD'), ('of', 'IN'), ('the', 'DT'), ('first', 'JJ'), ('inventors', 'NNS'), ('to', 'TO'), ('apply', 'VB'), ('the', 'DT'), ('principles', 'NNS'), ('of', 'IN'), ('organized', 'VBN'), ('science', 'NN'), ('and', 'CC'), ('teamwork', 'NN'), ('to', 'TO'), ('the', 'DT'), ('process', 'NN'), ('of', 'IN'), ('invention', 'NN'), (',', ','), ('working', 'VBG'), ('with', 'IN'), ('many', 'JJ'), ('researchers', 'NNS'), ('and', 'CC'), ('employees', 'NNS'), ('.', '.'), ('He', 'PRP'), ('established', 'VBD'), ('the', 'DT'), ('first', 'JJ'), ('industrial', 'JJ'), ('research', 'NN'), ('laboratory', 'NN'), ('.', '.'), ('[', 'CC'), ('6', 'CD'), (']', 'NN')]
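The tags come from the Penn Treebank tagset. A quick lookup of a few tags that appear in the output above (meanings per the standard tagset, with examples from this text):

```python
# A few Penn Treebank POS tags seen in the tagged output above
PENN_TAGS = {
    "NNP": "proper noun, singular (Thomas, Edison)",
    "NN": "noun, singular or mass (inventor)",
    "NNS": "noun, plural (devices)",
    "JJ": "adjective (American)",
    "VBD": "verb, past tense (developed)",
    "DT": "determiner (the)",
    "CD": "cardinal number (1847)",
    "IN": "preposition or subordinating conjunction (in)",
    "CC": "coordinating conjunction (and)",
}

print(PENN_TAGS["NNP"])
```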

Step 5 - Pass the tagged text to the named entity chunker (ne_chunk)

print(nltk.ne_chunk(tagged_text))

(S
  (PERSON Thomas/NNP)
  (ORGANIZATION Alva/NNP Edison/NNP)
  (/(
  February/NNP
  11/CD
  ,/,
  1847/CD
  –/NNP
  October/NNP
  18/CD
  ,/,
  1931/CD
  )/)
  was/VBD
  an/DT
  (GPE American/JJ)
  inventor/NN
  and/CC
  businessman/NN
  who/WP
  has/VBZ
  been/VBN
  described/VBN
  as/IN
  (GPE America/NNP)
  's/POS
  greatest/JJS
  inventor/NN
  ./.
  [/CC
  1/CD
  ]/JJ
  [/$
  2/CD
  ]/NNP
  [/VBD
  3/CD
  ]/NN
  He/PRP
  developed/VBD
  many/JJ
  devices/NNS
  in/IN
  fields/NNS
  such/JJ
  as/IN
  electric/JJ
  power/NN
  generation/NN
  ,/,
  mass/NN
  communication/NN
  ,/,
  sound/NN
  recording/NN
  ,/,
  and/CC
  motion/NN
  pictures/NNS
  ./.
  [/$
  4/CD
  ]/NNP
  These/DT
  inventions/NNS
  ,/,
  which/WDT
  include/VBP
  the/DT
  phonograph/NN
  ,/,
  the/DT
  motion/NN
  picture/NN
  camera/NN
  ,/,
  and/CC
  early/JJ
  versions/NNS
  of/IN
  the/DT
  electric/JJ
  light/NN
  bulb/NN
  ,/,
  have/VBP
  had/VBN
  a/DT
  widespread/JJ
  impact/NN
  on/IN
  the/DT
  modern/JJ
  industrialized/VBN
  world/NN
  ./.
  [/CC
  5/CD
  ]/NN
  He/PRP
  was/VBD
  one/CD
  of/IN
  the/DT
  first/JJ
  inventors/NNS
  to/TO
  apply/VB
  the/DT
  principles/NNS
  of/IN
  organized/VBN
  science/NN
  and/CC
  teamwork/NN
  to/TO
  the/DT
  process/NN
  of/IN
  invention/NN
  ,/,
  working/VBG
  with/IN
  many/JJ
  researchers/NNS
  and/CC
  employees/NNS
  ./.
  He/PRP
  established/VBD
  the/DT
  first/JJ
  industrial/JJ
  research/NN
  laboratory/NN
  ./.
  [/CC
  6/CD
  ]/NN)

Here we have passed the POS-tagged text to the named entity chunk function (nltk.ne_chunk), which returns the text as a tree: named entities are grouped into labeled subtrees such as PERSON, ORGANIZATION, and GPE, while all other tokens remain at the top level. Note that the chunker is not perfect; in the tree above, "Alva Edison" is mislabeled as an ORGANIZATION rather than part of the PERSON chunk.
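In practice you usually want the entities as a flat list rather than a tree. A minimal sketch of walking the chunk tree (extract_entities is a hypothetical helper, not part of NLTK): labeled entity chunks are nested Tree objects, while everything else is a plain (word, tag) tuple.

```python
from nltk.tree import Tree

def extract_entities(chunked):
    """Collect (label, entity_text) pairs from an ne_chunk result tree."""
    entities = []
    for subtree in chunked:
        if isinstance(subtree, Tree):  # labeled entity chunks are nested Trees
            text = " ".join(word for word, tag in subtree.leaves())
            entities.append((subtree.label(), text))
    return entities

# e.g. extract_entities(nltk.ne_chunk(tagged_text)) on the recipe's text
# would yield pairs like ('PERSON', 'Thomas') and ('GPE', 'America')
```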

