How to perform chunking on a paragraph in NLP

This recipe helps you perform chunking on a paragraph in NLP using NLTK.

Recipe Objective

How do you perform chunking on a paragraph? Chunking, also known as shallow parsing, follows part-of-speech (POS) tagging and adds more structure to a sentence by grouping the tagged words. The resulting words or groups of words are called chunks. You can also define a pattern of words that must not be part of a chunk; such words are known as chinks. The primary use of chunking is to group words into noun phrases.

A chunk rule combines POS tags with regular expressions. For example, to group nouns, past-tense verbs, adjectives, and an optional coordinating conjunction from a sentence, we can use the following rule:

chunk:{<NN.?>*<VBD.?>*<JJ.?>*<CC>?}

There are no predefined rules, so the tag patterns can be combined according to needs and requirements.
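To make the idea of chinks concrete, here is a minimal sketch that runs NLTK's RegexpParser on a hand-tagged sentence. The sentence, its tags, and the NP label are illustrative assumptions and not part of this recipe. The first rule chunks every tag; the second rule, written with reversed braces, chinks past-tense verbs and prepositions back out of the chunk.

```python
from nltk import RegexpParser

# Hand-tagged example sentence (illustrative; no tagger model is needed).
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# Rule 1 chunks every tag; rule 2 uses }...{ to chink VBD and IN back out.
grammar = r"""
NP:
    {<.*>+}
    }<VBD|IN>+{
"""

tree = RegexpParser(grammar).parse(sentence)

# Collect the words of each NP chunk that survives the chinking step.
noun_phrases = [[word for word, tag in subtree.leaves()]
                for subtree in tree.subtrees()
                if subtree.label() == "NP"]
print(noun_phrases)
```

The words "barked" and "at" end up outside any chunk, which is exactly what a chink rule is for.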

Step 1 - Import the necessary libraries

from nltk import pos_tag
from nltk import RegexpParser

Step 2 - Take a sample text and split it

Sample_text = '''Albert Einstein was a German-born theoretical physicist who developed the theory of relativity, one of the two pillars of modern physics. His work is also known for its influence on the philosophy of science.'''
Sample_text = Sample_text.split()
print(Sample_text)

['Albert', 'Einstein', 'was', 'a', 'German-born', 'theoretical', 'physicist', 'who', 'developed', 'the', 'theory', 'of', 'relativity,', 'one', 'of', 'the', 'two', 'pillars', 'of', 'modern', 'physics.', 'His', 'work', 'is', 'also', 'known', 'for', 'its', 'influence', 'on', 'the', 'philosophy', 'of', 'science.']
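Note that split() leaves punctuation attached to words ('relativity,' and 'physics.'), which can affect the tags assigned in the next step. As an optional alternative, the sketch below separates punctuation with a regular expression from Python's standard library. The pattern and the simple_tokenize name are assumptions introduced for illustration; they are not part of the original recipe.

```python
import re

def simple_tokenize(text):
    """Split text into words (hyphenated words stay whole) and punctuation marks."""
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", text)

sample = ("a German-born physicist who developed the theory of relativity, "
          "one of the two pillars of modern physics.")
tokens = simple_tokenize(sample)
print(tokens)
```

With this approach the comma and the full stop become tokens of their own, so the tagger sees "relativity" and "physics" as plain words.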

Step 3 - Apply POS tagging

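# pos_tag needs NLTK's tagger model; if it is missing, NLTK raises a LookupError that names the resource to fetch with nltk.download()
tagging = pos_tag(Sample_text)
print(tagging)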

[('Albert', 'NNP'), ('Einstein', 'NNP'), ('was', 'VBD'), ('a', 'DT'), ('German-born', 'JJ'), ('theoretical', 'JJ'), ('physicist', 'NN'), ('who', 'WP'), ('developed', 'VBD'), ('the', 'DT'), ('theory', 'NN'), ('of', 'IN'), ('relativity,', 'JJ'), ('one', 'CD'), ('of', 'IN'), ('the', 'DT'), ('two', 'CD'), ('pillars', 'NNS'), ('of', 'IN'), ('modern', 'JJ'), ('physics.', 'FW'), ('His', 'PRP$'), ('work', 'NN'), ('is', 'VBZ'), ('also', 'RB'), ('known', 'VBN'), ('for', 'IN'), ('its', 'PRP$'), ('influence', 'NN'), ('on', 'IN'), ('the', 'DT'), ('philosophy', 'NN'), ('of', 'IN'), ('science.', 'NN')]
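The tags in this output come from the Penn Treebank tag set. As a reading aid, the sketch below maps the tags that appear here to short descriptions. The TAG_MEANINGS dictionary and the describe_tags helper are hypothetical names introduced for illustration. Tags such as JJ for 'relativity,' and FW for 'physics.' look like a side effect of the punctuation that split() left attached to those words.

```python
# Short descriptions of the Penn Treebank tags seen in the output above.
TAG_MEANINGS = {
    "NNP": "proper noun, singular",
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "VBD": "verb, past tense",
    "VBZ": "verb, 3rd person singular present",
    "VBN": "verb, past participle",
    "JJ": "adjective",
    "DT": "determiner",
    "IN": "preposition or subordinating conjunction",
    "CD": "cardinal number",
    "WP": "wh-pronoun",
    "PRP$": "possessive pronoun",
    "RB": "adverb",
    "FW": "foreign word",
}

def describe_tags(tagged_words):
    """Return (word, tag, meaning) triples; unlisted tags get a placeholder."""
    return [(word, tag, TAG_MEANINGS.get(tag, "unknown tag"))
            for word, tag in tagged_words]

sample = [("Albert", "NNP"), ("was", "VBD"), ("physics.", "FW")]
for word, tag, meaning in describe_tags(sample):
    print(f"{word:10} {tag:5} {meaning}")
```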

Step 4 - Define the chunk patterns

chunk_patterns = """mychunk:{<NN.?>*<VBD.?>*<JJ.?>*<CC>?}"""
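Inside the angle brackets, each tag pattern behaves much like a regular expression over the tag name. The sketch below uses Python's re.fullmatch as a rough stand-in for how a single tag is matched. This is an approximation for illustration, since NLTK compiles the pattern somewhat differently internally, and the tag_matches helper is a hypothetical name.

```python
import re

def tag_matches(tag_pattern, tag):
    """Approximate check of whether one POS tag fits one tag pattern."""
    return re.fullmatch(tag_pattern, tag) is not None

patterns = ("NN.?", "VBD.?", "JJ.?")
for tag in ["NN", "NNS", "NNP", "NNPS", "VBD", "VBN", "JJ", "JJR"]:
    accepted = [p for p in patterns if tag_matches(p, tag)]
    print(tag, "->", accepted)
```

Because .? allows at most one extra character, NN.? accepts NN, NNS, and NNP but not NNPS, and VBD.? does not accept VBN.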

Step 5 - Parse the chunk patterns using RegexpParser

parsing = RegexpParser(chunk_patterns)
print(parsing)

chunk.RegexpParser with 1 stages:
RegexpChunkParser with 1 rules:
       <ChunkRule: '<NN.?>*<VBD.?>*<JJ.?>*<CC>?'>

Step 6 - Apply the parser to the tagged words and print the results

Result = parsing.parse(tagging)
print("The Final Result Should look like this:", Result)

The Final Result Should look like this: (S
  (mychunk Albert/NNP Einstein/NNP was/VBD)
  a/DT
  (mychunk German-born/JJ theoretical/JJ)
  (mychunk physicist/NN)
  who/WP
  (mychunk developed/VBD)
  the/DT
  (mychunk theory/NN)
  of/IN
  (mychunk relativity,/JJ)
  one/CD
  of/IN
  the/DT
  two/CD
  (mychunk pillars/NNS)
  of/IN
  (mychunk modern/JJ)
  physics./FW
  His/PRP$
  (mychunk work/NN)
  is/VBZ
  also/RB
  known/VBN
  for/IN
  its/PRP$
  (mychunk influence/NN)
  on/IN
  the/DT
  (mychunk philosophy/NN)
  of/IN
  (mychunk science./NN))

As we can see, words whose tags are not covered by our rule (for example known, for, and its) are left outside any chunk, while the words that match the rule are grouped under the label "mychunk".
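The parse result is an nltk.Tree, so the chunks can also be pulled out programmatically. Here is a minimal sketch, assuming a short hand-tagged input (the first few pairs of the tagging output, typed in by hand) and a chunk rule modeled on Step 4 with an optional <CC> element at the end:

```python
from nltk import RegexpParser

# First few (word, tag) pairs from the tagging output, typed in by hand.
tagged = [("Albert", "NNP"), ("Einstein", "NNP"), ("was", "VBD"), ("a", "DT"),
          ("German-born", "JJ"), ("theoretical", "JJ"), ("physicist", "NN")]

# Assumed rule, modeled on the chunk pattern from Step 4.
parser = RegexpParser("mychunk:{<NN.?>*<VBD.?>*<JJ.?>*<CC>?}")
tree = parser.parse(tagged)

# Keep only subtrees labelled "mychunk" and join their words into phrases.
chunks = [" ".join(word for word, tag in subtree.leaves())
          for subtree in tree.subtrees(filter=lambda t: t.label() == "mychunk")]
print(chunks)
```

Filtering on the label skips the root S node, so only the grouped phrases are returned and unchunked words such as "a" are dropped.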
