How to perform chunking on a paragraph in nlp

This recipe helps you perform chunking on a paragraph in nlp

Recipe Objective

How to perform chunking on a paragraph? Chunking, also known as shallow parsing, follows part-of-speech (POS) tagging to add more structure to a sentence. The resulting words or groups of words are called chunks. One can also define a pattern of words that cannot be part of a chunk; such words are known as chinks. The primary job of chunking is to group words into "noun phrases". Part-of-speech tags are combined with regular expressions. For example, if we want to capture nouns, verbs, and adjectives from a sentence, we can use a rule such as: chunk: {<NN.?>*<VBD.?>*<JJ.?>*?}. We can combine tags according to our needs and requirements, as there are no predefined rules.

Step 1 - Import the necessary libraries

from nltk import pos_tag
from nltk import RegexpParser

Step 2 - Take a sample text and split it

Sample_text = '''Albert Einstein was a German-born theoretical physicist who developed the theory of relativity, one of the two pillars of modern physics. His work is also known for its influence on the philosophy of science.'''
Sample_text = Sample_text.split()
print(Sample_text)

['Albert', 'Einstein', 'was', 'a', 'German-born', 'theoretical', 'physicist', 'who', 'developed', 'the', 'theory', 'of', 'relativity,', 'one', 'of', 'the', 'two', 'pillars', 'of', 'modern', 'physics.', 'His', 'work', 'is', 'also', 'known', 'for', 'its', 'influence', 'on', 'the', 'philosophy', 'of', 'science.']
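Note that str.split() leaves punctuation attached to words ('relativity,', 'science.'), which is why some tags in the next step look odd. A quick regex tokenizer (a stdlib sketch, not part of the original recipe; nltk.word_tokenize would do this more thoroughly) separates punctuation into its own tokens:

```python
import re

text = "His work is also known for its influence on the philosophy of science."

# Match either words (keeping hyphenated forms like 'German-born' intact)
# or single punctuation marks -- a rough approximation of a real tokenizer.
tokens = re.findall(r"\w+(?:-\w+)*|[^\w\s]", text)
print(tokens)  # [..., 'philosophy', 'of', 'science', '.']
```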

Step 3 - Apply POS tagging

tagging = pos_tag(Sample_text)
print(tagging)

[('Albert', 'NNP'), ('Einstein', 'NNP'), ('was', 'VBD'), ('a', 'DT'), ('German-born', 'JJ'), ('theoretical', 'JJ'), ('physicist', 'NN'), ('who', 'WP'), ('developed', 'VBD'), ('the', 'DT'), ('theory', 'NN'), ('of', 'IN'), ('relativity,', 'JJ'), ('one', 'CD'), ('of', 'IN'), ('the', 'DT'), ('two', 'CD'), ('pillars', 'NNS'), ('of', 'IN'), ('modern', 'JJ'), ('physics.', 'FW'), ('His', 'PRP$'), ('work', 'NN'), ('is', 'VBZ'), ('also', 'RB'), ('known', 'VBN'), ('for', 'IN'), ('its', 'PRP$'), ('influence', 'NN'), ('on', 'IN'), ('the', 'DT'), ('philosophy', 'NN'), ('of', 'IN'), ('science.', 'NN')]
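Once tagged, the (word, tag) tuples are easy to filter with plain Python. For instance, pulling out just the nouns from a slice of the output above (the hardcoded list here is copied from that output so the snippet runs standalone):

```python
# First eleven (word, tag) pairs from the tagging step
tagging = [('Albert', 'NNP'), ('Einstein', 'NNP'), ('was', 'VBD'),
           ('a', 'DT'), ('German-born', 'JJ'), ('theoretical', 'JJ'),
           ('physicist', 'NN'), ('who', 'WP'), ('developed', 'VBD'),
           ('the', 'DT'), ('theory', 'NN')]

# NN, NNP, NNS, NNPS all start with 'NN'
nouns = [word for word, tag in tagging if tag.startswith('NN')]
print(nouns)  # ['Albert', 'Einstein', 'physicist', 'theory']
```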

Step 4 - Define the chunk patterns

chunk_patterns = """mychunk:{<NN.?>*<VBD.?>*<JJ.?>*?}"""

Step 5 - Parse the chunk patterns using RegexpParser

parsing = RegexpParser(chunk_patterns)
print(parsing)

chunk.RegexpParser with 1 stages:
RegexpChunkParser with 1 rules:
       <ChunkRule: '<NN.?>*<VBD.?>*<JJ.?>*?'>

Step 6 - Apply parser on tagging and print the results

Result = parsing.parse(tagging)
print("The Final Result Should look like this:", Result)

The Final Result Should look like this: (S
  (mychunk Albert/NNP Einstein/NNP was/VBD)
  a/DT
  (mychunk German-born/JJ theoretical/JJ)
  (mychunk physicist/NN)
  who/WP
  (mychunk developed/VBD)
  the/DT
  (mychunk theory/NN)
  of/IN
  (mychunk relativity,/JJ)
  one/CD
  of/IN
  the/DT
  two/CD
  (mychunk pillars/NNS)
  of/IN
  (mychunk modern/JJ)
  physics./FW
  His/PRP$
  (mychunk work/NN)
  is/VBZ
  also/RB
  known/VBN
  for/IN
  its/PRP$
  (mychunk influence/NN)
  on/IN
  the/DT
  (mychunk philosophy/NN)
  of/IN
  (mychunk science./NN))

As we can see, many words that do not match our rule, for example "known", "for", and "its", are left outside any chunk, while the words that do match the rule are grouped and tagged as "mychunk".
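The parse result is an nltk Tree, so the chunks can also be extracted programmatically by walking its subtrees. The snippet below rebuilds a small fragment of the result above by hand (so it runs standalone, without the tagger) and collects the chunked phrases:

```python
from nltk.tree import Tree

# A hand-built fragment of the recipe's final result tree
result = Tree('S', [
    Tree('mychunk', [('Albert', 'NNP'), ('Einstein', 'NNP'), ('was', 'VBD')]),
    ('a', 'DT'),
    Tree('mychunk', [('German-born', 'JJ'), ('theoretical', 'JJ')]),
])

# Join the words inside every 'mychunk' subtree into a phrase
phrases = [' '.join(word for word, tag in sub.leaves())
           for sub in result.subtrees()
           if sub.label() == 'mychunk']
print(phrases)  # ['Albert Einstein was', 'German-born theoretical']
```

The same loop applied to the full Result object would list every chunk the grammar found.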

