How to find ngrams from text?
MACHINE LEARNING RECIPES DATA CLEANING PYTHON DATA MUNGING PANDAS CHEATSHEET     ALL TAGS

How to find ngrams from text?

How to find ngrams from text?

This recipe helps you find ngrams from text

0

Recipe Objective

How to find n-grams from text?. first of all lets understand what is Ngram so it means the sequence of N words, for e.g "A mango" is a 2-gram, "the cat is dancing" is 4-gram and many more.

Step 1 - Import the necessary packages

import nltk from nltk.util import ngrams

Step 2 - Define a function for ngrams

def extract_ngrams(data, num): n_grams = ngrams(nltk.word_tokenize(data), num) return [ ' '.join(grams) for grams in n_grams]

Here we have defined a function called extract_ngrams which will generate ngrams from sentences.

Step 3 - Take a sample text

My_text = 'Jack is very good in mathematics but he is not that much good in science'

Step 4 - Print the ngrams of the sentence

print("1-gram of the sample text: ", extract_ngrams(My_text, 1), '\n') print("2-gram of the sample text: ", extract_ngrams(My_text, 2), '\n') print("3-gram of the sample text: ", extract_ngrams(My_text, 3), '\n') print("4-gram of the sample text: ", extract_ngrams(My_text, 4), '\n')
1-gram of the sample text:  ['Jack', 'is', 'very', 'good', 'in', 'mathematics', 'but', 'he', 'is', 'not', 'that', 'much', 'good', 'in', 'science'] 

2-gram of the sample text:  ['Jack is', 'is very', 'very good', 'good in', 'in mathematics', 'mathematics but', 'but he', 'he is', 'is not', 'not that', 'that much', 'much good', 'good in', 'in science'] 

3-gram of the sample text:  ['Jack is very', 'is very good', 'very good in', 'good in mathematics', 'in mathematics but', 'mathematics but he', 'but he is', 'he is not', 'is not that', 'not that much', 'that much good', 'much good in', 'good in science'] 

4-gram of the sample text:  ['Jack is very good', 'is very good in', 'very good in mathematics', 'good in mathematics but', 'in mathematics but he', 'mathematics but he is', 'but he is not', 'he is not that', 'is not that much', 'not that much good', 'that much good in', 'much good in science']

The above code will give the output for the number of ngrams that we have specified their, It can be increased or decreased as per our requirement.

Relevant Projects

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Forecast Inventory demand using historical sales data in R
In this machine learning project, you will develop a machine learning model to accurately forecast inventory demand based on historical sales data.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Ecommerce product reviews - Pairwise ranking and sentiment analysis
This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.

Mercari Price Suggestion Challenge Data Science Project
Data Science Project in Python- Build a machine learning algorithm that automatically suggests the right product prices.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Choosing the right Time Series Forecasting Methods
There are different time series forecasting methods to forecast stock price, demand etc. In this machine learning project, you will learn to determine which forecasting method to be used when and how to apply with time series forecasting example.

PySpark Tutorial - Learn to use Apache Spark with Python
PySpark Project-Get a handle on using Python with Spark through this hands-on data processing spark python tutorial.

Topic modelling using Kmeans clustering to group customer reviews
In this Kmeans clustering machine learning project, you will perform topic modelling in order to group customer reviews based on recurring patterns.

Perform Time series modelling using Facebook Prophet
In this project, we are going to talk about Time Series Forecasting to predict the electricity requirement for a particular house using Prophet.