How to find ngrams from text?
MACHINE LEARNING RECIPES DATA CLEANING PYTHON DATA MUNGING PANDAS CHEATSHEET     ALL TAGS

How to find ngrams from text?

How to find ngrams from text?

This recipe helps you find ngrams from text

0

Recipe Objective

How to find n-grams from text?. first of all lets understand what is Ngram so it means the sequence of N words, for e.g "A mango" is a 2-gram, "the cat is dancing" is 4-gram and many more.

Step 1 - Import the necessary packages

import nltk from nltk.util import ngrams

Step 2 - Define a function for ngrams

def extract_ngrams(data, num): n_grams = ngrams(nltk.word_tokenize(data), num) return [ ' '.join(grams) for grams in n_grams]

Here we have defined a function called extract_ngrams which will generate ngrams from sentences.

Step 3 - Take a sample text

My_text = 'Jack is very good in mathematics but he is not that much good in science'

Step 4 - Print the ngrams of the sentence

print("1-gram of the sample text: ", extract_ngrams(My_text, 1), '\n') print("2-gram of the sample text: ", extract_ngrams(My_text, 2), '\n') print("3-gram of the sample text: ", extract_ngrams(My_text, 3), '\n') print("4-gram of the sample text: ", extract_ngrams(My_text, 4), '\n')
1-gram of the sample text:  ['Jack', 'is', 'very', 'good', 'in', 'mathematics', 'but', 'he', 'is', 'not', 'that', 'much', 'good', 'in', 'science'] 

2-gram of the sample text:  ['Jack is', 'is very', 'very good', 'good in', 'in mathematics', 'mathematics but', 'but he', 'he is', 'is not', 'not that', 'that much', 'much good', 'good in', 'in science'] 

3-gram of the sample text:  ['Jack is very', 'is very good', 'very good in', 'good in mathematics', 'in mathematics but', 'mathematics but he', 'but he is', 'he is not', 'is not that', 'not that much', 'that much good', 'much good in', 'good in science'] 

4-gram of the sample text:  ['Jack is very good', 'is very good in', 'very good in mathematics', 'good in mathematics but', 'in mathematics but he', 'mathematics but he is', 'but he is not', 'he is not that', 'is not that much', 'not that much good', 'that much good in', 'much good in science']

The above code will give the output for the number of ngrams that we have specified their, It can be increased or decreased as per our requirement.

Relevant Projects

Predict Credit Default | Give Me Some Credit Kaggle
In this data science project, you will predict borrowers chance of defaulting on credit loans by building a credit score prediction model.

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.

Loan Eligibility Prediction using Gradient Boosting Classifier
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

Customer Market Basket Analysis using Apriori and Fpgrowth algorithms
In this data science project, you will learn how to perform market basket analysis with the application of Apriori and FP growth algorithms based on the concept of association rule learning.

Forecast Inventory demand using historical sales data in R
In this machine learning project, you will develop a machine learning model to accurately forecast inventory demand based on historical sales data.

German Credit Dataset Analysis to Classify Loan Applications
In this data science project, you will work with German credit dataset using classification techniques like Decision Tree, Neural Networks etc to classify loan applications using R.

Human Activity Recognition Using Smartphones Data Set
In this deep learning project, you will build a classification system where to precisely identify human fitness activities.

Ecommerce product reviews - Pairwise ranking and sentiment analysis
This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Predict Employee Computer Access Needs in Python
Data Science Project in Python- Given his or her job role, predict employee access needs using amazon employee database.