When to use stemming and when to use lemmatization in nltk

This recipe explains when to use stemming and when to use lemmatization in NLTK.

Recipe Objective

Stemming suits tasks such as sentiment analysis or customer review analysis, where the reduced words are only used to extract positive and negative sentiment and are never shown to a person. Lemmatization suits chatbots and features that display reviews of a site, service, or product, where the output must be understandable by a human.

Explanation

Stemming chops the end off a word to reduce it to a shorter base form. For example, eating and eat both become eat, and beating and beat both become beat. Because a stemmer only strips suffixes by rule and never checks the result against a dictionary, the output is not always a real word: the Porter stemmer turns history into histori and historical into histor, which a human reader will not recognize.

Lemmatization addresses this by mapping each word to its dictionary form (its lemma), so the output is a meaningful word that a human can read. For example, bottles becomes bottle, and eating, when treated as a verb, becomes eat.
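The contrast described above can be sketched in a few lines of plain Python. This is a toy illustration only: the suffix list, the lookup table, and the function names toy_stem and toy_lemmatize are all made up for this example, and neither function is the Porter algorithm or WordNet.

```python
# Toy illustration only: NOT the Porter algorithm and NOT WordNet.
SUFFIXES = ("ing", "ed", "es", "s")   # hypothetical suffix rules

LEMMA_TABLE = {                        # hypothetical mini dictionary
    "studies": "study",
    "bottles": "bottle",
    "eating": "eat",
}


def toy_stem(word):
    """Strip the first matching suffix; never checks the result is a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in the table; unknown words are returned unchanged."""
    return LEMMA_TABLE.get(word, word)


for w in ("eating", "studies", "bottles"):
    print(w, "-> stem:", toy_stem(w), "| lemma:", toy_lemmatize(w))
```

The rule-based function is fast and needs no data, but it produces fragments such as studi and bottl. The lookup-based function returns real words, but only for entries it knows about.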

Step 1 - Import the library - nltk and PorterStemmer from nltk

import nltk
from nltk.stem import PorterStemmer

We have imported nltk, the Natural Language Toolkit, and from nltk.stem we have imported PorterStemmer, a widely used stemmer.

Step 2 - Create a Variable for stemmer

My_stemmer = PorterStemmer()

Here we create a PorterStemmer instance and store it in the variable My_stemmer for the following steps.

Step 3 - Input words into the stemmer

print("The output after Stemming the word is :", My_stemmer.stem('writing'), '\n')
print("The output after Stemming the word is :", My_stemmer.stem('eating'))

The output after Stemming the word is : write

The output after Stemming the word is : eat

The output above shows how stemming works: the word writing has become write and eating has become eat.
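To see where stemming stops producing readable words, the same stemmer can be run over a small list. The sketch below assumes nltk is installed; the helper name stem_all and the word list are chosen here purely for illustration.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()


def stem_all(words):
    """Return a {word: stem} mapping for the given words (illustrative helper)."""
    return {word: stemmer.stem(word) for word in words}


stems = stem_all(["writing", "eating", "studies", "history", "historical"])
for word, stem in stems.items():
    print(f"{word:<12} -> {stem}")
```

Words such as writing and eating reduce to ordinary words, while studies comes out as studi. That is acceptable when the stems only feed a model, but not when they are shown to a reader.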

Step 4 - Import the lemmatizer from nltk library

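from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')  # one-time download of the WordNet data the lemmatizer needs

Now we repeat the process with a lemmatizer. For that we import WordNetLemmatizer from nltk.stem, which is a widely used lemmatizer. It looks words up in the WordNet corpus, so that corpus has to be downloaded once before the lemmatizer is used.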

Step 5 - Create a variable for lemmatizer

My_lemmatizer = WordNetLemmatizer()

Here we create a WordNetLemmatizer instance and store it in the variable My_lemmatizer for the following steps.

Step 6 - Input words into lemmatizer

print("The word after lemmatization :", My_lemmatizer.lemmatize('eating'), '\n')
print("The word after lemmatization :", My_lemmatizer.lemmatize('bottles'))

The word after lemmatization : eating

The word after lemmatization : bottle

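The output above shows how the lemmatizer works. The word eating is unchanged because lemmatize treats every word as a noun unless told otherwise (its pos argument defaults to 'n'), and eating is already a valid noun form. The second word, bottles, has been reduced to its singular lemma, bottle. In both cases the result is a real word that a human can read.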
