When to use stemming and when to use lemmatization?

This recipe explains when to use stemming and when to use lemmatization


Recipe Objective

Stemming fits tasks such as sentiment analysis or customer review analysis, where the goal is to extract positive and negative sentiment from the text and the reduced words are consumed by a model rather than read by a person. Lemmatization fits tasks such as chatbots or displaying reviews of a site, service, or product, where the output must be understandable by a human.


Stemming chops the end off a word to reduce it to a shorter form. For example, eating and eat both become eat, and beating and beat both become beat. In some cases the result is not a real word: with the Porter stemmer, studies becomes studi, which is not understandable by humans, because stemming only strips suffixes by rule and does not check the result against a dictionary.

Lemmatization addresses this by converting a word to its dictionary form (its lemma), so the output stays meaningful to a human. For example, studies becomes study and bottles becomes bottle.
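The difference between the two approaches can be sketched in plain Python. The functions below are toy illustrations and not the algorithms NLTK uses: naive_stem, toy_lemmatize, the suffix list, and the lookup table are all hypothetical names and data made up for this sketch.

```python
# Toy illustration only: these are NOT the algorithms used by NLTK.
SUFFIXES = ("ing", "ies", "es", "s")  # hypothetical suffix list

# Hypothetical hand-written table mapping inflected forms to lemmas.
LEMMA_TABLE = {"studies": "study", "eating": "eat", "bottles": "bottle"}


def naive_stem(word):
    """Chop the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Return the dictionary form if known, otherwise the word unchanged."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


for w in ["eating", "studies", "bottles"]:
    print(w, "->", naive_stem(w), "|", toy_lemmatize(w))
```

The rule-based function produces fragments such as stud and bottl, while the lookup returns real words, at the cost of needing a dictionary that covers the vocabulary.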

Step 1 - Import the library - nltk and PorterStemmer from nltk

import nltk
from nltk.stem import PorterStemmer

Here we import nltk, the Natural Language Toolkit, and from nltk.stem we import PorterStemmer, a widely used stemmer.

Step 2 - Create a Variable for stemmer

My_stemmer = PorterStemmer()

Here we create a PorterStemmer instance and store it in the variable My_stemmer for the steps that follow.

Step 3 - Input words into the stemmer

print("The output after Stemming the word is :", My_stemmer.stem('writing'), '\n')
print("The output after Stemming the word is :", My_stemmer.stem('eating'))

The output after Stemming the word is : write

The output after Stemming the word is : eat

The output above shows how stemming works: the word writing has become write and eating has become eat.
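For review or sentiment analysis, the practical benefit of stemming is that different forms of a word collapse into one feature. The sketch below groups a small list of words by their Porter stem; the word list is a made-up example, and the variable names are chosen only for illustration.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Hypothetical tokens, as they might appear in customer reviews.
words = ["eat", "eating", "eats", "studies"]

# Group the surface forms under their shared stem.
groups = defaultdict(list)
for word in words:
    groups[stemmer.stem(word)].append(word)

for stem, forms in groups.items():
    print(stem, "<-", forms)
```

With NLTK's Porter implementation, the three forms of eat should end up under the single stem eat, while studies is reduced to studi, which is not a dictionary word. That is acceptable when the stems only feed a model, but not when they are shown to a reader.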

Step 4 - Import the lemmatizer from nltk library

from nltk.stem import WordNetLemmatizer

Now we repeat the process with a lemmatizer. For that we import WordNetLemmatizer from nltk, which is a widely used lemmatizer.

Step 5 - Create a variable for lemmatizer

My_lemmatizer = WordNetLemmatizer()

Here we create a WordNetLemmatizer instance and store it in the variable My_lemmatizer for the steps that follow.

Step 6 - Input words into lemmatizer

print("The word after lemmatization :", My_lemmatizer.lemmatize('eating'), '\n')
print("The word after lemmatization :", My_lemmatizer.lemmatize('bottles'))

The word after lemmatization : eating

The word after lemmatization : bottle

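The output above shows how the lemmatizer works. The word eating is unchanged because lemmatize() treats its input as a noun unless told otherwise, and eating is already a valid noun; passing pos='v' would return eat instead. The second word, bottles, has been reduced to its singular form, bottle. Note that WordNetLemmatizer relies on the WordNet corpus, which can be fetched once with nltk.download('wordnet').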
