How to use Porter Stemmer in nltk

This recipe helps you use Porter Stemmer in nltk

Recipe Objective

As we have discussed before what is stemming, So it is nothing but reducing the words or chopping the words into their root forms for e.g eating becomes eat and so on. So in stemming there are different stemmers and we are going to discuss PortersStemmer the most popularly used one.

Porters Stemmer It is a type of stemmer which is mainly known for Data Mining and Information Retrieval. As its applications are limited to the English language only. It is based on the idea that the suffixes in the English language are made up of a combination of smaller and simpler suffixes, it is also majorly known for its simplicity and speed. The advantage is, it produces the best output from other stemmers and has less error rate.

Step 1 - Import the NLTK library and from NLTK import PorterStemmer

import nltk from nltk.stem import PorterStemmer

Step 2 - Creat a variable and store PorterStemmer into it

ps = PorterStemmer()

Step 3 - lets see how to use PorterStemmer

print(ps.stem('bat')) print(ps.stem('batting'))

bat

bat

from the above we can say that the word bat and batting has reduced to bat lets try with some more examples

print(ps.stem('code')) print(ps.stem('coding')) print(ps.stem('coder')) print(ps.stem('coded'))

code

code

coder

code

So, we have observed that it is working for the words like code, coding, coded but not working for coder because if the word has at least one vowel and consonant plus EED ending, change the ending to 'EE' for e.g agreed become agree.

What Users are saying..

profile image

Gautam Vermani

Data Consultant at Confidential
linkedin profile url

Having worked in the field of Data Science, I wanted to explore how I can implement projects in other domains, So I thought of connecting with ProjectPro. A project that helped me absorb this topic... Read More

Relevant Projects

AWS Project to Build and Deploy LSTM Model with Sagemaker
In this AWS Sagemaker Project, you will learn to build a LSTM model on Sagemaker for sales forecasting while analyzing the impact of weather conditions on Sales.

House Price Prediction Project using Machine Learning in Python
Use the Zillow Zestimate Dataset to build a machine learning model for house price prediction.

Customer Market Basket Analysis using Apriori and Fpgrowth algorithms
In this data science project, you will learn how to perform market basket analysis with the application of Apriori and FP growth algorithms based on the concept of association rule learning.

Azure Deep Learning-Deploy RNN CNN models for TimeSeries
In this Azure MLOps Project, you will learn to perform docker-based deployment of RNN and CNN Models for Time Series Forecasting on Azure Cloud.

Locality Sensitive Hashing Python Code for Look-Alike Modelling
In this deep learning project, you will find similar images (lookalikes) using deep learning and locality sensitive hashing to find customers who are most likely to click on an ad.

Time Series Classification Project for Elevator Failure Prediction
In this Time Series Project, you will predict the failure of elevators using IoT sensor data as a time series classification machine learning problem.

CycleGAN Implementation for Image-To-Image Translation
In this GAN Deep Learning Project, you will learn how to build an image to image translation model in PyTorch with Cycle GAN.

Learn to Build an End-to-End Machine Learning Pipeline - Part 2
In this Machine Learning Project, you will learn how to build an end-to-end machine learning pipeline for predicting truck delays, incorporating Hopsworks' feature store and Weights and Biases for model experimentation.

Machine Learning Project to Forecast Rossmann Store Sales
In this machine learning project you will work on creating a robust prediction model of Rossmann's daily sales using store, promotion, and competitor data.

Isolation Forest Model and LOF for Anomaly Detection in Python
Credit Card Fraud Detection Project - Build an Isolation Forest Model and Local Outlier Factor (LOF) in Python to identify fraudulent credit card transactions.