How to use Levenshtein distance in text similarity?

This recipe shows how to use Levenshtein distance to measure text similarity.

Recipe Objective

The Levenshtein distance between two strings is the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string, say String 1, into another string, String 2.

For e.g.

String A = helo

String B = hello

In the above example, we need to insert one missing character, l, into String A to transform it into String B. The Levenshtein distance is therefore 1, because only one edit is needed.

Similarly if:

String A = kelo

String B = hello

Here the Levenshtein distance is 2: besides inserting an l, we also have to substitute the character k with h.
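
To make the definition concrete, here is a minimal sketch of the standard dynamic-programming approach in plain Python. The function name `levenshtein_distance` is chosen for illustration and is not part of the recipe, and this is not necessarily how any particular library computes the result internally.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # previous[j] is the distance between the prefix of a processed
    # so far (initially empty) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        # Turning the first i characters of a into "" takes i deletions.
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution_cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,                      # delete ch_a
                current[j - 1] + 1,                   # insert ch_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein_distance("helo", "hello"))  # prints 1
print(levenshtein_distance("kelo", "hello"))  # prints 2
```

The two calls reproduce the worked examples above: one insertion for helo, and one insertion plus one substitution for kelo.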

Step 1 - Import the necessary libraries

# The enchant module is provided by the pyenchant package
import enchant

Step 2 - Define Sample strings

string_A = "helo"
string_B = "hello"

Step 3 - Print the result for Levenshtein distance

print("The Levenshtein Distance between String_A and String_B is: ",enchant.utils.levenshtein(string_A, string_B))
The Levenshtein Distance between String_A and String_B is:  1

This shows how Levenshtein distance works: in this example the distance is 1 because only one operation (inserting an l) is needed.

Step 4 - Some more examples

string_C = "Hello Jc"
string_D = "Hello Jack"
print(enchant.utils.levenshtein(string_C, string_D))
2
string_E = "My nam i S"
string_F = "My name is Sam"
print(enchant.utils.levenshtein(string_E, string_F))
4
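
Since the recipe is about text similarity, a raw edit count is often converted into a score between 0 and 1. The sketch below assumes one common convention, dividing the distance by the length of the longer string; other normalizations exist. The helper names `edit_distance` and `normalized_similarity` are hypothetical and are not part of pyenchant.

```python
def edit_distance(a: str, b: str) -> int:
    # Row-by-row dynamic programming over prefixes of a and b.
    row = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        new_row = [i]
        for j, ch_b in enumerate(b, start=1):
            new_row.append(min(
                row[j] + 1,                   # deletion
                new_row[j - 1] + 1,           # insertion
                row[j - 1] + (ch_a != ch_b),  # substitution (free on match)
            ))
        row = new_row
    return row[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Assumed convention: 1.0 means identical, 0.0 means every
    position of the longer string had to be edited."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(a, b) / longest


print(normalized_similarity("Hello Jc", "Hello Jack"))
print(normalized_similarity("My nam i S", "My name is Sam"))
```

With this convention, "Hello Jc" and "Hello Jack" score 1 - 2/10, while "My nam i S" and "My name is Sam" score 1 - 4/14, so the first pair is rated as more similar even though both distances are small.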
