How to use Levenshtein distance in text similarity?

This recipe helps you use Levenshtein distance to measure text similarity.

Recipe Objective

How to use Levenshtein distance in text similarity?

The Levenshtein distance between two strings is the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string, String 1, into another string, String 2.

For example:

String A = helo

String B = hello

In this example, we need to insert one missing character, l, into String A to transform it into String B. The Levenshtein distance is therefore 1, because only one edit is needed.

Similarly, if:

String A = kelo

String B = hello

Here the Levenshtein distance is 2: besides inserting an l, we also have to substitute the character k with h.
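
The definition above can be sketched as a small dynamic-programming routine. This is only a minimal illustration in plain Python, not the implementation used by any particular library; the function name levenshtein_distance is a hypothetical helper introduced here for the example.

```python
def levenshtein_distance(source, target):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn source into target.
    (Hypothetical helper, written for illustration only.)"""
    # previous_row[j] is the distance between an empty source prefix
    # and the first j characters of target, i.e. j insertions.
    previous_row = list(range(len(target) + 1))

    for i, source_char in enumerate(source, start=1):
        # Turning the first i characters of source into an empty
        # string takes i deletions.
        current_row = [i]
        for j, target_char in enumerate(target, start=1):
            substitution_cost = 0 if source_char == target_char else 1
            current_row.append(min(
                previous_row[j] + 1,                      # delete source_char
                current_row[j - 1] + 1,                   # insert target_char
                previous_row[j - 1] + substitution_cost,  # substitute or match
            ))
        previous_row = current_row

    return previous_row[-1]


print(levenshtein_distance("helo", "hello"))  # 1
print(levenshtein_distance("kelo", "hello"))  # 2
```

Only two rows of the edit table are kept at a time, so memory use grows with the length of the target string rather than with the product of both lengths.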

Step 1 - Import the necessary libraries

import enchant  # the enchant module is provided by the pyenchant package

Step 2 - Define Sample strings

string_A = "helo"
string_B = "hello"

Step 3 - Print the Levenshtein distance

print("The Levenshtein Distance between String_A and String_B is: ",enchant.utils.levenshtein(string_A, string_B))
The Levenshtein Distance between String_A and String_B is:  1

This shows how Levenshtein distance works: in this example the distance is 1 because only one operation, inserting an l, is needed.

Step 4 - Some more examples

string_C = "Hello Jc"
string_D = "Hello Jack"
print(enchant.utils.levenshtein(string_C, string_D))
2

string_E = "My nam i S"
string_F = "My name is Sam"
print(enchant.utils.levenshtein(string_E, string_F))
4
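
To turn these distances into a text-similarity score, one common convention is to divide the distance by the length of the longer string and subtract the result from 1, giving a value between 0 (nothing in common) and 1 (identical). This normalization is an assumption made for illustration, not something the recipe or pyenchant prescribes. The names edit_distance and similarity are hypothetical; edit_distance is a plain-Python stand-in so the sketch runs without pyenchant, and it could be swapped for enchant.utils.levenshtein.

```python
from functools import lru_cache


def edit_distance(a, b):
    """Stand-in Levenshtein distance (hypothetical helper).
    Uses memoized recursion, so it is only suited to short strings."""

    @lru_cache(maxsize=None)
    def dist(i, j):
        # Distance between the first i characters of a
        # and the first j characters of b.
        if i == 0:
            return j  # j insertions
        if j == 0:
            return i  # i deletions
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(
            dist(i - 1, j) + 1,         # deletion
            dist(i, j - 1) + 1,         # insertion
            dist(i - 1, j - 1) + cost,  # substitution or match
        )

    return dist(len(a), len(b))


def similarity(a, b):
    """Normalized similarity in [0, 1]; assumed convention:
    1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(a, b) / longest


print(round(similarity("helo", "hello"), 2))
print(round(similarity("Hello Jc", "Hello Jack"), 2))
print(round(similarity("My nam i S", "My name is Sam"), 2))
```

Normalizing by the longer length keeps scores comparable across string pairs of different sizes, which a raw edit count does not.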
