How to perform fuzzy logic string matching?

This recipe helps you perform fuzzy logic string matching

Recipe Objective

Fuzzy string matching is a simple way to compare strings that are similar but not necessarily identical. It is the process of finding strings that approximately match a given pattern. This recipe uses the fuzzywuzzy library, which returns a similarity score out of 100; the higher the score, the more alike the two strings are. The library uses Levenshtein distance to calculate the difference between two sequences.
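To make the idea of Levenshtein distance concrete, here is a minimal pure-Python sketch. The function name `levenshtein` is chosen for illustration only and this is not the library's own implementation; it counts the fewest single-character insertions, deletions, and substitutions needed to turn one string into another.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (illustrative sketch, not fuzzywuzzy code)."""
    # Keep the shorter string in b so each row stays as small as possible.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A smaller distance means the strings are closer; a score out of 100 is one way of normalizing that notion by string length.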

Let's walk through a practical implementation.

Step 1 - Import the necessary libraries

from fuzzywuzzy import fuzz
from fuzzywuzzy import process

Step 2 - Simple ratio

print("Lets see the ratio for string matching:", fuzz.ratio('Fuzzy for String Matching', 'Fuzzy String Matching'), '\n')
print("Lets see the ratio for string matching:", fuzz.ratio('This is an NLP Session', 'This is an NLP Session'), '\n')
print("Lets see the ratio for string matching:", fuzz.ratio('your learning fuzzywuzzy', 'Your Learning FuzzyWuzzy'))

Lets see the ratio for string matching: 91
Lets see the ratio for string matching: 100
Lets see the ratio for string matching: 83

From the output above we can say that:

The first pair scores 91/100 because one word, "for", is missing from the second string.

The second pair scores 100/100 because the two strings match each other exactly.

The third pair scores 83/100 because fuzz.ratio is case-sensitive: the first string is entirely lower case, while the second string contains some upper-case characters.
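The case-sensitivity point can be checked without installing anything. The sketch below uses `difflib.SequenceMatcher` from the standard library as a stand-in for the simple ratio (fuzzywuzzy itself falls back to `SequenceMatcher` when the optional python-Levenshtein package is not installed). The helper name `simple_ratio` is an assumption for illustration, not part of the library.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity score from 0 to 100 (stdlib approximation of a simple ratio)."""
    return round(100 * SequenceMatcher(None, s1, s2).ratio())


a = "your learning fuzzywuzzy"
b = "Your Learning FuzzyWuzzy"

# Compared as-is, the four differently cased letters lower the score.
print(simple_ratio(a, b))

# Lower-casing both strings first removes the difference entirely.
print(simple_ratio(a.lower(), b.lower()))  # 100
```

If case should not matter for your data, normalizing both strings before calling the simple ratio is one straightforward option.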

Step 3 - Partial ratio

print("Lets see the partial ratio for string matching:", fuzz.partial_ratio('Jon is eating', 'Jon is eating !'), '\n')
print("Lets see the partial ratio for string matching:", fuzz.partial_ratio('Mark is walking on streets', 'Mark walking streets'), '\n')

Lets see the partial ratio for string matching: 100
Lets see the partial ratio for string matching: 80

The output above shows how partial_ratio behaves:

The first pair scores 100/100. partial_ratio compares the shorter string against the best-matching substring of the longer one, and the second string only adds an exclamation mark at the end, so the shorter string still appears in it in full.

The second pair scores 80/100. The score is lower because the first string contains extra tokens ("is" and "on") in the middle, so the shorter string does not exactly match any single substring of the longer one.
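The substring idea behind the partial ratio can be sketched with a brute-force sliding window. This is a simplified illustration using only the standard library; the names `simple_ratio` and `partial_ratio_sketch` are hypothetical, and the real library selects its candidate substrings differently, so scores may not agree in every case.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity score from 0 to 100 (stdlib approximation)."""
    return round(100 * SequenceMatcher(None, s1, s2).ratio())


def partial_ratio_sketch(s1: str, s2: str) -> int:
    """Best score of the shorter string against any same-length window of the longer."""
    shorter, longer = sorted((s1, s2), key=len)
    if not shorter:
        # Two empty strings match trivially; an empty string matches nothing else.
        return 100 if not longer else 0
    window = len(shorter)
    best = 0
    for start in range(len(longer) - window + 1):
        candidate = longer[start:start + window]
        best = max(best, simple_ratio(shorter, candidate))
    return best


# The shorter string appears in full inside the longer one.
print(partial_ratio_sketch("Jon is eating", "Jon is eating !"))  # 100

# Extra words in the middle mean no window matches exactly.
print(partial_ratio_sketch("Mark is walking on streets", "Mark walking streets"))
```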

Step 4 - Token set ratio and token sort ratio

print("Lets see the token sort ratio for string matching:", fuzz.token_sort_ratio("for every one", "every one for"), '\n')
print("Lets see the token set ratio for string matching:", fuzz.token_set_ratio("This is done", "This is done done"))

Lets see the token sort ratio for string matching: 100
Lets see the token set ratio for string matching: 100

The score is 100/100 in both cases because:

token_sort_ratio sorts the words in each string before comparing them, so the score is 100 whenever the words are the same, regardless of their order.

token_set_ratio treats each string as a set of words, so duplicate words are counted only once.
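The two token-based behaviors described above can be sketched with the standard library. This is a simplified approximation, not the library's code: the function names are made up for illustration, and tokenizing with `lower().split()` is an assumption that skips the library's fuller preprocessing.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity score from 0 to 100 (stdlib approximation)."""
    return round(100 * SequenceMatcher(None, s1, s2).ratio())


def token_sort_ratio_sketch(s1: str, s2: str) -> int:
    """Sort the words of each string, then compare the rebuilt strings."""
    sorted1 = " ".join(sorted(s1.lower().split()))
    sorted2 = " ".join(sorted(s2.lower().split()))
    return simple_ratio(sorted1, sorted2)


def token_set_ratio_sketch(s1: str, s2: str) -> int:
    """Compare word sets, so repeated words collapse into one."""
    tokens1, tokens2 = set(s1.lower().split()), set(s2.lower().split())
    common = " ".join(sorted(tokens1 & tokens2))
    only1 = " ".join(sorted(tokens1 - tokens2))
    only2 = " ".join(sorted(tokens2 - tokens1))
    combined1 = (common + " " + only1).strip()
    combined2 = (common + " " + only2).strip()
    # Take the best of the three pairwise comparisons.
    return max(
        simple_ratio(common, combined1),
        simple_ratio(common, combined2),
        simple_ratio(combined1, combined2),
    )


print(token_sort_ratio_sketch("for every one", "every one for"))    # 100
print(token_set_ratio_sketch("This is done", "This is done done"))  # 100
```

Sorting alone does not remove the duplicate "done", which is why the set-based variant is the one that scores the second pair at 100.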

Step 5 - WRatio with Example

print("Lets see the Wratio for string matching:", fuzz.WRatio("This is good", "This is good"), '\n')
print("Lets see the Wratio for string matching:", fuzz.WRatio("Sometimes good is bad!!!", "Sometimes good is bad"))

Lets see the Wratio for string matching: 100
Lets see the Wratio for string matching: 100

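Sometimes it is better to use WRatio instead of the simple ratio. WRatio lower-cases both strings and strips punctuation before comparing them, and then picks the best of several weighted scores from the ratio methods above, so differences in case or punctuation do not lower the score.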
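The normalization step is the main reason the second WRatio example scores 100 despite the trailing exclamation marks. The sketch below illustrates only that step, under the assumption that non-word characters are replaced with spaces before lower-casing and trimming; the helper name `normalize` is hypothetical, and the weighting of the different scorers is not reproduced here.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Replace non-word characters with spaces, lower-case, and trim."""
    return re.sub(r"\W", " ", text).lower().strip()


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity score from 0 to 100 (stdlib approximation)."""
    return round(100 * SequenceMatcher(None, s1, s2).ratio())


raw1, raw2 = "Sometimes good is bad!!!", "Sometimes good is bad"

# Without normalization the punctuation counts against the score.
print(simple_ratio(raw1, raw2))

# After normalization both strings are identical.
print(simple_ratio(normalize(raw1), normalize(raw2)))  # 100
```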
