How to perform fuzzy logic string matching?

This recipe helps you perform fuzzy logic string matching

Recipe Objective

How to perform fuzzy logic string matching?

Fuzzy string matching is a simple way to compare strings that are similar but not necessarily identical, i.e. to find strings that approximately match a given pattern. This recipe uses the fuzzywuzzy library, which returns a similarity score from 0 to 100; a score of 100 means the two strings are treated as identical. The library uses Levenshtein distance to measure the difference between the two sequences.
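
To make the idea of Levenshtein distance concrete, here is a minimal pure-Python sketch. It is an illustration only and not the code fuzzywuzzy runs internally; the function name levenshtein is our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


# Removing the four characters "for " turns the first string into the second.
print(levenshtein("Fuzzy for String Matching", "Fuzzy String Matching"))  # 4
```

A smaller distance means the strings are more alike; similarity scores such as the ones below rescale this kind of measurement to a 0 to 100 range.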

Let's understand this with a practical implementation.

Step 1 - Import the necessary libraries

from fuzzywuzzy import fuzz
from fuzzywuzzy import process

Step 2 - Let's try simple ratio usage

print("Lets see the ratio for string matching:",fuzz.ratio('Fuzzy for String Matching', 'Fuzzy String Matching'), '\n')
print("Lets see the ratio for string matching:",fuzz.ratio('This is an NLP Session', 'This is an NLP Session'),'\n')
print("Lets see the ratio for string matching:",fuzz.ratio('your learning fuzzywuzzy', 'Your Learning FuzzyWuzzy'))
Lets see the ratio for string matching: 91 
Lets see the ratio for string matching: 100 
Lets see the ratio for string matching: 83

From the output above, we can say that:

The first pair scores 91/100 because one word, "for", is missing from the second string.

The second pair scores 100/100 because both strings match each other exactly.

The third pair scores 83/100 because the first string is entirely in lower case, while the second string mixes upper-case and lower-case characters. The simple ratio is case-sensitive.
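
The effect of letter case can be checked in isolation. The sketch below uses Python's standard-library difflib, which fuzzywuzzy falls back to when python-Levenshtein is not installed. The helper simple_ratio is hypothetical (it is not a fuzzywuzzy function), and its scores may differ slightly from fuzz.ratio for other inputs.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Similarity of a and b on a 0-100 scale (hypothetical helper)."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


a = "your learning fuzzywuzzy"
b = "Your Learning FuzzyWuzzy"

print(simple_ratio(a, b))                  # case-sensitive: below 100
print(simple_ratio(a.lower(), b.lower()))  # lowercased first: 100
```

Lowercasing both strings before comparing them is a cheap way to remove this source of mismatch when case carries no meaning in your data.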

Step 3 - Now we will try partial ratio

print("Lets see the partial ratio for string matching:",fuzz.partial_ratio('Jon is eating', 'Jon is eating !'), '\n')
print("Lets see the partial ratio for string matching:",fuzz.partial_ratio('Mark is walking on streets', 'Mark walking streets'),'\n')
Lets see the partial ratio for string matching: 100 
Lets see the partial ratio for string matching: 80 

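From the output above, we can see how partial_ratio behaves in the FuzzyWuzzy library:

The first pair gets a partial_ratio score of 100/100. The second string has an extra exclamation mark, but partial_ratio compares the shorter string with the best-matching part of the longer one, and the first string appears in full inside the second.

The second pair scores 80/100. The score is lower because the first string contains extra tokens ("is" and "on") in the middle of the text, so no part of it matches the second string exactly.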
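
The idea behind a partial ratio can be sketched as sliding the shorter string along the longer one and keeping the best score. This is a simplified illustration built on the standard-library difflib, not fuzzywuzzy's actual implementation (which chooses candidate windows differently), so its numbers may not match fuzz.partial_ratio exactly. The name partial_ratio_sketch is our own.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(a: str, b: str) -> int:
    """Best 0-100 similarity between the shorter string and any
    equally long window of the longer string (simplified sketch)."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    window = len(shorter)
    best = 0.0
    for start in range(len(longer) - window + 1):
        candidate = longer[start:start + window]
        score = SequenceMatcher(None, shorter, candidate).ratio()
        best = max(best, score)
    return int(round(100 * best))


# The shorter string is fully contained in the longer one, so the score is 100.
print(partial_ratio_sketch("Jon is eating", "Jon is eating !"))
# Extra words in the middle mean no window matches exactly.
print(partial_ratio_sketch("Mark is walking on streets", "Mark walking streets"))
```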

Step 4 - Token set ratio and token sort ratio

print("Lets see the token sort ratio for string matching:",fuzz.token_sort_ratio("for every one", "every one for"), '\n')
print("Lets see the token set ratio for string matching:",fuzz.token_set_ratio("This is done", "This is done done"))
Lets see the token sort ratio for string matching: 100 
Lets see the token set ratio for string matching: 100

The ratio is 100/100 in both cases because:

token_sort_ratio gives 100 because both strings contain the same words; the tokens are sorted before comparison, so word order does not matter.

token_set_ratio gives 100 because it treats duplicate words as a single word, so the repeated "done" is ignored.
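
Both token-based ratios can be approximated in a few lines. The sketch below is an assumption-laden illustration using the standard-library difflib: the helper names (_ratio, token_sort_sketch, token_set_sketch) are hypothetical, and the real fuzzywuzzy functions also apply extra preprocessing, so results may differ for other inputs.

```python
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> int:
    """0-100 similarity of two strings (hypothetical helper)."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_sort_sketch(a: str, b: str) -> int:
    """Compare the strings after sorting their words alphabetically."""
    def sort_tokens(text: str) -> str:
        return " ".join(sorted(text.lower().split()))
    return _ratio(sort_tokens(a), sort_tokens(b))


def token_set_sketch(a: str, b: str) -> int:
    """Compare the strings as sets of words, so duplicates count once."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    common = " ".join(sorted(tokens_a & tokens_b))
    rest_a = " ".join(sorted(tokens_a - tokens_b))
    rest_b = " ".join(sorted(tokens_b - tokens_a))
    combined_a = (common + " " + rest_a).strip()
    combined_b = (common + " " + rest_b).strip()
    # Take the best of the three pairwise comparisons.
    return max(
        _ratio(common, combined_a),
        _ratio(common, combined_b),
        _ratio(combined_a, combined_b),
    )


print(token_sort_sketch("for every one", "every one for"))  # 100
print(token_set_sketch("This is done", "This is done done"))  # 100
```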

Step 5 - WRatio with an example

print("Lets see the Wratio for string matching:",fuzz.WRatio("This is good", "This is good"), '\n')
print("Lets see the Wratio for string matching:",fuzz.WRatio("Sometimes good is bad!!!", "Sometimes good is bad"))
Lets see the Wratio for string matching: 100 
Lets see the Wratio for string matching: 100

It is sometimes better to use WRatio instead of the simple ratio, because WRatio preprocesses the strings (for example, it ignores letter case and punctuation) and combines several of the ratios above into a weighted score.
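
The preprocessing step explains why the trailing "!!!" does not lower the score. The sketch below approximates that cleanup with a regular expression; normalize is a hypothetical helper, and fuzzywuzzy's own preprocessing may differ in details such as how non-English characters are handled.

```python
import re


def normalize(text: str) -> str:
    """Replace non-word characters with spaces, lowercase, and trim
    (an approximation of the cleanup done before scoring)."""
    return re.sub(r"\W", " ", text).lower().strip()


left = normalize("Sometimes good is bad!!!")
right = normalize("Sometimes good is bad")

print(left)           # sometimes good is bad
print(left == right)  # True: the strings are identical after cleanup
```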
