How to perform fuzzy logic string matching in NLP

This recipe helps you perform fuzzy logic string matching in NLP.

Recipe Objective

How to perform fuzzy logic string matching?

Fuzzy logic string matching is one of the simplest ways to compare strings that are similar but not necessarily identical. It is the process of finding strings that approximately match a given pattern. The library used in this recipe is fuzzywuzzy, which returns a similarity score from 0 to 100, where 100 means the two strings match exactly. It uses the Levenshtein distance to calculate the difference between two sequences.
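To make the Levenshtein distance concrete, here is a minimal pure-Python sketch of the classic dynamic-programming version. The function name levenshtein is chosen for illustration and is not part of fuzzywuzzy; the library relies on its own implementation, so treat this only as an explanation of the idea.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # previous[j] is the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("Fuzzy for String Matching", "Fuzzy String Matching"))  # 4
```

The second call returns 4 because removing "for " takes four single-character deletions.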

Let's understand this with a practical implementation.

Step 1 - Import the necessary libraries

from fuzzywuzzy import fuzz
from fuzzywuzzy import process

Step 2 - Let's try the simple ratio

print("Lets see the ratio for string matching:", fuzz.ratio('Fuzzy for String Matching', 'Fuzzy String Matching'), '\n')
print("Lets see the ratio for string matching:", fuzz.ratio('This is an NLP Session', 'This is an NLP Session'), '\n')
print("Lets see the ratio for string matching:", fuzz.ratio('your learning fuzzywuzzy', 'Your Learning FuzzyWuzzy'))

Lets see the ratio for string matching: 91
Lets see the ratio for string matching: 100
Lets see the ratio for string matching: 83

From the above we can say that:

The first pair scores 91/100 because one word, "for", is missing from the second string.

The second pair scores 100/100 because the two strings are identical.

The third pair scores 83/100 because the first string is entirely in lower case while the second string mixes upper and lower case, and fuzz.ratio is case-sensitive.
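The scores above can be reproduced with a small sketch built on Python's standard difflib module, which fuzzywuzzy falls back to when the optional python-Levenshtein package is not installed. The helper name simple_ratio is an assumption made for illustration; it is not the library's own function.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    # SequenceMatcher.ratio() is 2 * M / T, where M is the number of matched
    # characters and T is the total number of characters in both strings.
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


print(simple_ratio('Fuzzy for String Matching', 'Fuzzy String Matching'))    # 91
print(simple_ratio('This is an NLP Session', 'This is an NLP Session'))      # 100
print(simple_ratio('your learning fuzzywuzzy', 'Your Learning FuzzyWuzzy'))  # 83

# Lower-casing both strings first removes the case penalty.
print(simple_ratio('your learning fuzzywuzzy'.lower(),
                   'Your Learning FuzzyWuzzy'.lower()))                      # 100
```

For the first pair, 21 of the 46 total characters match on each side, so the score is 2 * 21 / 46, which rounds to 91.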

Step 3 - Now we will try the partial ratio

print("Lets see the partial ratio for string matching:", fuzz.partial_ratio('Jon is eating', 'Jon is eating !'), '\n')
print("Lets see the partial ratio for string matching:", fuzz.partial_ratio('Mark is walking on streets', 'Mark walking streets'), '\n')

Lets see the partial ratio for string matching: 100 
Lets see the partial ratio for string matching: 80 

From the above we can see how partial_ratio works in the FuzzyWuzzy library:

The first pair scores 100/100. The second string has an extra exclamation mark, but the first string appears in full inside it, so the best partial match is exact.

The second pair scores 80/100. The score is lower because the first string contains extra tokens ("is" and "on") between the shared words, so no substring of it matches the second string exactly.
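The idea behind the partial ratio can be sketched as sliding the shorter string across the longer one and keeping the best window score. This is a simplified brute-force illustration using the standard difflib module; the helper names simple_ratio and partial_ratio_sketch are hypothetical, and the library selects its candidate windows differently, so scores may not always agree with fuzz.partial_ratio.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    # Similarity of two whole strings on a 0..100 scale.
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def partial_ratio_sketch(a: str, b: str) -> int:
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0  # nothing to match against
    window = len(shorter)
    # Compare the shorter string with every same-length slice of the longer one.
    return max(
        simple_ratio(shorter, longer[start:start + window])
        for start in range(len(longer) - window + 1)
    )


print(partial_ratio_sketch('Jon is eating', 'Jon is eating !'))  # 100
print(partial_ratio_sketch('Mark is walking on streets', 'Mark walking streets'))
```

The first call prints 100 because 'Jon is eating' is itself one of the windows of the longer string; the second stays below 100 because no window lines up exactly.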

Step 4 - Token set ratio and token sort ratio

print("Lets see the token sort ratio for string matching:", fuzz.token_sort_ratio("for every one", "every one for"), '\n')
print("Lets see the token set ratio for string matching:", fuzz.token_set_ratio("This is done", "This is done done"))

Lets see the token sort ratio for string matching: 100 
Lets see the token set ratio for string matching: 100

The ratio is 100/100 in both cases because:

token_sort_ratio sorts the words of each string before comparing them, so it gives 100 whenever the words are the same, irrespective of their position.

token_set_ratio treats duplicate words as a single word, so the repeated "done" does not lower the score.
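Both token-based scores can be approximated in a few lines of standard-library Python. This is a simplified sketch under stated assumptions: the helper names are hypothetical, tokens are split on whitespace only, and the library's own preprocessing is more thorough.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    # Similarity of two whole strings on a 0..100 scale.
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_sort_sketch(a: str, b: str) -> int:
    # Sort the words so that word order no longer matters.
    sorted_a = " ".join(sorted(a.lower().split()))
    sorted_b = " ".join(sorted(b.lower().split()))
    return simple_ratio(sorted_a, sorted_b)


def token_set_sketch(a: str, b: str) -> int:
    # Sets drop duplicate words, so "done done" counts as "done".
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    common = " ".join(sorted(tokens_a & tokens_b))
    rest_a = " ".join(sorted(tokens_a - tokens_b))
    rest_b = " ".join(sorted(tokens_b - tokens_a))
    combined_a = (common + " " + rest_a).strip()
    combined_b = (common + " " + rest_b).strip()
    # Keep the best of the three pairwise comparisons.
    return max(
        simple_ratio(common, combined_a),
        simple_ratio(common, combined_b),
        simple_ratio(combined_a, combined_b),
    )


print(token_sort_sketch("for every one", "every one for"))   # 100
print(token_set_sketch("This is done", "This is done done"))  # 100
```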

Step 5 - WRatio with an example

print("Lets see the Wratio for string matching:", fuzz.WRatio("This is good", "This is good"), '\n')
print("Lets see the Wratio for string matching:", fuzz.WRatio("Sometimes good is bad!!!", "Sometimes good is bad"))

Lets see the Wratio for string matching: 100 
Lets see the Wratio for string matching: 100

Sometimes it is better to use WRatio instead of the simple ratio, because WRatio ignores differences in letter case and punctuation and combines several of the ratios above into a single weighted score.
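The reason the punctuation in the second example does not lower the score can be illustrated with a rough sketch of the clean-up step applied before comparison. The helper name normalize is hypothetical, and the exact rules inside the library may differ in detail.

```python
import re


def normalize(text: str) -> str:
    # Replace non-word characters with spaces, lower-case, and trim the ends.
    return re.sub(r"\W", " ", text).lower().strip()


left = normalize("Sometimes good is bad!!!")
right = normalize("Sometimes good is bad")

print(repr(left))     # 'sometimes good is bad'
print(left == right)  # True
```

Once both strings reduce to the same normalized text, any of the ratios from the earlier steps gives 100 for the pair.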
