MACHINE LEARNING RECIPES
DATA CLEANING PYTHON
DATA MUNGING
PANDAS CHEATSHEET
ALL TAGS
# What is jaccard similarity and how to calculate it?

# What is jaccard similarity and how to calculate it?

This recipe explains what is jaccard similarity and how to calculate it

Jaccard similarity can be defined to the size of intersection divided by the size of union of two sets. Hence it lies between values 0 & 1. In lay man's term, it is area of overlap/area of union.

So this recipe is a short example on what jaccard similarity is and how to calculate it. Let's get started.

```
x=['Ram','Shyam','Rohan']
y=['Ram','Rohan','Ganesh']
```

Let us create a two list having two common elements.

```
def jaccard(x,y):
z=set(x).intersection(set(y))
a=float(len(z))/(len(x)+len(y)-len(z))
return a
```

We have used the mathematical property of jacccard function to defined the values to be returned if two list are passed into it as arguments.

```
z=jaccard(x,y)
print(z)
```

First call the jaccard function and store the return value in any random variables. Now simply use print function to print new appended dataframe.

Once we run the above code snippet, we will see:

0.5

For above example, we can observe that the area of intersection will be 2 elements and area of overlap will be 4 elements. So jacarrad similarity is 2/4 i.e. '0.5'.

Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.

Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

The goal of this data science project is to build a predictive model and find out the sales of each product at a given Big Mart store.

Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

In this data science project, we will predict the credit card fraud in the transactional dataset using some of the predictive models.

In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

In this data science project, you will learn how to perform market basket analysis with the application of Apriori and FP growth algorithms based on the concept of association rule learning.

In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture to accurately identify plant species using different benchmark classification techniques.

In this data science project, you will predict borrowers chance of defaulting on credit loans by building a credit score prediction model.