MACHINE LEARNING RECIPES
DATA CLEANING PYTHON
DATA MUNGING
PANDAS CHEATSHEET
ALL TAGS
# What is jaccard similarity and how to calculate it?

# What is jaccard similarity and how to calculate it?

This recipe explains what is jaccard similarity and how to calculate it

Jaccard similarity can be defined to the size of intersection divided by the size of union of two sets. Hence it lies between values 0 & 1. In lay man's term, it is area of overlap/area of union.

So this recipe is a short example on what jaccard similarity is and how to calculate it. Let's get started.

```
x=['Ram','Shyam','Rohan']
y=['Ram','Rohan','Ganesh']
```

Let us create a two list having two common elements.

```
def jaccard(x,y):
z=set(x).intersection(set(y))
a=float(len(z))/(len(x)+len(y)-len(z))
return a
```

We have used the mathematical property of jacccard function to defined the values to be returned if two list are passed into it as arguments.

```
z=jaccard(x,y)
print(z)
```

First call the jaccard function and store the return value in any random variables. Now simply use print function to print new appended dataframe.

Once we run the above code snippet, we will see:

0.5

For above example, we can observe that the area of intersection will be 2 elements and area of overlap will be 4 elements. So jacarrad similarity is 2/4 i.e. '0.5'.

In this data science project, you will work with German credit dataset using classification techniques like Decision Tree, Neural Networks etc to classify loan applications using R.

In this machine learning project, you will uncover the predictive value in an uncertain world by using various artificial intelligence, machine learning, advanced regression and feature transformation techniques.

Data Science Project in Python- Given his or her job role, predict employee access needs using amazon employee database.

PySpark Project-Get a handle on using Python with Spark through this hands-on data processing spark python tutorial.

In this ML Project, you will use the Avocado dataset to build a machine learning model to predict the average price of avocado which is continuous in nature based on region and varieties of avocado.

In this data science project, we will predict the credit card fraud in the transactional dataset using some of the predictive models.

In this deep learning project, you will learn how to build your custom OCR (optical character recognition) from scratch by using Google Tesseract and YOLO to read the text from any images.

In this data science project, you will contextualize customer data and predict the likelihood a customer will stay at 100 different hotel groups.

In this deep learning project, you will learn to implement Unet++ models for medical image segmentation to detect and classify colorectal polyps.

Data Science Project - Build a recommendation engine which will predict the products to be purchased by an Instacart consumer again.