Jaccard similarity is defined as the size of the intersection of two sets divided by the size of their union. Hence it always lies between 0 and 1. In layman's terms, it is the area of overlap divided by the area of union.
So this recipe is a short example of what Jaccard similarity is and how to calculate it. Let's get started.
Let us work with two lists that have two elements in common, and define a function to compare them.
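def jaccard(x, y):
    # Convert both inputs to sets so that duplicate elements are counted once
    x, y = set(x), set(y)
    z = x.intersection(y)
    # |union| = |x| + |y| - |intersection|; this assumes at least one input is non-empty
    a = float(len(z)) / (len(x) + len(y) - len(z))
    return a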
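We have used the mathematical definition of Jaccard similarity to define the value returned when two lists are passed to the function as arguments. The denominator uses the fact that the size of the union equals the sum of the two set sizes minus the size of the intersection.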
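The identity behind the denominator only holds when sizes are measured on sets, not on raw lists. The sketch below uses hypothetical values (the lists a and b are not from the recipe) to check the identity and to show how a duplicate element would inflate the denominator if list lengths were used instead.

```python
# Hypothetical inputs for illustration; a contains a duplicate on purpose
a = [1, 1, 2, 3]
b = [2, 3, 4]

set_a, set_b = set(a), set(b)
common = set_a & set_b  # elements present in both sets

# Inclusion-exclusion on sets: |A ∪ B| = |A| + |B| - |A ∩ B|
union_size = len(set_a) + len(set_b) - len(common)
assert union_size == len(set_a | set_b)

# Using the raw list lengths counts the duplicate 1 twice
naive_denominator = len(a) + len(b) - len(common)

print(union_size)         # 4
print(naive_denominator)  # 5
```

Converting the inputs to sets before measuring their sizes keeps the result between 0 and 1 even when a list repeats an element.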
First call the jaccard function and store the return value in a variable. Then simply use the print function to display the result.
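A minimal end-to-end sketch of these steps might look as follows. The lists list_a and list_b are assumed values, chosen only so that they share two elements and contain four distinct elements overall; the variable name similarity is likewise illustrative.

```python
def jaccard(x, y):
    # Treat both inputs as sets so duplicates do not affect the sizes
    x, y = set(x), set(y)
    z = x.intersection(y)
    # Assumes at least one input is non-empty, otherwise this divides by zero
    return float(len(z)) / (len(x) + len(y) - len(z))

# Assumed example lists with two common elements (1 and 2)
list_a = [1, 2, 3]
list_b = [1, 2, 4]

# Store the return value, then print it
similarity = jaccard(list_a, list_b)
print(similarity)
```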
Once we run the above code snippet, we will see an output of 0.5.
For the above example, the intersection contains 2 elements and the union contains 4 elements, so the Jaccard similarity is 2/4, i.e. 0.5.