SciPy Cosine Similarity - Formula, Calculation & Implementation

This code example will help you understand SciPy Cosine Similarity - its formula, calculation methods, & step-by-step implementation strategies.
Last Updated: 12 Apr 2024


Cosine similarity helps with several challenges that come up in data science projects, such as comparing high-dimensional data, capturing semantic similarity, and scaling to large datasets, which makes it a must-know concept for every data scientist.

Understanding the similarity between objects or data points lies at the core of various analytical tasks across diverse domains, from document analysis to recommendation systems. One of the fundamental metrics for measuring the similarity between two vectors is cosine similarity, which is particularly prominent in natural language processing, information retrieval, and recommendation systems. With SciPy, users can efficiently compute the cosine similarity between two vectors or even between two sets of vectors. This short guide will help you understand the concept of cosine similarity, including its formula and calculation, with the help of examples. So, let's dive in!

What is SciPy Cosine Similarity?
SciPy Cosine Similarity Formula
How to Calculate Cosine Similarity? - Step-by-Step Guide
How to Implement Cosine Similarity Between Two Vectors in Python?
Cosine Distance vs. Cosine Similarity: The Difference
Advance your Python Skills with ProjectPro!

What is SciPy Cosine Similarity?

Cosine similarity is a metric used to measure the similarity between two vectors, irrespective of their magnitude. It calculates the cosine of the angle between the vectors, which reflects how closely they point in the same direction. This measure is widely used in information retrieval, text mining, and recommendation systems.

SciPy Cosine Similarity Formula

The formula for computing cosine similarity involves the dot product of the two vectors divided by the product of their Euclidean norms. Mathematically:

$$\text{Similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

Where A and B are vectors, A⋅B denotes their dot product, and ||A|| and ||B|| represent the Euclidean norms of A and B respectively.
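As a quick worked instance of the formula, take the illustrative vectors A = (1, 2, 3) and B = (-1, -2, -3), which point in exactly opposite directions:

$$A \cdot B = (1)(-1) + (2)(-2) + (3)(-3) = -14, \qquad \|A\| = \|B\| = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{14}$$

$$\text{Similarity}(A, B) = \frac{-14}{\sqrt{14} \cdot \sqrt{14}} = -1$$

The result of -1 is the lowest value the measure can take, which matches the intuition that the two vectors are as dissimilar in direction as possible.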

How to Calculate Cosine Similarity? - Step-by-Step Guide

This step-by-step guide walks through calculating cosine similarity with SciPy. The measure is widely used in text mining and information retrieval because it makes documents or vectors easy to compare.

Step 1 - Import the library

from scipy import spatial

Let's pause and look at this import. We have imported the spatial submodule from the SciPy package; its spatial.distance module provides distance functions, including cosine(). SciPy also contains many other scientific routines, such as solvers for differential equations.

Step 2 - Setup the Data

x = [1, 2, 3]

y = [-1, -2, -3]

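Here we create two vectors as Python lists. Note that y is x multiplied by -1, so the two vectors point in exactly opposite directions.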

Step 3 - Calculating Cosine Similarity

z = 1 - spatial.distance.cosine(x, y)

SciPy's spatial.distance.cosine() returns the cosine distance, not the similarity. We first calculated the cosine distance and then subtracted it from 1, giving us the cosine similarity.

Step 4 - Printing Results

print(z)

Simply use the print function to display the computed similarity value.

Step 5 - Let's look at the output now

Once we run the above code snippet, we will see:

-1.0
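The five steps can also be collected into one small script. The sketch below is one possible arrangement, not part of the original recipe: the helper name `cosine_similarity_scipy` and the zero-vector check are assumptions added for illustration, since the angle between two vectors is undefined when either of them has zero length.

```python
import numpy as np
from scipy import spatial


def cosine_similarity_scipy(u, v):
    """Hypothetical helper: cosine similarity as 1 minus SciPy's cosine distance."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Assumption for this sketch: reject zero vectors up front, because the
    # angle (and therefore the similarity) is undefined for them.
    if not u.any() or not v.any():
        raise ValueError("cosine similarity is undefined for zero vectors")
    return 1.0 - spatial.distance.cosine(u, v)


x = [1, 2, 3]
y = [-1, -2, -3]
z = cosine_similarity_scipy(x, y)
print(z)  # opposite directions, so this is -1.0 up to floating-point rounding
```

Wrapping the calculation in a function keeps the "1 minus distance" conversion in one place, which makes it harder to mix up distance and similarity later on.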

How to Implement Cosine Similarity Between Two Vectors in Python?

The cosine similarity between two vectors in Python can be implemented efficiently using NumPy. First, ensure both vectors are represented as arrays. Then, calculate the dot product of the two vectors using NumPy's dot() function. Next, compute the magnitudes of each vector using numpy.linalg.norm(). Finally, divide the dot product by the product of the magnitudes to obtain the cosine similarity. Check out the example below -
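The steps described above can be sketched as follows. The function name `cosine_similarity` and the sample vectors are illustrative assumptions, and the sketch assumes that neither vector is all zeros.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero 1-D vectors (illustrative helper)."""
    a = np.asarray(a, dtype=float)  # represent both vectors as arrays
    b = np.asarray(b, dtype=float)
    dot = np.dot(a, b)              # dot product of the two vectors
    norm_a = np.linalg.norm(a)      # Euclidean magnitude of each vector
    norm_b = np.linalg.norm(b)
    return float(dot / (norm_a * norm_b))


a = [1, 2, 3]
b = [2, 1, 4]
b_scaled = [20, 10, 40]  # same direction as b, ten times the magnitude

sim = cosine_similarity(a, b)
sim_scaled = cosine_similarity(a, b_scaled)
print(sim, sim_scaled)  # the two values agree: only direction matters
```

Comparing `b` with its scaled copy shows the magnitude-independence of the measure: stretching a vector changes its norm and the dot product by the same factor, so the ratio stays the same.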

Cosine Similarity Example

Check out the Python example below demonstrating how to use SciPy to calculate the cosine similarity between two vectors -

[Image: Python Cosine Similarity]
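Since the original example is not reproduced here, the following is a minimal sketch of what such a SciPy computation might look like. The vectors are made up for illustration. The second half uses `scipy.spatial.distance.cdist` with `metric="cosine"` to compare two small sets of vectors at once.

```python
import numpy as np
from scipy.spatial import distance

# Two illustrative vectors that point in nearly the same direction.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 7.0])

similarity = 1.0 - distance.cosine(a, b)
print(similarity)

# Pairwise similarities between two sets of vectors (one vector per row).
set_a = np.array([[1.0, 0.0], [1.0, 1.0]])
set_b = np.array([[2.0, 0.0], [0.0, 3.0], [-1.0, -1.0]])
similarity_matrix = 1.0 - distance.cdist(set_a, set_b, metric="cosine")
print(similarity_matrix)  # shape (2, 3): row i of set_a vs row j of set_b
```

Because `a` and `b` are almost parallel, the printed similarity lands just below 1.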

A value close to 1 indicates a high similarity between the two vectors. Remember that cosine similarity values range from -1 to 1, where 1 indicates vectors pointing in the same direction, 0 indicates orthogonal (unrelated) vectors, and -1 indicates exactly opposite vectors.

Cosine Distance vs. Cosine Similarity: The Difference

Cosine distance and cosine similarity are fundamental concepts in natural language processing (NLP) and are crucial for tasks like semantic similarity measurement and clustering. Cosine similarity quantifies the similarity between two vectors by calculating the cosine of the angle between them, so it ranges from -1 to 1. A value of -1 means the vectors point in exactly opposite directions, 0 means they are orthogonal, and 1 means they point in the same direction. Cosine distance, on the other hand, is derived from cosine similarity as 1 minus the similarity, so it measures the dissimilarity between vectors and ranges from 0 to 2. The choice between the two in practical applications depends on the task: a similarity score is convenient for ranking results, while algorithms that expect a dissimilarity measure typically take a distance. Normalization techniques can also affect the calculation. Rescaling a vector by a positive constant leaves cosine similarity unchanged, but Z-score normalization subtracts the mean from each feature, which shifts the vectors and therefore changes the angles between them.
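The relationship between the two measures, and the effect of Z-score normalization, can be illustrated with a short sketch. All vectors and the small matrix `X` are made-up values, and the Z-score here is computed per feature (per column) with NumPy, which is one common convention and an assumption of this example.

```python
import numpy as np
from scipy.spatial import distance

reference = np.array([1.0, 2.0, 3.0])
cases = {
    "same direction": np.array([2.0, 4.0, 6.0]),
    "orthogonal": np.array([3.0, 0.0, -1.0]),
    "opposite": np.array([-1.0, -2.0, -3.0]),
}

results = {}
for name, vec in cases.items():
    dist = distance.cosine(reference, vec)  # cosine distance, between 0 and 2
    results[name] = (1.0 - dist, dist)      # (similarity, distance)
    print(f"{name}: similarity={1.0 - dist:.3f}, distance={dist:.3f}")

# Z-score each feature (column) of a tiny dataset, then compare rows 0 and 2
# before and after the normalization.
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]])
X_z = (X - X.mean(axis=0)) / X.std(axis=0)
raw_sim = 1.0 - distance.cosine(X[0], X[2])
z_sim = 1.0 - distance.cosine(X_z[0], X_z[2])
print(raw_sim, z_sim)
```

In the raw data both rows lie in the positive quadrant and look almost identical in direction. After centering, one row falls below the feature means and the other above them, so the similarity changes sign.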

Advance your Python Skills with ProjectPro!

We've seen how cosine similarity is a robust measure for quantifying the similarity between vectors, making it invaluable for tasks like document comparison, content recommendation, and clustering of similar items. Calculating the cosine similarity between vectors representing documents can help you efficiently identify similarities in their content, aiding in tasks such as plagiarism detection or document clustering. Furthermore, real-world examples, such as comparing textual data and user-item interactions in recommendation systems, showcase the practical utility of cosine similarity in various domains. Hands-on practice with real-world Python projects is crucial for mastering its implementation and gaining valuable insights into data analysis and machine learning. ProjectPro is your go-to resource during your learning journey, offering guided projects to cover topics such as SciPy's cosine similarity function comprehensively. With ProjectPro, you can delve into practical applications, gaining hands-on experience and mastering concepts with real-world projects. So, check out ProjectPro Repository to solidify your understanding, enhance your skills, and confidently apply cosine similarity in your data science projects.
