What is cosine similarity and how to calculate it?
MACHINE LEARNING RECIPES DATA CLEANING PYTHON DATA MUNGING PANDAS CHEATSHEET     ALL TAGS

What is cosine similarity and how to calculate it?

What is cosine similarity and how to calculate it?

This recipe explains what is cosine similarity and how to calculate it

Recipe Objective

Cosine similarity gives us the sense of cos angle between vectors. When vector are in same direction, cosine similarity is 1 while in case of perpendicular, it is 0. It is given by (1- cosine distance).

So this recipe is a short example on what cosine similarity is and how to calculate it. Let's get started.

Step 1 - Import the library

from scipy import spatial

Let's pause and look at these imports. We have imported spatial library from scipy class Scipy contains bunch of scientific routies like solving differential equations.

Step 2 - Setup the Data

x=[1,2,3] y=[-1,-2,-3]

Let us create two vectors list.

Step 3 - Calculating cosine similarity

z=1-spatial.distance.cosine(x,y)

We have first calucated cosine distance and the subtracting it from 1 has given us cosine similarity

Step 4 – Printing results

print(z)

Simply use print function to print new appended list.

Step 5 - Let's look at our dataset now

Once we run the above code snippet, we will see:

-1.0

Relevant Projects

Human Activity Recognition Using Multiclass Classification in Python
In this human activity recognition project, we use multiclass classification machine learning techniques to analyse fitness dataset from a smartphone tracker.

Forecasting Business KPI's with Tensorflow and Python
In this machine learning project, you will use the video clip of an IPL match played between CSK and RCB to forecast key performance indicators like the number of appearances of a brand logo, the frames, and the shortest and longest area percentage in the video.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Grouping similar schools/colleges using scorecard and other factors
Use cluster analysis to identify the groups of characteristically similar schools in the College Scorecard dataset. Considerations: Clustering Algorithm Data Preparation How will you deal with missing values? Categorical variables? Feature intercorrelations? Feature normalization or scaling? Dimensionality reduction? Hyperparameters How will you set the parameters -- the algorithm's knobs and dials, so to speak -- in order to achieve valid and useful output? Interpretation Is it possible to explain what each cluster represents? Did you retain or prepare a set of features that enables a meaningful interpretation of the clusters? Do the compositions of the clusters seem to make sense? Validation How will you measure the validity of your clustering process? Which metrics will you use and how will you apply them?

Creating your own embeddings using Glove and Word2vec
We all at some point in time wished to create our own language as a child! But what if certain words always cooccur with another in a corpus? Thus you can make your own model which will understand which word goes with which one, which words are often coming together etc. This all can be done by building a custom embeddings model which we create in this project

Digit Recognition using CNN for MNIST Dataset in Python
In this deep learning project, you will build a convolutional neural network using MNIST dataset for handwritten digit recognition.

Time Series LSTM forecasting
In this project, we will use time-series forecasting to predict the values of a sensor using multiple dependent variables. A variety of machine learning models are applied in this task of time series forecasting. We will see a comparison between the LSTM, ARIMA and Regression models. Classical forecasting methods like ARIMA are still popular and powerful but they lack the overall generalizability that memory-based models like LSTM offer. Every model has its own advantages and disadvantages and that will be discussed. The main objective of this article is to lead you through building a working LSTM model and it's different variants such as Vanilla, Stacked, Bidirectional, etc. There will be special focus on customized data preparation for LSTM.

Avocado Machine Learning Project Python for Price Prediction
In this ML Project, you will use the Avocado dataset to build a machine learning model to predict the average price of avocado which is continuous in nature based on region and varieties of avocado.

Ola Bike Rides Request Demand Forecast
Given big data at taxi service (ride-hailing) i.e. OLA, you will learn multi-step time series forecasting and clustering with Mini-Batch K-means Algorithm on geospatial data to predict future ride requests for a particular region at a given time.

Image Segmentation using Mask R-CNN with Tensorflow
In this Deep Learning Project on Image Segmentation Python, you will learn how to implement the Mask R-CNN model for early fire detection.