MovieLens dataset analysis for movie recommendations using Spark in Azure


In this Databricks Azure tutorial project, you will use Spark SQL to analyse the MovieLens dataset and provide movie recommendations. As part of this, you will deploy Azure Data Factory, build data pipelines and visualise the analysis.




What you will learn

Understanding the problem statement and the Microsoft Azure platform
Developing an end-to-end data pipeline using Microsoft Azure and Databricks Spark
Getting a Microsoft Azure subscription
Creating a resource group
Introduction to storage accounts
Uploading raw data to the cloud
Introduction to Azure Data Factory
Creating and running ADF pipelines
Introduction to Azure Databricks
Spinning up a Databricks cluster
Reading data from the storage account
Writing Spark SQL on Databricks
Data analysis using Spark on Databricks
Data cleansing using Spark on Databricks
Data transformation using Spark
Movie recommendation algorithm using Spark in Azure
Model deployment by creating a Flask API

Project Description

Top streaming services such as Netflix, Amazon Prime, Hulu and Hotstar use movie recommendation systems to suggest movies to their users based on historical viewing patterns.
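Recommending from historical patterns usually means collaborative filtering, the approach MovieLens itself is built on. The sketch below is a minimal illustration only and is not the project's actual algorithm. It implements item-based collaborative filtering in plain Python: every movie becomes a vector of user ratings, and missing ratings count as zero. Each movie the user has not seen is then scored by a similarity-weighted average of the ratings that user gave to the movies they have seen. The toy ratings and all function names are hypothetical. At the scale of the full dataset, a distributed method such as Spark MLlib's ALS is the more common choice.

```python
from math import sqrt

# Hypothetical toy data: user -> {movie title: rating}
RATINGS = {
    "alice": {"Heat": 5.0, "Ronin": 5.0, "Up": 1.0},
    "bob":   {"Heat": 4.0, "Ronin": 4.0, "Alien": 5.0, "Cars": 1.0},
    "carol": {"Up": 5.0, "Cars": 5.0, "Heat": 1.0},
    "dave":  {"Heat": 5.0, "Up": 2.0},
}


def item_vectors(ratings):
    """Invert user -> movie ratings into movie -> {user: rating}."""
    items = {}
    for user, movies in ratings.items():
        for movie, rating in movies.items():
            items.setdefault(movie, {})[user] = rating
    return items


def cosine(a, b):
    """Cosine similarity of two sparse rating vectors (missing ratings count as 0)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=2):
    """Rank unseen movies by a similarity-weighted average of the user's own ratings."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for movie, vec in items.items():
        if movie in seen:
            continue
        sims = [(cosine(vec, items[m]), r) for m, r in seen.items()]
        total = sum(s for s, _ in sims)
        if total > 0:
            scores[movie] = sum(s * r for s, r in sims) / total
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("dave", RATINGS))  # → ['Alien', 'Ronin']
```

Dave rated "Heat" highly and "Up" poorly. "Alien" and "Ronin" rank above "Cars" because the other users rated them most like "Heat".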

Before a recommendation is made, a complex data pipeline brings data from many sources to the recommendation engine. In this project, we use Databricks Spark on Azure with Spark SQL to build this data pipeline.
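With Spark SQL, a DataFrame is typically registered as a temporary view (for example with `createOrReplaceTempView`) and then queried with `spark.sql(...)`. The sketch below shows the kind of aggregation such a pipeline might run: rating count and average rating per movie, keeping only movies with enough ratings. So that it runs without a Databricks cluster, it uses Python's built-in `sqlite3` as a stand-in for Spark SQL. The query itself is plain SQL that Spark SQL also accepts. The table name `ratings`, the sample rows and the threshold of 2 ratings are assumptions for illustration.

```python
import sqlite3

# Hypothetical sample rows in the MovieLens ratings layout:
# (userId, movieId, rating, timestamp)
SAMPLE_RATINGS = [
    (1, 10, 4.0, 1217897793),
    (2, 10, 5.0, 1217895807),
    (3, 10, 3.0, 1217896246),
    (1, 20, 5.0, 1217897860),
    (2, 30, 2.0, 1217897000),
    (3, 30, 3.0, 1217897100),
]

# Plain SQL that Spark SQL also accepts. On Databricks it would run via
# spark.sql(TOP_MOVIES_SQL) against a temp view named "ratings".
TOP_MOVIES_SQL = """
SELECT movieId,
       COUNT(*)              AS num_ratings,
       ROUND(AVG(rating), 2) AS avg_rating
FROM ratings
GROUP BY movieId
HAVING COUNT(*) >= 2
ORDER BY avg_rating DESC, num_ratings DESC
"""


def top_movies(rows):
    """Load rows into an in-memory SQLite table and run the aggregation query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE ratings "
            "(userId INTEGER, movieId INTEGER, rating REAL, timestamp INTEGER)"
        )
        conn.executemany("INSERT INTO ratings VALUES (?, ?, ?, ?)", rows)
        return conn.execute(TOP_MOVIES_SQL).fetchall()
    finally:
        conn.close()


for movie_id, num_ratings, avg_rating in top_movies(SAMPLE_RATINGS):
    print(movie_id, num_ratings, avg_rating)
```

Movie 20 drops out because it has only one rating. A minimum-count filter like this stops a single enthusiastic rating from putting an obscure movie at the top of a ranking.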

Our dataset comes from GroupLens Research, a research group in the Department of Computer Science and Engineering at the University of Minnesota. GroupLens operates MovieLens, a movie recommendation service based on collaborative filtering. This dataset (ml-latest) describes 5-star rating and free-text tagging activity from MovieLens. It contains 22,884,377 ratings and 586,994 tag applications across 34,208 movies. These data were created by 247,753 users between January 9, 1995 and January 29, 2016, and the dataset was generated on January 29, 2016.
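Figures like these come from the dataset's ratings file. The sketch below assumes the ml-latest `ratings.csv` layout:

- header `userId,movieId,rating,timestamp`
- ratings in half-star steps from 0.5 to 5.0
- timestamps in seconds since the Unix epoch (UTC)

It computes the same kind of summary (ratings, users, movies and date range) from a small hypothetical excerpt. As a simple cleansing rule, it skips rows whose rating falls outside that scale. The sample rows and function names are illustrative only. On Databricks the same checks would typically be written as PySpark aggregations over the full file.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical excerpt in the ml-latest ratings.csv layout.
# The last row has an out-of-range rating to show the cleansing step.
SAMPLE_CSV = """userId,movieId,rating,timestamp
1,169,2.5,1204927694
1,2471,3.0,1204927438
2,169,4.5,1436165433
2,9999,7.0,1436165500
"""

# Assumed valid scale: half-star steps from 0.5 to 5.0.
VALID_RATINGS = {x / 2 for x in range(1, 11)}


def to_utc_date(ts):
    """Convert seconds since the Unix epoch to a UTC calendar date."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).date()


def summarize(text):
    """Count ratings, users and movies and find the date range, skipping invalid ratings."""
    users, movies, times = set(), set(), []
    skipped = 0
    for row in csv.DictReader(io.StringIO(text)):
        if float(row["rating"]) not in VALID_RATINGS:
            skipped += 1  # simple data-cleansing rule
            continue
        users.add(int(row["userId"]))
        movies.add(int(row["movieId"]))
        times.append(int(row["timestamp"]))
    return {
        "ratings": len(times),
        "users": len(users),
        "movies": len(movies),
        "skipped": skipped,
        "first": to_utc_date(min(times)),
        "last": to_utc_date(max(times)),
    }


print(summarize(SAMPLE_CSV))
```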

Similar Projects

In this Databricks Azure project, you will use Spark and the Parquet file format to analyse the Yelp reviews dataset. As part of this, you will deploy Azure Data Factory, build data pipelines and visualise the analysis.

In this big data project, we will talk about Apache Zeppelin. We will write code, write notes, build charts and share them all in a single data analytics environment using Hive, Spark and Pig.

In this big data project, we will look at how to mine and make sense of connections in a simple way by building a Spark GraphX algorithm and a network crawler.

Curriculum For This Mini Project

Signing Up To Microsoft Azure Cloud
Create A Resource Group In Azure
Setting Up Azure Storage Account
Uploading Raw Data
Setup Azure Data Factory
Run The ADF Pipeline
Introduction To Azure Databricks
Setting Up A Cluster In Azure Databricks
Authorise Storage Account In Databricks
Reading Data From Databricks
Exploring The Dataset Using PySpark
Data Transformation And Analysis Using PySpark
PySpark Data Analysis - 1
PySpark Data Analysis - 2
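A common transformation in MovieLens analysis turns the pipe-separated `genres` column of `movies.csv` into one row per genre. In PySpark this is usually done with `split` and `explode` from `pyspark.sql.functions`. The plain-Python sketch below shows the same transformation on a hypothetical excerpt, so it runs without a cluster. It assumes the `movieId,title,genres` layout and the `(no genres listed)` placeholder used in MovieLens files. The sample rows and the `explode_genres` helper are illustrative only.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt in the movies.csv layout; titles containing commas are quoted.
SAMPLE_MOVIES = """movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
11,"American President, The (1995)",Comedy|Drama|Romance
99999,Untitled Placeholder (2016),(no genres listed)
"""


def explode_genres(text):
    """Yield one (movieId, genre) pair per genre, like split + explode in PySpark."""
    for row in csv.DictReader(io.StringIO(text)):
        if row["genres"] == "(no genres listed)":
            continue  # skip movies without genre information
        for genre in row["genres"].split("|"):
            yield int(row["movieId"]), genre


# Count how many movies fall under each genre.
genre_counts = Counter(genre for _, genre in explode_genres(SAMPLE_MOVIES))
print(genre_counts.most_common(3))
```

Exploding first and aggregating afterwards keeps each step a simple row-level operation. The same pattern lets per-genre statistics, such as average rating by genre, be computed later with a join on `movieId`.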