MovieLens dataset analysis using Hive for Movie Recommendations

In this Hadoop Hive project, you will use Hive and HQL to analyze movie ratings from the MovieLens dataset, with the goal of producing better movie recommendations.

What will you learn

  • Working with different file formats (.dat, CSV and text)
  • HQL for effective data analysis
  • SerDe packages to load data
  • Internal and External tables in Hive
  • Logical queries for efficient scripting

What will you get

  • Access to a recording of the complete project
  • Access to all project material, such as data files and solution files

Project Description

GroupLens Research, a research group in the Department of Computer Science and Engineering at the University of Minnesota, operates MovieLens, a movie recommender based on collaborative filtering. MovieLens is the source of the data used in this project.

This dataset (ml-latest) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 22884377 ratings and 586994 tag applications across 34208 movies. These data were created by 247753 users between January 09, 1995 and January 29, 2016. This dataset was generated on January 29, 2016.

Users were selected at random for inclusion. All selected users had rated at least 1 movie. No demographic information is included. Each user is represented by an id, and no other information is provided.

The data are contained in four files: links.csv, movies.csv, ratings.csv and tags.csv.
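As a rough illustration of how two of these files might be exposed to Hive, the sketch below defines external tables over ratings.csv and movies.csv. It assumes the column layout given in the MovieLens README (userId, movieId, rating, timestamp for ratings; movieId, title, genres for movies), a header row in each file, and that each file has been copied into its own HDFS directory. The table names, column names and the `/user/hadoop/movielens/...` paths are hypothetical and are not taken from the project material.

```sql
-- Sketch only: table names, column names and HDFS paths are assumptions.

-- ratings.csv is plain comma-separated text, so a delimited row format is enough.
CREATE EXTERNAL TABLE IF NOT EXISTS ratings (
  user_id   INT,
  movie_id  INT,
  rating    DOUBLE,   -- star rating
  rating_ts BIGINT    -- seconds since the Unix epoch
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hadoop/movielens/ratings'          -- hypothetical directory
TBLPROPERTIES ('skip.header.line.count'='1');      -- skip the CSV header row

-- movies.csv can contain quoted titles with embedded commas,
-- so a CSV SerDe is used instead of a plain delimiter.
-- OpenCSVSerde exposes every column as STRING.
CREATE EXTERNAL TABLE IF NOT EXISTS movies (
  movie_id STRING,
  title    STRING,
  genres   STRING     -- pipe-separated list of genres
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"'
)
STORED AS TEXTFILE
LOCATION '/user/hadoop/movielens/movies'           -- hypothetical directory
TBLPROPERTIES ('skip.header.line.count'='1');
```

Because both tables are declared EXTERNAL, dropping them removes only the Hive metadata and leaves the underlying files in HDFS untouched.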

Curriculum For This Mini Project

  • Discussion on EMR Environment (01m)
  • About the MovieLens Dataset (11m)
  • Key Learnings from the Project (03m)
  • Starting the EMR Cluster (03m)
  • What is Hive? (03m)
  • How Hive works? (03m)
  • Hive Datatypes (05m)
  • Two Types of Join in Hive: Map Side Join and Reduce Side Join (02m)
  • Sharing EMR Environment Details (02m)
  • Logging in to the Amazon EMR Server (07m)
  • Validating the login into the server (00m)
  • Logging into Hive environment (01m)
  • Logging into Hue (02m)
  • Create Hackerday Ratings Database (00m)
  • HQL Querying (04m)
  • Internal and External Tables in Hive (08m)
  • Hive Configurations using SET command (03m)
  • Move Data to HDFS (06m)
  • Create Data Structures for 100K Files (06m)
  • Check if Item, Genre and Other Tables are loaded properly (06m)
  • Loading the Data using a Different Delimiter (09m)
  • Session Q&A (09m)
  • Creating Tables for Analysis (06m)
  • Using SerDe to Load Hive Tables (09m)
  • Which year has the most number of ratings? (11m)
  • Which is the top rated movie each year? (31m)
  • Which movie has been rated 50% more after 5 years? (56m)
  • Over the last 10 years, which genre has seen the maximum decline? (06m)
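
To give a flavour of the analysis questions in the curriculum, here is one possible way to answer "Which year has the most number of ratings?" in HQL. This is a minimal sketch, not the project's solution: it assumes a hypothetical table named `ratings` with a `rating_ts` column holding the rating time as seconds since the Unix epoch.

```sql
-- Sketch only: the table `ratings` and column `rating_ts` are assumed names.
-- from_unixtime() turns epoch seconds into a 'yyyy-MM-dd HH:mm:ss' string,
-- and year() extracts the calendar year from that string.
SELECT year(from_unixtime(rating_ts)) AS rating_year,
       COUNT(*)                       AS num_ratings
FROM ratings
GROUP BY year(from_unixtime(rating_ts))
ORDER BY num_ratings DESC
LIMIT 1;
```

Note that from_unixtime() converts using the time zone of the Hive session, so ratings made close to midnight on December 31 may fall into a different year depending on the cluster's time zone.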