MovieLens Dataset Analysis Using Hive for Movie Recommendations

In this Hadoop Hive project, you will use Hive and HQL to analyze movie ratings from the MovieLens dataset, with the goal of producing better movie recommendations.

What you will learn

  • Working with different file formats (.dat, CSV and text)
  • HQL for effective data analysis
  • SerDe packages to load data
  • Internal and External tables in Hive
  • Logical queries for efficient scripting
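
As a rough illustration of the SerDe and table-type topics listed above, the sketch below defines an external table over a CSV file and then derives an internal (managed) table from it. It is only one possible setup, not the project's actual scripts: the table names movies_ext and movies_managed and the HDFS path are hypothetical, and the columns assume the movieId, title, genres header of the MovieLens movies.csv file. OpenCSVSerde is used because movie titles can contain quoted commas; it reads every column as a string.

```sql
-- External table: Hive stores only the metadata. Dropping the table
-- leaves the underlying files untouched.
-- The LOCATION below is a hypothetical HDFS directory holding movies.csv.
CREATE EXTERNAL TABLE IF NOT EXISTS movies_ext (
  movieId STRING,
  title   STRING,
  genres  STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
STORED AS TEXTFILE
LOCATION '/user/hive/movielens/movies'
TBLPROPERTIES ("skip.header.line.count" = "1");

-- Internal (managed) table: Hive owns the data. Dropping the table
-- also deletes the data from the warehouse directory.
CREATE TABLE IF NOT EXISTS movies_managed
STORED AS ORC
AS
SELECT CAST(movieId AS INT) AS movieId, title, genres
FROM movies_ext;
```

Keeping the raw file behind an external table and the typed, columnar copy in a managed table is a common split, since the source data survives even if the working table is dropped.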

Project Description

The data come from MovieLens, a movie recommender based on collaborative filtering. MovieLens is operated by GroupLens Research, a research group in the Department of Computer Science and Engineering at the University of Minnesota.

This dataset (ml-latest) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 22884377 ratings and 586994 tag applications across 34208 movies. These data were created by 247753 users between January 09, 1995 and January 29, 2016. This dataset was generated on January 29, 2016.

Users were selected at random for inclusion. All selected users had rated at least 1 movie. No demographic information is included. Each user is represented by an id, and no other information is provided.

The data are contained in four files: links.csv, movies.csv, ratings.csv and tags.csv.
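
To give a feel for the kind of analysis these files support, here is a minimal HQL sketch that ranks movies by average rating. It assumes that Hive tables named ratings and movies have already been defined over ratings.csv and movies.csv, with columns matching the CSV headers (userId, movieId, rating, timestamp and movieId, title, genres). The table names and the 1000-rating threshold are illustrative assumptions, not part of the dataset.

```sql
-- Top 10 movies by average rating, restricted to movies with enough
-- ratings for the average to be meaningful (threshold is arbitrary).
SELECT m.movieId,
       m.title,
       COUNT(*)                                AS num_ratings,
       ROUND(AVG(CAST(r.rating AS DOUBLE)), 2) AS avg_rating
FROM ratings r
JOIN movies m
  ON r.movieId = m.movieId
GROUP BY m.movieId, m.title
HAVING COUNT(*) >= 1000
ORDER BY avg_rating DESC, num_ratings DESC
LIMIT 10;
```

The explicit CAST is there because a table loaded through a CSV SerDe may expose rating as a string. Grouping by movieId as well as title keeps distinct movies that share a title apart.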



Senior Hadoop Engineer at Sirius Computer Solutions

Abhishek has 5 years of corporate experience in Hadoop R&D, Big Data technologies, Hadoop administration, IBM Netezza database administration, data warehousing, data mining (Netezza, Oracle PL/SQL and Microsoft SQL Server), development, ETL and advanced analytics. He has vast exposure to various pro...