Music Recommendation System Project using Python and R

Machine Learning Project - Work with KKBOX's Music Recommendation System dataset to build the best music recommendation engine.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What you will learn

  • Understanding the Problem Statement and Importing the Dataset

  • Performing basic exploratory data analysis (EDA) to gain insights into the data

  • Importing the necessary libraries

  • Using the info function to check for null values and data types

  • Plotting stacked bar graphs of features to understand their effect on the target variable

  • Plotting pie charts to understand the share of each categorical value

  • Merging the provided datasets using the merge function

  • Defining a function to check for columns with null values in the merged dataset

  • Defining a function to fill NaN values in object and non-object columns

  • Learning to calculate the memory consumed by a DataFrame

  • Converting columns to more suitable data types using functions

  • Using the groupby function to analyze the combined effect of different columns on the target variable

  • Using seaborn to plot histograms of the groupby results

  • Creating primary variables based on the source system tab for the different datasets

  • Splitting dependent and independent columns for training the model

  • Selecting XGBoost for training the model and defining a DMatrix for the XGBoost model

  • Defining and understanding the different parameters for model initialization

  • Performing k-fold cross-validation to guard against overfitting

  • Defining the evaluation metrics and making the final predictions

  • Saving the final predictions in CSV format
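
The data-preparation steps in the list above (merging, checking for nulls, filling NaN values, measuring memory, converting data types, and grouping against the target) can be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual notebook: the two tiny tables, their values, the column names (msno, song_id, source_system_tab, song_length, language) and the helper names columns_with_nulls and fill_missing are all assumptions made for the example. The fill values ("unknown" and -1) are likewise arbitrary choices.

```python
import numpy as np
import pandas as pd

# Tiny stand-ins for the interaction and song tables (hypothetical values).
train = pd.DataFrame({
    "msno": ["u1", "u1", "u2", "u3"],
    "song_id": ["s1", "s2", "s1", "s3"],
    "source_system_tab": ["my library", None, "discover", "search"],
    "target": [1, 0, 1, 0],
})
songs = pd.DataFrame({
    "song_id": ["s1", "s2"],
    "song_length": [210000.0, np.nan],
    "language": ["3", "52"],
})

# A left merge keeps every interaction, even when song metadata is missing.
merged = train.merge(songs, on="song_id", how="left")


def columns_with_nulls(df):
    """Return the null count of every column that has at least one null."""
    counts = df.isnull().sum()
    return counts[counts > 0]


def fill_missing(df):
    """Fill numeric columns with -1 and all other columns with 'unknown'."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(-1)
        else:
            out[col] = out[col].fillna("unknown")
    return out


clean = fill_missing(merged)

# Effect of one feature on the target: repeat-listen rate per tab.
repeat_rate = clean.groupby("source_system_tab")["target"].mean()

# Memory in bytes before conversion (deep=True also counts string contents).
bytes_before = clean.memory_usage(deep=True).sum()

# Convert to tighter dtypes: categories for text, small integers for numbers.
for col in ["msno", "song_id", "source_system_tab", "language"]:
    clean[col] = clean[col].astype("category")
clean["target"] = clean["target"].astype("int8")
clean["song_length"] = clean["song_length"].astype("int32")

bytes_after = clean.memory_usage(deep=True).sum()
```

On a frame this small the category conversion may not reduce memory at all; the savings show up on large tables where a few distinct strings repeat across millions of rows.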

Project Description


The 11th ACM International Conference on Web Search and Data Mining (WSDM 2018) is challenging you to build a better music recommendation system using a donated dataset from KKBOX. WSDM (pronounced "wisdom") is one of the premier conferences on web-inspired research involving search and data mining. It is committed to publishing original, high-quality papers and presentations, with an emphasis on practical but principled novel models.

The dataset comes from KKBOX, Asia’s leading music streaming service, which holds the world’s most comprehensive Asia-Pop music library with over 30 million tracks.

KKBOX currently uses a collaborative-filtering-based algorithm with matrix factorization and word embedding in its recommendation system but believes new machine learning techniques could lead to better results.


In this machine learning project, you will predict the chance that a user listens to a song again within a time window after their first observable listening event. If one or more recurring listening events are triggered within a month of the user’s very first observable listening event, the target is marked 1 in the training set, and 0 otherwise. The same rule applies to the test set.
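
To make the labeling rule concrete, here is a minimal sketch of how such a target could be derived from a raw listening log. The dataset already ships with the target column, so this is purely illustrative: the (user, song, timestamp) log format, the function name label_repeat, and the reading of "within a month" as 30 days are all assumptions.

```python
from datetime import datetime, timedelta


def label_repeat(events, window=timedelta(days=30)):
    """Label each (user, song) pair 1 if it was played again within
    `window` of its first observed play, and 0 otherwise.

    `events` is an iterable of (user, song, timestamp) tuples.
    """
    first_seen = {}
    labels = {}
    # Process plays in time order so the first play of a pair is seen first.
    for user, song, ts in sorted(events, key=lambda e: e[2]):
        key = (user, song)
        if key not in first_seen:
            first_seen[key] = ts
            labels[key] = 0
        elif ts - first_seen[key] <= window:
            labels[key] = 1
    return labels


# Hypothetical log: one quick repeat, one late repeat, one single play.
events = [
    ("u1", "s1", datetime(2017, 1, 1)),
    ("u1", "s1", datetime(2017, 1, 10)),
    ("u1", "s2", datetime(2017, 1, 1)),
    ("u1", "s2", datetime(2017, 3, 1)),
    ("u2", "s1", datetime(2017, 1, 5)),
]
labels = label_repeat(events)
```

In this toy log only the pair (u1, s1) is replayed inside the window, so it is the only pair labeled 1.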

KKBOX provides a music dataset that consists of information on the first observable listening event for each unique user-song pair within a specific time duration. Metadata of each unique user and song pair is also provided. Using public data to improve the accuracy of your predictions is encouraged.

The train and test data are selected from users' listening history in a given time period. Note that this time period is chosen to be before the WSDM-KKBox Churn Prediction time period. The train and test sets are split based on time, and the public/private split is based on unique user/song pairs.
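
Because the official train and test sets are split by time, a local validation set that holds out the most recent rows mimics that setup more closely than a random split would. The sketch below shows one way to do this under stated assumptions: the rows carry some sortable timestamp (the "ts" field here is hypothetical), and time_based_split and the 20% holdout are illustrative choices.

```python
def time_based_split(rows, key, holdout_fraction=0.2):
    """Sort rows by time and hold out the most recent fraction."""
    ordered = sorted(rows, key=key)
    cut = len(ordered) - int(len(ordered) * holdout_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical rows with an integer timestamp, deliberately out of order.
rows = [
    {"user": f"u{i}", "ts": t}
    for i, t in enumerate([5, 1, 9, 3, 7, 0, 8, 2, 6, 4])
]
train_rows, valid_rows = time_based_split(rows, key=lambda r: r["ts"])
```

Every row in valid_rows is later than every row in train_rows, so the model is always validated on listening events that follow the ones it was trained on.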

Curriculum For This Mini Project

  Download the Dataset
  Required Packages
  Exploring the Dataset
  Visualization of Data
  Merging Datasets
  Missing Data
  Memory Usage
  Visualization of Merged Data
  Build the model
  Run the model
  Building the model in R