Music Recommendation System Project using Python and R

Machine Learning Project: work with KKBOX's music recommendation dataset to build a music recommendation engine.


What will you learn

Understanding the Problem Statement and Importing the Dataset
Performing basic EDA to get insights into the data
Importing the necessary libraries
Using the info function to check for null values and data types
Plotting stacked bar graphs between features to understand their effect on the target variable
Plotting pie charts to understand the contribution of categorical values
Merging the provided datasets using the merge function
Defining a function to check for columns with null values in the merged dataset
Defining a function to fill NaN values in object and non-object columns
Calculating the memory consumed by the DataFrame
Converting columns to more suitable data types using functions
Using the groupby function to analyze the combined effect of different columns on the target variable
Using seaborn to plot histograms of the groupby results
Creating primary variables based on the source system tab for the different datasets
Splitting dependent and independent columns for training the model
Selecting XGBoost for training the model and defining a DMatrix for the XGBoost model
Defining and understanding the parameters for model initialization
Performing k-fold cross-validation to guard against overfitting
Defining the evaluation metric and making the final predictions
Saving the final predictions in CSV format
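
The data-preparation steps in the list above (merging, checking and filling null values, measuring memory, changing data types, and grouping against the target) can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the tiny tables, the column names (msno, song_id, source_system_tab, language, city), the fill values, and the helper names null_columns, fill_missing, and compact_dtypes are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the KKBOX tables (assumed columns).
train = pd.DataFrame({
    "msno": ["u1", "u1", "u2", "u3"],
    "song_id": ["s1", "s2", "s1", "s3"],
    "source_system_tab": ["my library", None, "discover", "my library"],
    "target": [1, 0, 1, 0],
})
songs = pd.DataFrame({"song_id": ["s1", "s2"], "language": [3.0, 52.0]})
members = pd.DataFrame({"msno": ["u1", "u2", "u3"], "city": [1, 13, 5]})

# Left joins keep every training row even when song or member metadata is missing.
merged = train.merge(songs, on="song_id", how="left")
merged = merged.merge(members, on="msno", how="left")


def null_columns(df):
    """Return null counts for the columns that contain at least one null."""
    counts = df.isnull().sum()
    return counts[counts > 0]


def fill_missing(df):
    """Fill numeric columns with -1 and all other columns with 'unknown'."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(-1)
        else:
            df[col] = df[col].fillna("unknown")
    return df


def compact_dtypes(df):
    """Downcast numeric columns and store the remaining columns as categories."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = pd.to_numeric(df[col], downcast="integer")
        else:
            df[col] = df[col].astype("category")
    return df


filled = fill_missing(merged)
compact = compact_dtypes(filled)

# Memory in bytes before and after the dtype change (deep=True counts string contents).
bytes_before = filled.memory_usage(deep=True).sum()
bytes_after = compact.memory_usage(deep=True).sum()

# Share of repeated listens per source tab, i.e. the mean of the 0/1 target.
repeat_rate = filled.groupby("source_system_tab")["target"].mean()
```

On a table this small the category conversion may not save memory; the benefit is expected on large tables where a column has few distinct values relative to its row count.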

Project Description


The 11th ACM International Conference on Web Search and Data Mining (WSDM 2018) is challenging you to build a better music recommendation system using a donated dataset from KKBOX. WSDM (pronounced "wisdom") is one of the premier conferences on web-inspired research involving search and data mining. They're committed to publishing original, high-quality papers and presentations, with an emphasis on practical but principled novel models.

The dataset is from KKBOX, Asia’s leading music streaming service, which holds the world’s most comprehensive Asia-Pop music library, with over 30 million tracks.

KKBOX currently uses a collaborative-filtering-based algorithm with matrix factorization and word embedding in its recommendation system, but believes new machine learning techniques could lead to better results.


In this machine learning project, you will predict the chance that a user listens to a song again after the first observable listening event within a time window. If a recurring listening event is triggered within a month of the user’s very first observable listening event for a song, the target is marked 1 in the training set, and 0 otherwise. The same rule applies to the test set.
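
To make the labelling rule concrete, here is a small sketch of how such a target could be derived from a raw listening log. The provided dataset already contains the target, so this is illustrative only; the function name label_pairs, the event tuple layout, and the choice of 30 days for "a month" are assumptions.

```python
from datetime import datetime, timedelta


def label_pairs(events, window=timedelta(days=30)):
    """Map each (user, song) pair to 1 if it recurs within `window` of its first event.

    `events` is an iterable of (user, song, timestamp) tuples (assumed layout).
    """
    first_seen = {}
    labels = {}
    # Process events chronologically so the first event per pair is seen first.
    for user, song, ts in sorted(events, key=lambda e: e[2]):
        key = (user, song)
        if key not in first_seen:
            first_seen[key] = ts
            labels[key] = 0
        elif ts - first_seen[key] <= window:
            labels[key] = 1
    return labels


events = [
    ("u1", "s1", datetime(2017, 1, 1)),
    ("u1", "s1", datetime(2017, 1, 10)),  # repeat after 9 days
    ("u2", "s1", datetime(2017, 1, 5)),
    ("u2", "s1", datetime(2017, 3, 1)),   # repeat after 55 days, outside the window
    ("u3", "s2", datetime(2017, 1, 7)),   # never repeated
]
labels = label_pairs(events)
```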

KKBOX provides a music dataset containing information on the first observable listening event for each unique user-song pair within a specific time period. Metadata for each unique user and song pair is also provided. Using public data to improve the accuracy of your predictions is encouraged.

The train and test data are selected from users' listening history in a given time period. Note that this time period is chosen to be before the WSDM-KKBox Churn Prediction time period. The train and test sets are split based on time, and the public/private split is based on unique user/song pairs.
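
Because the official split is based on time, a local validation set that mimics it can be useful. The sketch below holds out the most recent rows rather than a random sample. It assumes the training rows are already in chronological order, which the description above does not state; the function name time_ordered_split and the 20% holdout are also assumptions.

```python
def time_ordered_split(rows, holdout_fraction=0.2):
    """Split chronologically ordered rows into (earlier, later) parts."""
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    holdout = round(len(rows) * holdout_fraction)
    cut = len(rows) - holdout
    return rows[:cut], rows[cut:]


# Ten hypothetical rows, oldest first.
rows = list(range(10))
fit_rows, valid_rows = time_ordered_split(rows)
```

Every validation row is then later in time than every row used for fitting, which is closer to how the test set relates to the training set than a shuffled split would be.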

Curriculum For This Mini Project

Download the Dataset
Required Packages
Exploring the Dataset
Visualization of Data
Merging Datasets
Missing Data
Memory Usage
Visualization of Merged Data
Build the model
Run the model
Building the model in R