Music Recommendation System Project using Python and R

Machine Learning Project - Work with KKBOX's Music Recommendation System dataset to build the best music recommendation engine.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love

Shailesh Kurdekar

Solutions Architect at Capital One

I have worked for more than 15 years in Java and J2EE and have recently developed an interest in Big Data technologies and Machine learning due to a big need at my workspace. I was referred here by a...


Lead Consultant, ITC Infotech

The project orientation is very much unique and it helps to understand the real time scenarios most of the industries are dealing with. And there is no limit, one can go through as many projects...

What will you learn

Understanding the problem statement and importing the dataset
Performing basic EDA to get insights into the data
Importing the necessary libraries
Using the info function to check for null values and data types
Plotting stacked bar graphs between features to understand their effect on the target variable
Plotting pie charts to understand the contribution of categorical values
Merging the provided datasets using the merge function
Defining a function to check for columns with null values in the merged dataset
Defining a function to fill NaN values in object and non-object type columns
Learning to calculate the memory consumed by a DataFrame
Converting columns to more suitable data types using functions
Using the groupby function to analyze the combined effect of different columns on the target variable
Using seaborn to plot histograms together with the groupby function
Creating primary variables based on the source system tab for the different datasets
Splitting dependent and independent columns for training the model
Selecting XGBoost for training the model and defining a DMatrix for the XGBoost model
Defining and understanding the different parameters for model initialization
Performing k-fold cross-validation to guard against overfitting
Defining the evaluation metrics and making the final predictions
Saving the final predictions in CSV format
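
The data preparation steps listed above (merging, checking for nulls, filling NaN values by column type, measuring memory, changing data types, and grouping against the target) can be sketched roughly as follows. This is a minimal illustration on tiny synthetic tables, not the project's actual notebook: the column names, the fill values ("unknown" and -1), and the helper names columns_with_nulls, fill_missing, and memory_mb are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-ins for the train, songs, and members tables.
# Column names and values are illustrative assumptions.
train = pd.DataFrame({
    "msno": ["u1", "u1", "u2", "u3"],
    "song_id": ["s1", "s2", "s1", "s3"],
    "source_system_tab": ["discover", None, "my library", "search"],
    "target": [1, 0, 1, 0],
})
songs = pd.DataFrame({
    "song_id": ["s1", "s2"],
    "song_length": [200000.0, 180000.0],
    "language": [3.0, 52.0],
})
members = pd.DataFrame({"msno": ["u1", "u2", "u3"], "city": [1, 13, 5]})

# Left joins keep every training row even when song or member metadata is missing.
merged = train.merge(songs, on="song_id", how="left").merge(members, on="msno", how="left")


def columns_with_nulls(df):
    """Return the names of columns that contain at least one null value."""
    return [col for col in df.columns if df[col].isnull().any()]


def fill_missing(df, text_fill="unknown", numeric_fill=-1):
    """Fill non-numeric (object/string) columns with a label and numeric columns with a sentinel."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(numeric_fill)
        else:
            df[col] = df[col].fillna(text_fill)
    return df


def memory_mb(df):
    """Total memory used by the DataFrame in megabytes, counting string contents."""
    return df.memory_usage(deep=True).sum() / 1024 ** 2


null_cols = columns_with_nulls(merged)
filled = fill_missing(merged)
before_mb = memory_mb(filled)

# Mean of the target per category: the share of repeated listens for each tab.
repeat_rate = filled.groupby("source_system_tab")["target"].mean()

# Convert to narrower dtypes: categories for repeated strings, small ints for numbers.
compact = filled.copy()
for col in ["msno", "song_id", "source_system_tab"]:
    compact[col] = compact[col].astype("category")
compact["language"] = compact["language"].astype(np.int16)
compact["city"] = compact["city"].astype(np.int8)
compact["target"] = compact["target"].astype(np.int8)
after_mb = memory_mb(compact)
```

On a table with millions of rows and many repeated string values, the category and small-integer conversions typically shrink the memory footprint considerably; on a four-row toy table like this one the difference is not meaningful, so before_mb and after_mb are computed only to show the measurement itself.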

Project Description


The 11th ACM International Conference on Web Search and Data Mining (WSDM 2018) is challenging you to build a better music recommendation system using a donated dataset from KKBOX. WSDM (pronounced "wisdom") is one of the premier conferences on web-inspired research involving search and data mining. They're committed to publishing original, high-quality papers and presentations, with an emphasis on practical but principled novel models.

The dataset is from KKBOX, Asia’s leading music streaming service, which holds the world’s most comprehensive Asia-Pop music library, with over 30 million tracks.

They currently use a collaborative-filtering-based algorithm with matrix factorization and word embedding in their recommendation system, but believe new machine learning techniques could lead to better results.


In this machine learning project, you will be asked to predict the probability that a user listens to a song again within a time window after the first observable listening event. If one or more recurring listening events are triggered within a month of the user’s very first observable listening event, the target is marked 1 in the training set, and 0 otherwise. The same rule applies to the testing set.
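
To make the labeling rule concrete, here is a minimal sketch of how such a target could be derived from a raw event log. The provided dataset already carries the target column, so this is for illustration only: the (user, song, timestamp) log format, the label_pairs helper, and the choice to approximate "a month" as 30 days are all assumptions.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)  # assumption: "a month" approximated as 30 days


def label_pairs(events):
    """Map each (user, song) pair to 1 if it was replayed within WINDOW of its first play, else 0.

    `events` is an iterable of (user_id, song_id, timestamp) tuples (hypothetical format).
    """
    plays = {}
    for user, song, ts in events:
        plays.setdefault((user, song), []).append(ts)

    labels = {}
    for pair, stamps in plays.items():
        stamps.sort()
        # Only the earliest replay matters: if it falls outside the window, later ones do too.
        labels[pair] = int(len(stamps) > 1 and stamps[1] - stamps[0] <= WINDOW)
    return labels


events = [
    ("u1", "s1", datetime(2017, 1, 1)),
    ("u1", "s1", datetime(2017, 1, 10)),  # replay after 9 days
    ("u1", "s2", datetime(2017, 1, 1)),   # never replayed
    ("u2", "s1", datetime(2017, 1, 5)),
    ("u2", "s1", datetime(2017, 3, 1)),   # replay after 55 days, outside the window
]
labels = label_pairs(events)
# labels == {("u1", "s1"): 1, ("u1", "s2"): 0, ("u2", "s1"): 0}
```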

KKBOX provides a music dataset that consists of information on the first observable listening event for each unique user-song pair within a specific time duration. Metadata of each unique user and song pair is also provided. Using public data to improve the accuracy of your predictions is encouraged.

The train and the test data are selected from users' listening history in a given time period. Note that this time period is chosen to be before the WSDM-KKBox Churn Prediction time period. The train and test sets are split based on time, and the split of public/private is based on unique user/song pairs.
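
Because the official split is time-based, a local validation set that holds out the most recent rows mirrors it more closely than a random split would. The sketch below shows one way to do that, assuming each row carries some ordering key; the "date" field, the 20% holdout fraction, and the chronological_split helper are hypothetical choices for the example.

```python
from datetime import date


def chronological_split(rows, holdout_fraction=0.2, key=lambda row: row["date"]):
    """Sort rows by time and hold out the most recent fraction for validation."""
    ordered = sorted(rows, key=key)
    n_holdout = int(len(ordered) * holdout_fraction)
    cut = len(ordered) - n_holdout
    return ordered[:cut], ordered[cut:]


# Ten synthetic rows, deliberately supplied in reverse chronological order.
rows = [{"pair": ("u%d" % i, "s%d" % i), "date": date(2017, 1, 1 + i)} for i in range(10)]
train_rows, valid_rows = chronological_split(list(reversed(rows)))
```

With this split, every validation row is later than every training row, so the model is always evaluated on listening events that follow the ones it was trained on.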

Similar Projects

Data Science Project - Build a recommendation engine that predicts which products an Instacart consumer will purchase again.

In this data science project, we will detect credit card fraud in a transactional dataset using predictive models.

In this machine learning project, you will predict the total travel time of taxi trips from their initial partial trajectories.

Curriculum For This Mini Project

Download the Dataset
Required Packages
Exploring the Dataset
Visualization of Data
Merging Datasets
Missing Data
Memory Usage
Visualization of Merged Data
Build the model
Run the model
Building the model in R