Music Recommendation System Project using Python and R

Machine Learning Project - Work with KKBOX's Music Recommendation System dataset to build the best music recommendation engine.
Each project comes with 2-5 hours of micro-videos explaining the solution.
Code & Dataset
Get access to 50+ solved projects with iPython notebooks and datasets.
Project Experience
Add project experience to your Linkedin/Github profiles.

What will you learn

  • Working with Music Data with several category
  • EDA using several Visualization techniques
  • Building Automated Recommendation Engine
  • Solve this use case using Python and R
  • Finding Parameter Tuning for better Algorithm

Project Description


The 11th ACM International Conference on Web Search and Data Mining (WSDM 2018) is challenging you to build a better music recommendation system using a donated dataset from KKBOX. WSDM (pronounced "wisdom") is one of the the premier conferences on web inspired research involving search and data mining. They're committed to publishing original, high quality papers and presentations, with an emphasis on practical but principled novel models.

WSDM has challenged us to help solve these problems and build a better music recommendation system. The dataset is from KKBOX, Asia’s leading music streaming service, holding the world’s most comprehensive Asia-Pop music library with over 30 million tracks.

They currently use a collaborative filtering based algorithm with matrix factorization and word embedding in their recommendation system but believe new machine learning techniques could lead to better results.


In this machine learning project, you will be asked to predict the chances of a user listening to a song repetitively after the first observable listening event within a time window was triggered. If there are recurring listening event(s) triggered within a month after the user’s very first observable listening event, its target is marked 1, and 0 otherwise in the training set. The same rule applies to the testing set.

KKBOX provides a music dataset that consists of information of the first observable listening event for each unique user-song pair within a specific time duration. Metadata of each unique user and song pair is also provided. The use of public data to increase the level of accuracy of your prediction is encouraged.

The train and the test data are selected from users listening history in a given time period. Note that this time period is chosen to be before the WSDM-KKBox Churn Prediction time period. The train and test sets are split based on time, and the split of public/private is based on unique user/song pairs.

Curriculum For This Mini Project

  Download the Dataset
  Required Packages
  Exploring the Dataset
  Visualization of Data
  Merging Datasets
  Missing Data
  Memory Usage
  Visualization of Merged Data
  Build the model
  Run the model
  Building the model in R