
Music Recommendation System Project using Python and R

Machine Learning Project - Work with KKBOX's Music Recommendation System dataset to build the best music recommendation engine.

What will you learn

  • Working with music data spanning several categories
  • Exploratory data analysis (EDA) using several visualization techniques
  • Building an automated recommendation engine
  • Solving this use case in both Python and R
  • Tuning parameters to improve the algorithm

Project Description


The 11th ACM International Conference on Web Search and Data Mining (WSDM 2018) is challenging you to build a better music recommendation system using a donated dataset from KKBOX. WSDM (pronounced "wisdom") is one of the premier conferences on web-inspired research involving search and data mining. It is committed to publishing original, high-quality papers and presentations, with an emphasis on practical but principled novel models.

To that end, the challenge uses a dataset from KKBOX, Asia’s leading music streaming service, which holds the world’s most comprehensive Asia-Pop music library with over 30 million tracks.

KKBOX currently uses a collaborative-filtering-based algorithm with matrix factorization and word embedding in its recommendation system, but believes new machine learning techniques could lead to better results.


In this machine learning project, you will predict the chance that a user listens to a song again after the first observable listening event within a time window. If a recurring listening event is triggered within a month after the user’s very first observable listening event, the target is marked 1 in the training set, and 0 otherwise. The same rule applies to the test set.
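
The labelling rule above can be illustrated with a minimal sketch. It assumes a raw event log of (user, song, timestamp) tuples and treats "a month" as 30 days; both are assumptions made for illustration, and the function name `label_repeat_listens` is hypothetical rather than part of any KKBOX tooling.

```python
from datetime import datetime, timedelta

def label_repeat_listens(events, window=timedelta(days=30)):
    """Map each (user, song) pair to 1 if the song was played again
    within `window` of the pair's first observed play, else 0.

    `events` is a hypothetical log of (user_id, song_id, timestamp) tuples;
    the 30-day default is an assumed reading of "within a month".
    """
    # Group timestamps by unique user-song pair.
    plays_by_pair = {}
    for user, song, ts in events:
        plays_by_pair.setdefault((user, song), []).append(ts)

    labels = {}
    for pair, times in plays_by_pair.items():
        times.sort()
        first = times[0]
        # A repeat counts only if it is strictly after the first play
        # and no later than the end of the window.
        labels[pair] = int(any(first < t <= first + window for t in times[1:]))
    return labels

events = [
    ("u1", "s1", datetime(2017, 1, 1)),
    ("u1", "s1", datetime(2017, 1, 15)),  # repeat inside the window
    ("u1", "s2", datetime(2017, 1, 1)),
    ("u1", "s2", datetime(2017, 3, 1)),   # repeat outside the window
    ("u2", "s1", datetime(2017, 1, 5)),   # never repeated
]
labels = label_repeat_listens(events)
```

Here only the pair ("u1", "s1") receives target 1, because its second play falls inside the window; the other two pairs receive 0.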

KKBOX provides a music dataset containing information about the first observable listening event for each unique user-song pair within a specific time period. Metadata for each user and each song is also provided. Using public data to improve the accuracy of your predictions is encouraged.

The train and test data are selected from users' listening history in a given time period. Note that this time period is chosen to be before the WSDM-KKBox Churn Prediction time period. The train and test sets are split based on time, and the public/private split is based on unique user-song pairs.
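
A time-based split of this kind might look like the sketch below. It is only an illustration: the `ts` field, the cutoff date, and the function name `time_based_split` are hypothetical, and the source does not say how the actual cutoff was chosen.

```python
from datetime import datetime

def time_based_split(rows, cutoff):
    """Put rows observed before `cutoff` in train and the rest in test.

    Each row is a dict with a hypothetical `ts` timestamp field.
    Splitting on time means the model is evaluated only on events
    that happen after everything it was trained on.
    """
    train = [r for r in rows if r["ts"] < cutoff]
    test = [r for r in rows if r["ts"] >= cutoff]
    return train, test

rows = [
    {"user": "u1", "song": "s1", "ts": datetime(2017, 1, 3)},
    {"user": "u2", "song": "s1", "ts": datetime(2017, 1, 20)},
    {"user": "u1", "song": "s2", "ts": datetime(2017, 2, 2)},
    {"user": "u3", "song": "s3", "ts": datetime(2017, 2, 10)},
]
train_rows, test_rows = time_based_split(rows, cutoff=datetime(2017, 2, 1))
```

When validating locally, holding out the most recent slice of the training data in the same way tends to mirror this setup more closely than a random shuffle would.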



Data Scientist / Business Consultant at GE

3 years of working experience in big data, business intelligence, and analytics with CMMI Level 5 organizations in the BFSI and manufacturing sectors. Excellent written and oral communication skills, with strong analytical and problem-solving capabilities. Constantly learning and experimenting with emerging open source tools and technologies...

What is Hackerday?

Stay up to date with technology trends by working on projects

Live online coding sessions led by industry experts

Build 2-4 projects a month, each lasting 6 hours and designed to teach you advanced concepts

Code in groups and connect with your community