Music Recommendation System Project using Python and R

Machine Learning Project - Work with KKBOX's Music Recommendation System dataset to build the best music recommendation engine.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What you will learn

Understanding the Problem Statement and Importing the Dataset
Performing basic EDA to get insights into the data
Importing the necessary libraries
Using the info() function to check for null values and data types
Plotting stacked bar graphs between features to understand their effect on the target variable
Plotting pie charts to understand the share of each categorical value
Merging the provided datasets using the merge function
Defining a function to check for columns with null values in the merged dataset
Defining a function to fill NaN values in object and non-object columns
Calculating the memory consumed by the DataFrame
Converting columns to more suitable data types using functions
Using the groupby function to analyze the combined effect of different columns on the target variable
Using seaborn to plot histograms of the grouped data
Creating primary variables based on the source system tab for the different datasets
Splitting dependent and independent columns for training the model
Selecting XGBoost for training the model and defining a DMatrix for the XGBoost model
Defining and understanding the different parameters for model initialization
Performing k-fold cross-validation to guard against overfitting
Defining the evaluation metrics and making the final predictions
Saving the final predictions in CSV format
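The data preparation steps in the list above (merging, null checks, NaN filling, memory usage, data type conversion, and groupby analysis) can be sketched with pandas as follows. This is a minimal illustration, not the project's actual notebook: the tiny tables, the column names (msno, song_id, song_length, language, city), the fill values, and the helper names columns_with_nulls, fill_missing, and shrink_dtypes are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-ins for the train, songs, and members tables.
train = pd.DataFrame({
    "msno": ["u1", "u1", "u2", "u3"],
    "song_id": ["s1", "s2", "s1", "s3"],
    "source_system_tab": ["my library", None, "discover", "search"],
    "target": [1, 0, 1, 0],
})
songs = pd.DataFrame({
    "song_id": ["s1", "s2"],
    "song_length": [200000.0, 180000.0],
    "language": [3.0, 52.0],
})
members = pd.DataFrame({"msno": ["u1", "u2", "u3"], "city": [1, 13, 5]})

# Left joins keep every training row even when metadata is missing.
merged = (
    train.merge(songs, on="song_id", how="left")
         .merge(members, on="msno", how="left")
)


def columns_with_nulls(df):
    """Return the null count for each column that has at least one null."""
    counts = df.isnull().sum()
    return counts[counts > 0]


def fill_missing(df):
    """Fill NaN with -1 in numeric columns and "unknown" elsewhere (assumed fill values)."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(-1)
        else:
            df[col] = df[col].fillna("unknown")
    return df


def shrink_dtypes(df):
    """Downcast numeric columns and store text columns as categories."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_integer_dtype(df[col]):
            df[col] = pd.to_numeric(df[col], downcast="integer")
        elif pd.api.types.is_float_dtype(df[col]):
            df[col] = pd.to_numeric(df[col], downcast="float")
        else:
            df[col] = df[col].astype("category")
    return df


null_counts = columns_with_nulls(merged)
filled = fill_missing(merged)
compact = shrink_dtypes(filled)

# Memory consumed by each version of the DataFrame, in bytes.
bytes_before = filled.memory_usage(deep=True).sum()
bytes_after = compact.memory_usage(deep=True).sum()

# Share of repeat listens (target == 1) for each source system tab.
repeat_rate = filled.groupby("source_system_tab")["target"].mean()
```

On a table this small the memory figures are not meaningful; the comparison becomes informative only on data with many rows and repeated string values.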

Project Description

Introduction:

The 11th ACM International Conference on Web Search and Data Mining (WSDM 2018) is challenging you to build a better music recommendation system using a donated dataset from KKBOX. WSDM (pronounced "wisdom") is one of the premier conferences on web-inspired research involving search and data mining. It is committed to publishing original, high-quality papers and presentations, with an emphasis on practical but principled novel models.

The task set by WSDM is to build a better music recommendation system. The dataset is from KKBOX, Asia’s leading music streaming service, which holds the world’s most comprehensive Asia-Pop music library with over 30 million tracks.

KKBOX currently uses a collaborative-filtering-based algorithm with matrix factorization and word embedding in its recommendation system, but believes new machine learning techniques could lead to better results.

Data:

In this machine learning project, you will predict the probability that a user listens to a song again within a time window after their first observable listening event. If one or more recurring listening events are triggered within a month after the user’s very first observable listening event, the target is marked 1 in the training set, and 0 otherwise. The same rule applies to the testing set.
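The labeling rule can be illustrated with a short sketch. It is an illustration only: the 30-day window is an assumption standing in for "a month", and the function name label_pairs and the sample events are hypothetical rather than taken from the KKBOX data.

```python
from datetime import datetime, timedelta


def label_pairs(events, window=timedelta(days=30)):
    """Map each (user, song) pair to 1 if a repeat listen falls inside the window.

    events: iterable of (user_id, song_id, timestamp) tuples.
    window: assumed length of "a month" after the first listening event.
    """
    by_pair = {}
    for user, song, ts in events:
        by_pair.setdefault((user, song), []).append(ts)

    labels = {}
    for pair, stamps in by_pair.items():
        stamps.sort()
        first = stamps[0]
        # A repeat counts only if it is later than the first event
        # and no later than the end of the window.
        labels[pair] = int(any(first < ts <= first + window for ts in stamps[1:]))
    return labels


# Hypothetical listening events.
events = [
    ("u1", "s1", datetime(2017, 1, 1)),
    ("u1", "s1", datetime(2017, 1, 10)),  # repeat after 9 days
    ("u2", "s1", datetime(2017, 1, 1)),
    ("u2", "s1", datetime(2017, 3, 1)),   # repeat after 59 days, outside the window
    ("u3", "s2", datetime(2017, 1, 5)),   # never repeated
]
labels = label_pairs(events)
```

In the competition data the labels are already provided, so a function like this is useful mainly for understanding what the target column means.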

KKBOX provides a music dataset that contains information on the first observable listening event for each unique user-song pair within a specific time period. Metadata for each unique user and song pair is also provided. The use of public data to improve the accuracy of your predictions is encouraged.

The train and test data are selected from users' listening history in a given time period. Note that this time period is chosen to be before the WSDM-KKBox Churn Prediction time period. The train and test sets are split based on time, and the public/private split is based on unique user/song pairs.
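Because the official split is based on time, a local validation set that mimics it can be made by holding out the most recent rows instead of sampling at random. The sketch below assumes each row carries a sortable timestamp; the field name ts, the 80/20 fraction, and the function name chronological_split are hypothetical.

```python
def chronological_split(rows, train_fraction=0.8, key=lambda row: row["ts"]):
    """Split rows so that every training row is no later than every held-out row."""
    ordered = sorted(rows, key=key)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical rows with integer timestamps, deliberately out of order.
rows = [{"ts": t, "target": t % 2} for t in (5, 1, 9, 3, 7, 2, 8, 4, 6, 10)]
train_rows, valid_rows = chronological_split(rows)
```

Validating on the latest rows gives a score that reflects how the model handles listening events that come after the training period.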

Similar Projects

In this project, we are going to predict how capable each applicant is of repaying a loan.

In this machine learning project, we will use hundreds of anonymized features to predict whether customers of Santander, one of the biggest banks, are satisfied or dissatisfied.

Deep learning project using the Keras deep learning library to predict the effect of genetic variants to enable personalized medicine.

Curriculum For This Mini Project

Introduction: 05m
Download the Dataset: 09m
Required Packages: 02m
Exploring the Dataset: 06m
Visualization of Data: 08m
Merging Datasets: 05m
Missing Data: 05m
Memory Usage: 02m
Visualization of Merged Data: 07m
Recap: 16m
Build the model: 07m
Run the model: 01m
Building the model in R: 10m