Music Recommendation System Project using Python and R

Machine Learning Project - Work with KKBOX's Music Recommendation System dataset to build the best music recommendation engine.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love

Mike Vogt

Information Architect at Bank of America

I have had a very positive experience. The platform is very rich in resources, and the expert was thoroughly knowledgeable on the subject matter - real world hands-on experience. I wish I had this...

Shailesh Kurdekar

Solutions Architect at Capital One

I have worked for more than 15 years in Java and J2EE and have recently developed an interest in Big Data technologies and Machine learning due to a big need at my workspace. I was referred here by a...

What will you learn

Understanding the Problem Statement and Importing the Dataset
Performing basic EDA to get Insights into the data
Importing the necessary libraries
Using the info function to check for null values and data types
Plotting stacked bar graphs between features to understand their effect on the target variable
Plotting pie charts to understand the contribution of categorical values
Merging different datasets provided using the merge function
Defining a function to check for columns with null values in the new merged dataset
Defining a function for filling NaN values in object and non-object type columns
Learning to calculate the memory consumed by the DataFrame
Converting columns to more suitable data types using functions
Using the groupby function to analyze the combined effect of different columns on the target variable
Using seaborn to plot histograms while using the groupby function
Creating primary variables based on the source system tab for the different datasets
Splitting dependent and Independent columns for training the model
Selecting XGBoost for training the model and defining a DMatrix for the XGBoost model
Defining and understanding the different parameters for model initialization
Performing k-fold cross-validation to guard against overfitting
Defining the evaluation metrics and making the final predictions
Saving the final predictions in CSV format
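
The data-preparation steps in the list above (merging, checking for nulls, filling NaN values, measuring memory, changing data types, and grouping against the target) can be sketched with pandas roughly as follows. This is a minimal illustration rather than the project's actual notebook: the two tiny DataFrames are made-up stand-ins, the column names (msno, song_id, source_system_tab, song_length, language) are assumed, and the helper names columns_with_nulls and fill_missing are hypothetical.

```python
import pandas as pd

# Made-up stand-ins for the training table and the song metadata table.
train = pd.DataFrame({
    "msno": ["u1", "u1", "u2", "u3"],
    "song_id": ["s1", "s2", "s1", "s9"],
    "source_system_tab": ["my library", None, "discover", "search"],
    "target": [1, 0, 1, 0],
})
songs = pd.DataFrame({
    "song_id": ["s1", "s2"],
    "song_length": [200000, 250000],
    "language": [3.0, 52.0],
})

# A left merge keeps every training row even when song metadata is missing.
merged = train.merge(songs, on="song_id", how="left")


def columns_with_nulls(df):
    """Return the names of columns that contain at least one null value."""
    return [col for col in df.columns if df[col].isnull().any()]


def fill_missing(df):
    """Fill numeric columns with -1 and all other columns with 'unknown'."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(-1)
        else:
            df[col] = df[col].fillna("unknown")
    return df


null_cols = columns_with_nulls(merged)
filled = fill_missing(merged)

# Measure memory, then switch to more compact data types and measure again.
bytes_before = filled.memory_usage(deep=True).sum()
compact = filled.astype({"source_system_tab": "category", "target": "int8"})
bytes_after = compact.memory_usage(deep=True).sum()

# Share of repeat listens per source tab (mean of the 0/1 target).
repeat_rate = compact.groupby("source_system_tab", observed=True)["target"].mean()

print(null_cols)
print(bytes_before, bytes_after)
print(repeat_rate)
```

On a table this small the category conversion may not save memory; the savings typically show up on large tables where a column has few distinct values compared with its number of rows.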

Project Description

Introduction:

The 11th ACM International Conference on Web Search and Data Mining (WSDM 2018) is challenging you to build a better music recommendation system using a donated dataset from KKBOX. WSDM (pronounced "wisdom") is one of the premier conferences on web-inspired research involving search and data mining. It is committed to publishing original, high-quality papers and presentations, with an emphasis on practical but principled novel models.

WSDM has challenged participants to build a better music recommendation system. The dataset comes from KKBOX, Asia’s leading music streaming service, which holds the world’s most comprehensive Asia-Pop music library with over 30 million tracks.

KKBOX currently uses a collaborative-filtering-based algorithm with matrix factorization and word embedding in its recommendation system, but believes new machine learning techniques could lead to better results.

Data:

In this machine learning project, you will predict the chance that a user listens to a song again after the first observable listening event within a time window. If recurring listening events are triggered within a month after the user’s very first observable listening event, the target is marked 1 in the training set, and 0 otherwise. The same rule applies to the testing set.
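
The labeling rule can be illustrated with a short sketch. The dataset already ships with the target column, so this is only meant to make the rule concrete: the function name label_first_events is hypothetical, the raw (user, song, date) event log is made up, and "a month" is approximated as 30 days, which is an assumption.

```python
from datetime import date, timedelta


def label_first_events(events, window_days=30):
    """Map each (user, song) pair to 1 if it was played again within
    window_days of its first observed play, else 0.

    events: iterable of (user, song, day) tuples, where day is a date.
    window_days=30 is an assumed stand-in for "a month".
    """
    plays = {}
    for user, song, day in events:
        plays.setdefault((user, song), []).append(day)

    labels = {}
    for pair, days in plays.items():
        days.sort()
        cutoff = days[0] + timedelta(days=window_days)
        # Any play after the first one that falls inside the window counts.
        labels[pair] = int(any(day <= cutoff for day in days[1:]))
    return labels


# Made-up listening events for illustration.
events = [
    ("u1", "s1", date(2017, 1, 1)),
    ("u1", "s1", date(2017, 1, 20)),  # repeat within the window
    ("u2", "s1", date(2017, 1, 1)),
    ("u2", "s1", date(2017, 3, 1)),   # repeat outside the window
    ("u3", "s2", date(2017, 1, 5)),   # never repeated
]
labels = label_first_events(events)
print(labels)
```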

KKBOX provides a music dataset that contains information on the first observable listening event for each unique user-song pair within a specific time period. Metadata for each unique user and song pair is also provided. Using public data to improve the accuracy of your predictions is encouraged.

The train and test data are selected from users' listening history in a given time period. Note that this time period is chosen to be before the WSDM-KKBox Churn Prediction time period. The train and test sets are split based on time, and the public/private split is based on unique user/song pairs.
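
Because the train and test sets are separated by time, a local validation set that holds out the most recent records tends to mirror that setup more closely than a random split. A minimal sketch, assuming each record carries a sortable timestamp under a hypothetical "ts" key (the function name and the 20% holdout fraction are illustrative choices, not part of the competition):

```python
def chronological_split(records, holdout_fraction=0.2):
    """Split records into (earlier, later) parts by timestamp.

    records: list of dicts with a sortable "ts" value (hypothetical field).
    holdout_fraction: share of the most recent records held out.
    """
    ordered = sorted(records, key=lambda r: r["ts"])
    n_holdout = int(round(len(ordered) * holdout_fraction))
    split_at = len(ordered) - n_holdout
    return ordered[:split_at], ordered[split_at:]


# Made-up records; "ts" could be any sortable timestamp.
records = [
    {"msno": "u1", "song_id": "s1", "ts": 5},
    {"msno": "u2", "song_id": "s2", "ts": 1},
    {"msno": "u3", "song_id": "s1", "ts": 3},
    {"msno": "u1", "song_id": "s3", "ts": 2},
    {"msno": "u2", "song_id": "s4", "ts": 4},
]
train_part, valid_part = chronological_split(records)
print(len(train_part), len(valid_part))
```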

Curriculum For This Mini Project

Introduction (05m)
Download the Dataset (09m)
Required Packages (02m)
Exploring the Dataset (06m)
Visualization of Data (08m)
Merging Datasets (05m)
Missing Data (05m)
Memory Usage (02m)
Visualization of Merged Data (07m)
Recap (16m)
Build the model (07m)
Run the model (01m)
Building the model in R (10m)