Build a Music Recommendation Algorithm using KKBox's Dataset

Music Recommendation Project using Machine Learning - Use the KKBox dataset to predict the chance of a user listening to a song again after their very first observable listening event.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 102+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What will you learn

Understanding the problem statement
Data Visualization
Inference about data
Feature Engineering
Outlier treatment
Imputing missing values by replacing them with the mode
Handling missing values by removing them
Handling missing values by adding a separate "missing" label
Importing the dataset and importing libraries
Train and test split for model validation
Building Logistic Regression model
Building Decision Tree classifier
Building Random Forest Classifier
Building XGBoost model
Making test predictions using the trained model
Feature Importance
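
The three ways of handling missing values listed above can be sketched on a toy column as follows. This is a minimal illustration with invented data, not the project's notebook code; the column name and the label string "missing" are assumptions chosen for the example.

```python
import pandas as pd

# Toy categorical column with gaps; all values are invented for illustration.
col = "source_system_tab"
df = pd.DataFrame({col: ["my library", None, "discover", "my library", None]})

# Method 1: replace missing values with the mode (most frequent value).
mode_value = df[col].mode()[0]
by_mode = df[col].fillna(mode_value)

# Method 2: drop the rows where the value is missing.
dropped = df.dropna(subset=[col])

# Method 3: treat missingness as its own category.
as_label = df[col].fillna("missing")

print(by_mode.tolist())
print(len(dropped))
print(as_label.value_counts().to_dict())
```

Which method fits best depends on the column: dropping rows loses data, the mode can hide a pattern in the gaps, and a separate label lets a model learn from the missingness itself.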

Project Description

Business Overview

Music is one of the most popular sources of entertainment today, and digital platforms have made listening to it much easier. A few years ago, many listeners stuck to a particular artist or band, and some favored specific types of music. As technology has spread, users have gained access to many kinds of music on many platforms. The availability of music and of music streaming services has grown rapidly, and people now listen to everything from classical and jazz to pop.

Music streaming applications such as Spotify, YouTube Music, and Amazon Music recommend music to users based on their listening history and preferences. Such features play a vital role in the business of these streaming services: because time spent on the platform is directly linked to their growth, appropriate recommendations are very important. A music recommendation system lets a music provider predict and suggest appropriate songs based on the characteristics of the music a user has listened to over time.

As the number of songs, artists, and kinds of music grows, it becomes harder to suggest appropriate songs to each user. The challenge is to build a system that can understand a user's preferences and offer songs that match them.

In this project we use the KKBOX dataset to build a music recommendation system. The project walks through some machine learning techniques that can be applied to recommend songs to users based on their listening patterns.

Aim

To predict the chance of a user listening to a song again after the first observable listening event within a particular time window.

Data Description

The dataset comes from KKBOX, Asia's leading music streaming service, which holds the world's most comprehensive Asia-Pop music library with over 30 million tracks. The training data contains the first observable listening event for each unique user-song pair within a specific time period. Metadata for each unique user and song is also provided. Three datasets are available.

  1. train.csv: Contains listening events for different users, with attributes such as msno (the user ID), song_id, source_system_tab, etc. There are about 7.3 million entries with 30,755 unique user IDs.
  2. songs.csv: Contains song data with attributes such as song_id, song_length, genre_ids, artist_name, etc. The dataset contains about 2.2 million unique song IDs.
  3. members.csv: Contains user information for 34,403 different users.
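
As a rough sketch of how the three files fit together, the snippet below joins tiny in-memory stand-ins for train.csv, songs.csv and members.csv on their shared keys using pandas. All values are invented; the `target` and `age` column names are assumptions (the description above does not name the label column or list the members.csv attributes), and with the real data each frame would come from `pd.read_csv`.

```python
import pandas as pd

# Tiny stand-ins for the three files; real data would be loaded with
# pd.read_csv("train.csv") and so on. All values here are invented.
train = pd.DataFrame({
    "msno": ["u1", "u1", "u2"],                 # user ID
    "song_id": ["s1", "s2", "s1"],
    "source_system_tab": ["my library", "discover", None],
    "target": [1, 0, 1],                        # assumed label column name
})
songs = pd.DataFrame({
    "song_id": ["s1", "s2"],
    "song_length": [210000, 185000],
    "genre_ids": ["465", "458"],
    "artist_name": ["Artist A", "Artist B"],
})
members = pd.DataFrame({
    "msno": ["u1", "u2"],
    "age": [24, 0],                             # hypothetical column
})

# Left joins keep every listening event even when metadata is missing.
data = (
    train
    .merge(songs, on="song_id", how="left")
    .merge(members, on="msno", how="left")
)
print(data.shape)
```

A left join is used here so that no row of the training data is dropped; songs or users without metadata simply end up with missing values that are handled in the cleaning step.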

Tech Stack

  • Language: Python
  • Libraries: sklearn, xgboost, pandas, numpy

Approach

  1. Exploratory data analysis (EDA)
    1. Data visualization
    2. Inference about features
    3. Feature engineering

  2. Data cleaning (outliers, missing values, categorical features)
    1. Outlier detection and treatment
    2. Imputing missing values
      1. Replacing with the mode
      2. Removing null values
      3. Adding a new "missing" label
    3. Converting labeled or string values to numerical values

  3. Model building on training data
    1. Logistic regression
    2. Decision Tree
    3. Random Forest
    4. XGBoost

  4. Model validation
    1. ROC AUC

  5. Feature importance and conclusion
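
The model building, validation, and feature importance steps can be sketched roughly as below. This is a minimal illustration on synthetic data generated with scikit-learn, not the project's actual code or results; the hyperparameters and the 80/20 split are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the engineered features.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # ROC AUC is computed from the predicted probability of the positive class.
    proba = model.predict_proba(X_val)[:, 1]
    scores[name] = roc_auc_score(y_val, proba)
    print(f"{name}: ROC AUC = {scores[name]:.3f}")

# Tree ensembles expose impurity-based feature importances after fitting.
importances = models["random_forest"].feature_importances_
print(importances.round(3))
```

If the xgboost package is installed, its `XGBClassifier` follows the same `fit` / `predict_proba` interface and could be added to the `models` dictionary in the same way.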

Curriculum For This Mini Project

  • Business problem (00m)
  • Dataset understanding (03m)
  • Importing the main dataset (09m)
  • Data visualization source system tab part-1 (03m)
  • Data visualization source system tab part-2 (04m)
  • Visualization and inference for main data (11m)
  • Data exploration and visualization for songs data (06m)
  • Exploring the members data (08m)
  • Visualizing the members data (05m)
  • Outlier detection (06m)
  • Feature engineering (04m)
  • Outlier treatment age (02m)
  • Imputing missing values method-1 (10m)
  • Imputing missing values method-2 (03m)
  • Imputing missing values method-3 (03m)
  • Implementing logistic regression model and its results (11m)
  • Implementing decision tree model (02m)
  • Model accuracy comparison for decision tree (02m)
  • Implementing random forest model and results (05m)
  • Implementing XGBoost model and results (06m)
  • Feature importance (04m)
  • Conclusion (04m)
