Understanding the Problem Statement and Importing the Dataset
Performing basic EDA to gain insights into the data
Importing the necessary libraries
Using the info() function to check for null values and data types
Plotting stacked bar graphs of features to understand their effect on the target variable
Plotting pie charts to understand the contribution of categorical values
Merging the provided datasets using the merge function
Defining a function to check for columns with null values in the merged dataset
Defining a function to fill NaN values in object and non-object columns
Calculating the memory consumed by the DataFrame
Converting columns to suitable data types
Using the groupby function to analyze the combined effect of different columns on the target variable
Using seaborn to plot histograms alongside the groupby analysis
Creating primary variables based on the source system tab for the different datasets
Splitting the dependent and independent columns for training the model
Selecting XGBoost to train the model and defining a DMatrix for it
Defining and understanding the parameters used for model initialization
Performing k-fold cross-validation to guard against overfitting
Defining the evaluation metrics and making the final predictions
Saving the final predictions in CSV format
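The data-preparation steps in the outline (merging, null checks, NaN filling, data type conversion, memory measurement and a groupby on the target) can be sketched in pandas as follows. This is a minimal illustration, not the project's notebook: the three tiny DataFrames are made-up stand-ins for the KKBOX train, songs and members files, the column names are assumed to follow the public KKBOX schema, and the fill strategy (median for numeric columns, the placeholder "unknown" for everything else) is a hypothetical choice.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Made-up stand-ins for train.csv, songs.csv and members.csv (assumed column names).
train = pd.DataFrame({
    "msno": ["u1", "u1", "u2", "u3"],
    "song_id": ["s1", "s2", "s1", "s3"],
    "source_system_tab": ["my library", None, "discover", "my library"],
    "target": [1, 0, 1, 0],
})
songs = pd.DataFrame({
    "song_id": ["s1", "s2"],
    "song_length": [210000.0, 185000.0],
    "genre_ids": ["465", None],
})
members = pd.DataFrame({"msno": ["u1", "u2", "u3"], "bd": [25, 0, 31]})


def merge_sources(train, songs, members):
    # Left joins keep every user-song pair, even when song or member metadata is missing.
    merged = train.merge(songs, on="song_id", how="left")
    return merged.merge(members, on="msno", how="left")


def columns_with_nulls(df):
    # Map each column that contains missing values to its number of missing values.
    counts = df.isnull().sum()
    return counts[counts > 0].to_dict()


def fill_missing(df):
    # Hypothetical strategy: median for numeric columns, "unknown" for all others.
    out = df.copy()
    for col in out.columns:
        if is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna("unknown")
    return out


def to_categories(df, cols):
    # Convert low-cardinality string columns (e.g. tab names) to the category dtype.
    out = df.copy()
    for col in cols:
        out[col] = out[col].astype("category")
    return out


merged = merge_sources(train, songs, members)
print(columns_with_nulls(merged))

filled = fill_missing(merged)
compact = to_categories(filled, ["source_system_tab", "genre_ids"])

# deep=True counts the actual string payloads, not just the object pointers.
print("bytes before:", filled.memory_usage(deep=True).sum())
print("bytes after: ", compact.memory_usage(deep=True).sum())

# Share of repeated listens per source tab.
print(filled.groupby("source_system_tab")["target"].mean())
```

On four rows the category version may not come out smaller; the savings from the category dtype typically show up on large frames where a column holds few distinct values repeated many times.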
The 11th ACM International Conference on Web Search and Data Mining (WSDM 2018) is challenging you to build a better music recommendation system using a donated dataset from KKBOX. WSDM (pronounced "wisdom") is one of the premier conferences on web-inspired research involving search and data mining. The conference is committed to publishing original, high-quality papers and presentations, with an emphasis on practical but principled novel models.
The dataset comes from KKBOX, Asia’s leading music streaming service, which holds the world’s most comprehensive Asia-Pop music library with over 30 million tracks.
KKBOX currently uses a collaborative filtering algorithm with matrix factorization and word embeddings in its recommendation system, but believes new machine learning techniques could lead to better results.
In this machine learning project, you will predict the probability that a user listens to a song again after the first observable listening event within a time window is triggered. If one or more recurring listening events are triggered within a month after the user’s very first observable listening event, the target is marked 1 in the training set, and 0 otherwise. The same rule applies to the test set.
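The labeling rule can be illustrated with a small pure-Python function. The competition files already contain the target column, so this is only a sketch of the rule under stated assumptions: a hypothetical raw log of (user, song, timestamp) events, with "within a month" approximated as 30 days and the boundary counted as inside the window.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: "within a month" is approximated as 30 days (boundary inclusive).
WINDOW = timedelta(days=30)


def label_pairs(events):
    """Map each (user, song) pair in a hypothetical event log to a 0/1 target."""
    stamps = defaultdict(list)
    for user, song, ts in events:
        stamps[(user, song)].append(ts)

    labels = {}
    for pair, times in stamps.items():
        times.sort()
        first = times[0]
        # After sorting, times[1] is the earliest repeat, so checking it is enough.
        repeated = len(times) > 1 and times[1] - first <= WINDOW
        labels[pair] = int(repeated)
    return labels


log = [
    ("u1", "s1", datetime(2017, 1, 1)),
    ("u1", "s1", datetime(2017, 1, 20)),  # repeat after 19 days -> target 1
    ("u1", "s2", datetime(2017, 1, 1)),
    ("u1", "s2", datetime(2017, 3, 1)),   # repeat after 59 days -> target 0
    ("u2", "s1", datetime(2017, 1, 5)),   # never repeated -> target 0
]
print(label_pairs(log))
# {('u1', 's1'): 1, ('u1', 's2'): 0, ('u2', 's1'): 0}
```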
KKBOX provides a music dataset containing the first observable listening event for each unique user-song pair within a specific time period. Metadata for each unique user and song is also provided. Using public data to improve prediction accuracy is encouraged.
The train and test data are drawn from users’ listening history in a given time period. Note that this period was chosen to precede the WSDM-KKBox Churn Prediction time period. The train and test sets are split by time, and the public/private split is based on unique user/song pairs.
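A local validation scheme that mirrors this setup might hold out the most recent rows instead of sampling at random, and keep every user/song pair wholly on one side of a public/private style partition. The sketch below assumes each row carries a hypothetical `ts` timestamp and uses an MD5 hash of the pair to assign buckets; the organizers' actual partitioning procedure is not described here, so the hashing is an illustrative stand-in.

```python
import hashlib
from datetime import datetime


def time_split(rows, cutoff):
    """Split rows on a hypothetical 'ts' field: earlier rows train, later rows test."""
    train = [r for r in rows if r["ts"] < cutoff]
    test = [r for r in rows if r["ts"] >= cutoff]
    return train, test


def pair_bucket(user, song, public_share=0.5):
    """Deterministically assign a user/song pair to 'public' or 'private'."""
    digest = hashlib.md5(f"{user}|{song}".encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a fraction in [0, 1].
    fraction = int(digest[:8], 16) / 0xFFFFFFFF
    return "public" if fraction < public_share else "private"


rows = [
    {"msno": "u1", "song_id": "s1", "ts": datetime(2016, 12, 1)},
    {"msno": "u2", "song_id": "s1", "ts": datetime(2017, 1, 15)},
    {"msno": "u1", "song_id": "s2", "ts": datetime(2017, 2, 1)},
]
train_rows, test_rows = time_split(rows, cutoff=datetime(2017, 1, 1))
for r in test_rows:
    print(r["msno"], r["song_id"], pair_bucket(r["msno"], r["song_id"]))
```

Because the bucket depends only on the pair, repeated occurrences of the same user/song pair can never straddle the two partitions.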