Identifying Product Bundles from Sales Data Using R

This data science project in R covers subjective segmentation, a clustering technique, and uses it to find product bundles in sales data.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What you will learn

  • Understanding the problem statement

  • Importing the dataset directly from Amazon AWS

  • What is Clustering

  • Performing basic EDA and data preprocessing

  • Plotting boxplots to identify outliers

  • Treating outliers with an appropriate technique

  • Scaling and normalizing the dataset

  • Partition-based, hierarchical, and model-based methods of clustering

  • K-means, k-medians, k-modes, agglomerative, divisive, and expectation-maximization based methods

  • Identifying different libraries used for Clustering

  • Steps involved in K-means clustering

  • Applying different methods available for clustering

  • Applying the silhouette score on the clustering result

  • Visualizing the result by plotting graphs

  • Picking the best model for making predictions

  • Calculating cluster membership probabilities for test data points
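
Several of the steps listed above (outlier checks, outlier treatment, scaling, K-means, and silhouette scoring) can be sketched in a few lines of R. This is only a minimal illustration under stated assumptions: the data is synthetic rather than the project's dataset, the helper names `make_group` and `cap_outliers` are hypothetical, capping at the boxplot fences is just one possible outlier treatment, and the `cluster` package is assumed to be installed.

```r
# Synthetic stand-in for the weekly sales table (NOT the project's data):
# 60 hypothetical products x 52 weeks, at three demand levels.
set.seed(42)
n_weeks <- 52
make_group <- function(n, level) {
  matrix(round(rnorm(n * n_weeks, mean = level, sd = 2)), nrow = n)
}
sales <- rbind(make_group(20, 10), make_group(20, 30), make_group(20, 50))
rownames(sales) <- paste0("P", seq_len(nrow(sales)))
sales["P1", 10] <- 500  # inject one obvious outlier week

# Outlier check: boxplot.stats() returns points beyond the 1.5 * IQR whiskers.
flagged <- boxplot.stats(sales["P1", ])$out

# One possible treatment: cap each product's series at its boxplot fences.
cap_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  pmin(pmax(x, q[1] - fence), q[2] + fence)
}
sales_capped <- t(apply(sales, 1, cap_outliers))

# Standardise each week (column) to mean 0 and standard deviation 1.
scaled <- scale(sales_capped)

# Fit K-means for several k and score each fit by its mean silhouette width.
d <- dist(scaled)
ks <- 2:6
avg_sil <- sapply(ks, function(k) {
  km <- kmeans(scaled, centers = k, nstart = 25)
  mean(cluster::silhouette(km$cluster, d)[, "sil_width"])
})

# Keep the k with the highest mean silhouette width and refit.
best_k <- ks[which.max(avg_sil)]
final <- kmeans(scaled, centers = best_k, nstart = 25)
table(final$cluster)
```

Here `avg_sil` holds one score per candidate k, so plotting it against `ks` gives a quick visual check of which cluster count separates the products best.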

Project Description

The weekly sales transaction dataset contains the weekly purchased quantities of 800 products over 52 weeks. Normalised values are provided too. The objective of this data science project in R is to find product bundles that can be put on sale together. Market Basket Analysis is the typical way to identify such bundles; here we assess the relative importance of time series clustering in identifying product bundles.
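
To make the idea of time series clustering concrete, here is a minimal sketch that groups products whose weekly sales rise and fall together. Everything in it is an assumption made for illustration: the data is synthetic, the two seasonal shapes and the helpers `make_products` and `normalise` are hypothetical, and a correlation-based distance with average-linkage hierarchical clustering is only one of several reasonable choices.

```r
# Synthetic weekly sales (NOT the project's data): 5 hypothetical products
# peaking mid-year and 5 peaking at year end, 52 weeks each.
set.seed(1)
weeks <- 1:52
summer  <- exp(-((weeks - 26)^2) / 50)
yearend <- exp(-((weeks - 50)^2) / 50)

make_products <- function(shape, n, prefix) {
  m <- t(replicate(n, 20 * shape + rnorm(length(shape), sd = 1)))
  rownames(m) <- paste0(prefix, seq_len(n))
  m
}
sales <- rbind(make_products(summer, 5, "S"), make_products(yearend, 5, "Y"))

# Min-max normalise each product's series so that scale differences vanish.
normalise <- function(x) (x - min(x)) / (max(x) - min(x))
norm_sales <- t(apply(sales, 1, normalise))

# Distance between two products = 1 - correlation of their weekly series.
d <- as.dist(1 - cor(t(norm_sales)))

# Agglomerative clustering, then cut the tree into two candidate bundles.
hc <- hclust(d, method = "average")
bundles <- cutree(hc, k = 2)
split(names(bundles), bundles)
```

Products that land in the same group share a demand pattern over the year, which makes them candidates for a joint promotion. Unlike Market Basket Analysis, this needs no basket-level transactions, only the per-product weekly totals.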

Similar Projects

Choosing the Right Time Series Forecasting Methods
There are different time series forecasting methods for forecasting stock prices, demand, and so on. In this machine learning project, you will learn which forecasting method to use when, and how to apply each one, through a time series forecasting example.
Data Science Project: TalkingData AdTracking Fraud Detection
Machine learning project in R: detect fraudulent click traffic for mobile app ads using the R programming language.
Predict Wine Preferences of Customers using Wine Dataset
In this machine learning project, you will build predictive models to identify people's wine preferences from the physicochemical properties of wines and help restaurants recommend the right quality of wine to a customer.
Taxi Trajectory Prediction: Predict the Destination of Taxi Trips
Given a partial trajectory of a taxi, you will be asked to predict its final destination using the taxi trajectory dataset.

Curriculum For This Mini Project

 
  • Introduction: 03m
  • Installing Libraries: 01m
  • Understand the Data Set: 05m
  • Outliers: 06m
  • Clustering Techniques: 02m
  • Installing the Library to Implement K-means: 05m
  • Steps in the K-means Algorithm: 11m
  • Implementing K-means: 12m
  • Cluster Model Deployment: 04m
  • Hierarchical Clustering - Agglomerative Method: 10m
  • Hierarchical Clustering - Divisive Method: 04m
  • Silhouette Score: 04m
  • Cluster Goodness using Silhouette Score: 05m
  • Model Based Clustering: 06m
  • Self Organizing Maps: 07m
  • factoextra Library for Visualization: 01m
  • Other Clustering Methods: 10m
  • Hierarchical Clustering: 03m
  • Hopkins Statistic: 03m
  • Determine Optimal Number of Clusters: 03m
  • Clustering Validation Statistics: 05m
  • Advanced Clustering: 04m
  • Conclusion: 08m
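
As a taste of the lesson on determining the optimal number of clusters, the sketch below applies the elbow method in base R. It is an illustration under assumptions, not the course's own code: the data is three synthetic two-dimensional groups, and the elbow method is only one of the techniques such a lesson might cover.

```r
# Synthetic data (an assumption for illustration): three well-separated
# groups of 50 points each in two dimensions.
set.seed(7)
x <- rbind(
  matrix(rnorm(100, mean = 0),  ncol = 2),
  matrix(rnorm(100, mean = 6),  ncol = 2),
  matrix(rnorm(100, mean = 12), ncol = 2)
)

# Total within-cluster sum of squares for k = 1..8.
ks <- 1:8
wss <- sapply(ks, function(k) {
  kmeans(x, centers = k, nstart = 20)$tot.withinss
})

# How much each extra cluster reduces the within-cluster sum of squares.
drops <- -diff(wss)

# The "elbow" is the k after which the curve flattens out.
if (interactive()) {
  plot(ks, wss, type = "b",
       xlab = "Number of clusters k",
       ylab = "Total within-cluster sum of squares")
}
```

Each entry of `drops` is the improvement gained by adding one more cluster; once those improvements become small relative to the earlier ones, further clusters add little.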