Machine learning for Retail Price Recommendation with Python


Use the Mercari dataset and machine learning in Python to build a dynamic price recommendation algorithm that automatically suggests the right product prices.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.


Code & Dataset

Get access to 102+ solved projects with IPython notebooks and datasets.


Project Experience

Add project experience to your LinkedIn/GitHub profiles.


What will you learn

Performing basic EDA
Advanced EDA ideas
Checking and handling null values
Slicing data and writing a function to convert variables into categorical types
Understanding and using TF-IDF for analyzing textual data
Understanding and using CountVectorizer for analyzing textual data
Applying LabelBinarizer to textual data
Building and training a random forest model
Building and training an SVM model
Building and training a gradient boosting model
Building and training a neural network model
Drawing predictions from the trained model
Evaluating and fine-tuning the model
Building an API for the model using Flask
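LabelBinarizer, listed above, one-hot encodes a categorical text column such as brand_name. Below is a minimal Python sketch using scikit-learn. The brand values are made up for illustration, and the "missing" label assumes that missing brands were first filled with a placeholder string.

```python
from sklearn.preprocessing import LabelBinarizer

# Hypothetical brand names; in the project these would come from brand_name
brands = ["Nike", "Apple", "Nike", "missing", "Apple"]

# sparse_output=True keeps the result compact when there are many brands
lb = LabelBinarizer(sparse_output=True)
X_brand = lb.fit_transform(brands)

print(list(lb.classes_))   # unique labels, sorted
print(X_brand.toarray())   # one row per listing, one 0/1 column per brand
```

Keep in mind that when there are exactly two classes, LabelBinarizer returns a single column rather than one column per class.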

Project Description

Business Objective

Clothing has strong seasonal pricing trends and is heavily influenced by brand names, while electronics have fluctuating prices based on product specs.

Mercari, Japan’s biggest community-powered shopping app, knows this problem deeply. It would like to offer pricing suggestions to sellers, but this is tough because sellers can list just about anything, or any bundle of things, on Mercari's marketplace.

In this project, Mercari challenges us to build an algorithm that automatically suggests the right product prices.

Data Overview

Two files are available, train.tsv and test.tsv. Both are tab-separated.

The data fields are as follows:

  1. train_id or test_id - the id of the listing
  2. name - the title of the listing. Note that we have cleaned the data to remove text that looks like prices (e.g. $20) to avoid leakage. These removed prices are represented as [rm]
  3. item_condition_id - the condition of the item, as provided by the seller
  4. category_name - the category of the listing
  5. brand_name - the brand of the item
  6. price - the price that the item was sold for. This is the target variable that you will predict. The unit is USD. This column doesn't exist in test.tsv since that is what you will predict.
  7. shipping - 1 if the shipping fee is paid by the seller, 0 if it is paid by the buyer
  8. item_description - the full description of the item. Note that we have cleaned the data to remove text that looks like prices (e.g. $20) to avoid leakage. These removed prices are represented as [rm]
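Both files are tab-separated, so pandas can read them by passing sep="\t". Here is a minimal Python sketch. The two rows are made-up stand-ins that have the same columns as train.tsv, and load_listings is a hypothetical helper name.

```python
import io

import pandas as pd


def load_listings(path_or_buffer):
    """Read a Mercari-style tab-separated file into a DataFrame."""
    return pd.read_csv(path_or_buffer, sep="\t")


# Tiny in-memory stand-in for train.tsv (hypothetical rows, real column names)
sample = io.StringIO(
    "train_id\tname\titem_condition_id\tcategory_name\tbrand_name\tprice\tshipping\titem_description\n"
    "0\tBlue cotton T-shirt\t3\tMen/Tops/T-shirts\t\t10.0\t1\tWorn twice\n"
    "1\tWireless keyboard\t3\tElectronics/Computers & Tablets\tAcme\t52.0\t0\tWorks great\n"
)
train = load_listings(sample)

print(train.dtypes)
print(train["brand_name"].isna().sum())  # the empty brand field is read as NaN
```

To use the real data, call load_listings("train.tsv") in place of the in-memory sample. Any empty fields come back as NaN and are then handled in the missing value treatment step.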


Source: https://www.kaggle.com/c/mercari-price-suggestion-challenge/overview/description

Aim

To predict the price of a product from its description and the other listing information.

Tech Stack

  • Language used: R
  • Packages used: superml, textstem, neuralnet, gbm, quanteda, and others
  • UI support: R Shiny Dashboard

Approach

  1. Exploratory data analysis

Exploratory data analysis (EDA) is the process of analysing a dataset to understand its characteristics. This step covers:

    1. Univariate analysis - analysis of a single variable
    2. Bivariate analysis - analysis of the relationship between two variables
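Both kinds of analysis can be sketched with pandas on a handful of hypothetical listings. describe() summarizes the target variable on its own (univariate), and a groupby compares price across the two shipping values (bivariate). All values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical listings standing in for train.tsv
train = pd.DataFrame({
    "price": [10.0, 52.0, 10.0, 35.0, 44.0, 59.0],
    "shipping": [1, 0, 1, 1, 0, 0],
    "item_condition_id": [3, 3, 1, 1, 1, 2],
})

# Univariate: summary statistics of the target variable
price_summary = train["price"].describe()
print(price_summary)

# Bivariate: median price depending on who pays shipping (1 = seller, 0 = buyer)
median_by_shipping = train.groupby("shipping")["price"].median()
print(median_by_shipping)
```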


  2. Data cleaning / pre-processing (outliers, missing values, categorical variables)

Regression algorithms accept input only in numeric form. Non-numeric columns therefore have to be converted to numbers, for example by assigning each category an integer label.

    1. Label encoding
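Label encoding maps each distinct string to an integer. The following minimal sketch applies scikit-learn's LabelEncoder to hypothetical category names. Note that scikit-learn documents LabelEncoder for target labels; OrdinalEncoder is the equivalent intended for feature columns.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical values as they might appear in category_name
categories = [
    "Men/Tops/T-shirts",
    "Electronics/Cell Phones",
    "Men/Tops/T-shirts",
    "Women/Shoes",
]

encoder = LabelEncoder()
codes = encoder.fit_transform(categories)  # each distinct string -> an integer

print(list(encoder.classes_))  # classes are stored in sorted order
print(codes.tolist())
```

The integers are assigned in sorted order of the strings. They carry no real ordering, and tree-based models such as random forests usually tolerate that better than linear models do.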


  3. Missing value treatment

Missing values are filled in (imputed) in appropriate ways rather than dropped, so that the affected rows are not lost.
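Below is a minimal sketch of one common approach, which replaces missing text and categorical values with an explicit placeholder string. The placeholder "missing" is an assumption and not necessarily what the project uses, and the toy DataFrame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical listings with gaps in the text/categorical columns
train = pd.DataFrame({
    "brand_name": ["Nike", np.nan, "Apple"],
    "category_name": [np.nan, "Women/Shoes", "Electronics/Cell Phones"],
    "item_description": ["Barely worn", "Great condition", np.nan],
})

print(train.isna().sum())  # number of missing values per column

# Fill every gap with a placeholder instead of dropping rows
filled = train.fillna("missing")
print(filled)
```

The placeholder keeps each row and turns "missing" into a category of its own, which the encoders and vectorizers in later steps can handle like any other value.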


  4. Feature engineering
    1. CountVectorizer
    2. TF-IDF for text data
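The two text-feature techniques can be sketched with scikit-learn. CountVectorizer produces raw word counts, and TfidfVectorizer weights each term by how rare it is across listings. Using counts for name and TF-IDF for item_description, as well as the ngram_range and max_features values, are illustrative assumptions. The rows are hypothetical.

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical listings
train = pd.DataFrame({
    "name": ["blue cotton shirt", "wireless keyboard", "cotton dress"],
    "item_description": ["soft blue shirt", "keyboard works great", "summer dress, soft cotton"],
})

# Bag-of-words counts for the short listing titles
count_vec = CountVectorizer()
X_name = count_vec.fit_transform(train["name"])

# TF-IDF (unigrams and bigrams) for the longer descriptions;
# terms that appear in many listings receive lower weights
tfidf_vec = TfidfVectorizer(ngram_range=(1, 2), max_features=1000)
X_desc = tfidf_vec.fit_transform(train["item_description"])

# Combine into one sparse feature matrix for the models
X = hstack([X_name, X_desc]).tocsr()
print(X.shape)
```

Sparse matrices matter at Mercari's scale because each listing uses only a tiny fraction of the vocabulary.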


  5. Modelling

Various regression algorithms are applied to the dataset, and the model that best suits the dataset is selected. The models applied to this dataset are:

    1. Random forest
    2. SVM
    3. Neural networks

Each model is then evaluated so that the candidates can be compared.
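The train-evaluate-select loop described above can be sketched with scikit-learn in Python, even though the tech stack lists R packages. Several details are assumptions rather than the project's method: the synthetic data from make_regression stands in for the engineered features and prices, RMSE on a held-out split is the comparison metric, and the hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Synthetic stand-in for the feature matrix and target prices
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "svm": SVR(C=10.0),
    "neural_network": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_val)
    # RMSE on held-out data (assumed metric)
    results[name] = float(np.sqrt(mean_squared_error(y_val, preds)))

# Lowest validation error wins
best = min(results, key=results.get)
print(results, "best:", best)
```

Kernel SVR training time grows quickly with the number of rows. On the full dataset, a linear SVM or a subsample is often needed.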


Curriculum For This Mini Project

Price prediction business context (08m)
Price prediction introduction (05m)
EDA univariate analysis (04m)
EDA bivariate analysis (10m)
Checking and imputing missing values (06m)
Label encoding (05m)
Text cleaning (05m)
Understanding feature engineering (04m)
Implementation of CountVectorizer and TF-IDF (06m)
Understanding decision trees (06m)
Building a decision tree model (04m)
Understanding random forest (06m)
Building a random forest model (05m)
Understanding gradient boosting machines (03m)
Building a gradient boosting machine (06m)
Understanding support vector machines (06m)
Building a support vector machine (07m)
Understanding neural networks (02m)
Building a neural network model (07m)
Comparison of models and other potential approaches (05m)
