Avocado Machine Learning Project Python for Price Prediction

In this ML project, you will use the Avocado dataset to build a machine learning model that predicts the average price of avocados, a continuous target, based on region and avocado variety.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 102+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What will you learn

Understanding the problem statement
Importing the necessary libraries and understanding their use
Importing the dataset
Performing basic EDA and checking for the null values
Filling the null values using appropriate methods
Finding median, average and merging the data
Feature engineering with the date
Plotting time-series graphs for visualization
Drawing a heatmap with the numeric values using Seaborn
Finding the lag of a time series
Using the groupby function for combined analysis of variables
Differencing a time series
Performing train_test_split to divide the dataset into train and test sets
Using mean_absolute_percentage_error and mean_absolute_error as evaluation metrics
Using the AdaBoost Regressor for making predictions
Applying the ARIMA time series model for training and making predictions
Applying the Facebook Prophet model for making predictions
Visualizing the result using graphs
Selecting the best model and making the final predictions
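
As a rough illustration of two of the items above, lagging and differencing a time series, here is a minimal pandas sketch. The price values and dates are made up for the example and are not taken from the dataset; only the AveragePrice name comes from the project description.

```python
import pandas as pd

# Hypothetical weekly average prices (illustrative values only).
prices = pd.Series(
    [1.10, 1.15, 1.12, 1.20, 1.25],
    index=pd.date_range("2015-01-04", periods=5, freq="W-SUN"),
    name="AveragePrice",
)

frame = prices.to_frame()

# Lag of one week: each row carries the previous week's price.
frame["lag_1"] = frame["AveragePrice"].shift(1)

# First difference: change in price relative to the previous week.
frame["diff_1"] = frame["AveragePrice"].diff()

print(frame)
```

The first row has no previous week, so both new columns start with a missing value; models that cannot handle missing values would need that row dropped.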

Project Description

Business Objective

Hass Avocados, a Mexico-based company, produces a variety of avocados that are sold in the US. The company has done well over the past several years and wants to expand. To that end, it wants to build and assess a plausible model to predict the average price of Hass avocados, which will inform decisions about expanding into the different types of avocado farms available in other regions.

Aim

Forecast avocado prices in the US.

Data

The data comes directly from retailers’ cash registers based on the actual retail sales of Hass avocados.

  • Data represents weekly retail scan data for National retail volume (units) and price from Apr 2015 to Mar 2018.
  • The Average Price (of avocados) in the table reflects a per unit (per avocado) cost, even when multiple units (avocados) are sold in bags.
  • The Price Look-Up codes (PLUs) in the table are only for Hass avocados. Other varieties of avocados (e.g. greenskins) are not included in this table.

Some relevant columns in the dataset:

  1. Date - date of the observation
  2. AveragePrice - average price of a single avocado
  3. Type - conventional / organic
  4. Region - region of the observation
  5. Total Volume - Total number of avocados sold
  6. 4046 - Total number of avocados with PLU 4046 sold
  7. 4225 - Total number of avocados with PLU 4225 sold
  8. 4770 - Total number of avocados with PLU 4770 sold
  9. Total Bags - Total bags sold
  10. Small/Large/XLarge Bags - Total bags sold by size
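
To make the column list more concrete, the sketch below builds a tiny made-up table with a few of those columns, derives date features, and compares the two avocado types with groupby. The rows are invented, and the column names follow the list above; the actual CSV header may differ (for example in capitalization), so treat the names as assumptions.

```python
import pandas as pd

# Tiny hypothetical sample; values are illustrative, not real data.
df = pd.DataFrame(
    {
        "Date": ["2015-04-05", "2015-04-05", "2015-04-12", "2015-04-12"],
        "AveragePrice": [1.00, 1.50, 1.10, 1.70],
        "Type": ["conventional", "organic", "conventional", "organic"],
        "Region": ["Albany", "Albany", "Albany", "Albany"],
    }
)

# Parse the date strings so calendar parts can be extracted.
df["Date"] = pd.to_datetime(df["Date"])

# Simple date-based features (column names here are hypothetical).
df["year"] = df["Date"].dt.year
df["month"] = df["Date"].dt.month

# Mean price per avocado type.
mean_by_type = df.groupby("Type")["AveragePrice"].mean()
print(mean_by_type)
```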

There are two types of avocados in the dataset, and several different regions are represented. This allows analysis of specific regions or cities, or of the United States as a whole, for either type of avocado. This project's analysis focuses on the complete dataset.

Dataset: https://www.kaggle.com/neuromusic/avocado-prices#avocado.csv

Tech Stack

  • Language used: Python
  • Libraries used: statsmodels, pmdarima, fbprophet, scikit-learn

 

Approach

  1. Data Preprocessing
    1. Check for missing values
    2. Label Encoding
    3. One hot encoding
  2. Exploratory Data Analysis
    1. Identifying any overarching trend in data over time
    2. Identifying any repetitive, seasonal patterns in the data
  3. Feature Engineering
    1. Creating new columns
  4. Building Forecast models
    1. Linear Regression
    2. Random Forest Regressor
    3. XGB Regressor
    4. Facebook Prophet
    5. ARIMA
    6. SARIMAX
  5. Evaluating Forecast models
    1. R-squared
    2. MAPE
    3. MAE
    4. Plots comprising the actual values, forecast and confidence intervals.
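
The regression part of the approach above (one-hot encoding, train/test split, model fitting, and evaluation with R-squared, MAE, and MAPE) might look roughly like the sketch below. The data is synthetic, the month feature and the split settings are assumptions, and Linear Regression stands in for any of the listed regressors.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    r2_score,
)
from sklearn.model_selection import train_test_split

# Synthetic data: organic avocados cost more, plus a little noise.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(
    {
        "Type": rng.choice(["conventional", "organic"], size=n),
        "Region": rng.choice(["Albany", "Boston", "Houston"], size=n),
        "month": rng.integers(1, 13, size=n),
    }
)
df["AveragePrice"] = (
    1.0 + 0.5 * (df["Type"] == "organic") + rng.normal(0, 0.05, size=n)
)

# One-hot encode the categorical columns; keep month as a numeric feature.
X = pd.get_dummies(
    df[["Type", "Region", "month"]],
    columns=["Type", "Region"],
    drop_first=True,
    dtype=float,
)
y = df["AveragePrice"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

r2 = r2_score(y_test, pred)
mae = mean_absolute_error(y_test, pred)
mape = mean_absolute_percentage_error(y_test, pred)
print(f"R2={r2:.3f}  MAE={mae:.3f}  MAPE={mape:.3%}")
```

For time-ordered data, a chronological split (for example `shuffle=False`) is often preferred over a random one so that the test set lies after the training period.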

Curriculum For This Mini Project

  1. Business context avocado price prediction (03m)
  2. Data overview (06m)
  3. Exploratory data analysis (07m)
  4. Feature engineering (04m)
  5. Train test split (07m)
  6. Encoding categorical variables (08m)
  7. Regression modeling (07m)
  8. Stationarity and autocorrelation intuition (06m)
  9. Time series forecasting introduction (04m)
  10. Introduction to ARIMA (09m)
  11. Building an ARIMA model (04m)
  12. Forecasting and evaluating the ARIMA model (03m)
  13. Introduction to SARIMA (05m)
  14. Introduction to prophet model (05m)
  15. Forecasting using prophet model (08m)
  16. Visualizing the prophet model (08m)
  17. Best model and modular code overview (04m)
