Data Warehouse Design for E-commerce Environments

In this Hive project, you will design a data warehouse for e-commerce environments.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What will you learn

Roles in a data engineering project and their functions
Understanding the E-commerce business
Designing the roadmap of the complete project
Starting up the virtual environment in the QuickStart VM on VMware
Basic EDA of the dataset
Ingesting data (ETL) with Apache Sqoop (see the ingest sketch after this list)
Why the Parquet format is used for storing the data
Executing and troubleshooting Sqoop
Creating tables for storing data
Data processing with Spark Scala
Creating objects using Scala (see the Scala sketch after this list)
Data querying using Hive/Impala
Scheduling the Hadoop workflow using Apache Oozie
Mapping functions to variables in Scala
Using Oozie dryrun to generate a job ID for tracking the desired job
Scheduling complex workflows using the Oozie Coordinator
Detecting errors and resolving them in MapReduce
Troubleshooting the Oozie configuration
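
To make the ETL step concrete, here is a minimal sketch of the ingest written in Spark Scala. In the project itself, Sqoop does this job from the command line; the sketch below shows an equivalent JDBC-to-Parquet load, and the database URL, table name, and credentials are placeholders rather than the project's actual configuration.

  import org.apache.spark.sql.SparkSession

  object IngestOrders {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("ecommerce-ingest")
        .enableHiveSupport()
        .getOrCreate()

      // Pull the source table over JDBC (Sqoop performs the same job from
      // the shell). Host, database, table, and credentials are placeholders.
      val orders = spark.read
        .format("jdbc")
        .option("url", "jdbc:mysql://db-host:3306/retail")
        .option("dbtable", "orders")
        .option("user", "etl_user")
        .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
        .load()

      // Parquet is columnar and compressed, so analytical queries that touch
      // a few columns over many rows read far less data than with text files.
      orders.write
        .mode("overwrite")
        .parquet("hdfs:///warehouse/ecommerce/orders")

      spark.stop()
    }
  }

Landing the data as Parquet is what keeps the later Hive/Impala queries cheap: they scan only the columns they name.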
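
The Spark Scala processing itself boils down to typed objects, map functions, and key-value pairs. Here is a minimal sketch, assuming the Parquet files written above carry orderId, region, unitPrice, and qty columns (hypothetical names, not the project's actual schema):

  import org.apache.spark.sql.SparkSession

  object RevenueByRegion {
    // A case class models each order line as a typed Scala object.
    case class OrderLine(orderId: String, region: String, unitPrice: Double, qty: Int)

    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("revenue-by-region").getOrCreate()
      import spark.implicits._

      // Read the Parquet written by the ingest step into typed objects.
      val orders = spark.read
        .parquet("hdfs:///warehouse/ecommerce/orders")
        .as[OrderLine]

      // Map every order line to a (region, revenue) key-value pair, then
      // reduce by key to aggregate revenue per region.
      val revenueByRegion = orders.rdd
        .map(o => (o.region, o.unitPrice * o.qty))
        .reduceByKey(_ + _)

      // Write the aggregate back to HDFS for downstream querying.
      revenueByRegion.toDF("region", "revenue")
        .write.mode("overwrite")
        .parquet("hdfs:///warehouse/ecommerce/revenue_by_region")

      spark.stop()
    }
  }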

Project Description

The goal of investing in data infrastructure is to sharpen a business's competitive edge as well as the company's bottom line.

In this big data project, we design a data warehouse for a retail shop. Throughout the design and implementation, we focus on answering specific questions related to price optimization and inventory allocation. The two questions we will answer in this Hive project are:

  1. Were higher-priced items selling in certain markets?
  2. Should inventory be re-allocated or prices optimized based on geography?

The purpose of answering these questions with data is to boost the overall bottom line for the business while improving the experience for shoppers.
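
As a concrete illustration, a query like the following is enough to start answering the first question. It is shown here through Spark SQL against the warehouse's Hive tables; the orders table and its region, unit_price, and qty columns are hypothetical names standing in for the project's actual schema.

  import org.apache.spark.sql.SparkSession

  object PriceByMarket {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("price-by-market")
        .enableHiveSupport()
        .getOrCreate()

      // Rank markets by average selling price: a region whose average sits
      // well above the others is where the higher-priced items are moving.
      spark.sql("""
        SELECT region,
               AVG(unit_price)       AS avg_price,
               SUM(unit_price * qty) AS revenue
        FROM   orders
        GROUP  BY region
        ORDER  BY avg_price DESC
      """).show()

      spark.stop()
    }
  }

Comparing avg_price and revenue across regions then feeds directly into the second question: a region with high average prices but thin volume is a candidate for re-allocating inventory or adjusting prices.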

Similar Projects

In this PySpark project, you will simulate a complex real-world data pipeline based on messaging. This project is deployed using the following tech stack - NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau and AWS QuickSight.

The goal of this Spark project is to analyze business reviews from the Yelp dataset and ingest the final output of the data processing into Elasticsearch. We will also use the visualization tool in the ELK stack to visualize various kinds of ad-hoc reports from the data.

In this project, we will examine two database platforms, MongoDB and Cassandra, and look at the philosophical differences in how these databases work and perform analytical queries.

Curriculum For This Mini Project

Importance of Data Engineering
04m
Overview of the E-commerce Business Problem
16m
Solution Design
13m
Data Exploration
19m
Create Views
15m
Migrate Data or ETL with Apache Sqoop
17m
Executing and Troubleshooting a Sqoop Job
12m
Create Views for EDA (Exploratory Data Analysis)
09m
Perform EDA (Exploratory Data Analysis)
07m
Analyze data with Spark
06m
Perform EDA and Troubleshooting
19m
Data Processing with Spark Scala
05m
Scala function to create objects
17m
Building a Map function
04m
Key Value Pairs
03m
Write to HDFS
05m
Troubleshooting Spark script
14m
Business example - Market segmentation
02m
Oozie
05m
Build and troubleshoot an Oozie script
19m
Oozie dryrun
02m
Oozie Coordinator
13m
Troubleshooting Oozie configuration
05m