Data Warehouse Design for E-commerce Environments

In this Hive project, you will design a data warehouse for e-commerce environments.

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love

Dhiraj Tandon

Solution Architect-Cyber Security at ColorTokens

My Interaction was very short but left a positive impression. I enrolled and asked for a refund since I could not find the time. What happened next: They initiated Refund immediately. Their...

Arvind Sodhi

VP - Data Architect, CDO at Deutsche Bank

I have extensive experience in data management and data processing. Over the past few years I saw the data management technology transition into the Big Data ecosystem and I needed to follow suit. I...

What will you learn

Roles in a data engineering project and their functions
Understanding the E-commerce business
Designing the roadmap of the complete project
Starting up the virtual environment in the Quickstart VM (VMware)
Basic EDA of the dataset
Ingesting data or ETL with Apache Sqoop
Why use the Parquet format for storing the data
Executing and troubleshooting Sqoop
Creating tables for storing data
Data processing with Spark Scala
Creating objects using Scala
Data querying using Hive/Impala
Scheduling the workflow of Hadoop using Apache Oozie
Mapping function to variables in Scala XML
Using Oozie dryrun to generate a job ID for tracking the desired job
Scheduling complex workflow using Oozie Coordinator
Detecting errors and solving them using MapReduce
Troubleshooting Oozie configuration

Project Description

The entire goal of investing in a data infrastructure is to sharpen the business's competitive edge and improve the company's bottom line.

In this big data project, we are going to design a data warehouse for a retail shop. The design and implementation, however, will focus on answering some specific questions related to price optimization and inventory allocation. The two questions we will look to answer in this Hive project are:

  1. Were the higher-priced items selling in certain markets?
  2. Should inventory be re-allocated or prices optimized based on geography?

The entire purpose of answering these questions with data is to boost the overall bottom line for the business while improving the experience for shoppers.
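As a rough illustration of the first question, the sketch below computes, per market, the share of units sold that came from higher-priced items. It is plain Python rather than the Hive or Spark Scala used in the project, and everything in it is an assumption made for illustration: the `ORDERS` rows are invented, the column layout is hypothetical, and "higher priced" is arbitrarily taken to mean a unit price above the median price across distinct items.

```python
from collections import defaultdict
from statistics import median

# Hypothetical order lines: (market, item, unit_price, quantity).
# These rows are invented for illustration; they are not the project dataset.
ORDERS = [
    ("North", "laptop", 1200.0, 3),
    ("North", "mouse", 20.0, 10),
    ("North", "desk", 300.0, 2),
    ("South", "laptop", 1200.0, 1),
    ("South", "mouse", 20.0, 25),
    ("South", "desk", 300.0, 4),
]


def high_price_share_by_market(orders):
    """Return {market: fraction of units sold that were higher-priced}.

    An item counts as higher-priced when its unit price is strictly above
    the median unit price across distinct items (an assumed definition).
    """
    # One price per distinct item, so popular items do not skew the median.
    price_by_item = {item: price for _, item, price, _ in orders}
    threshold = median(price_by_item.values())

    total_units = defaultdict(int)
    high_units = defaultdict(int)
    for market, _item, price, quantity in orders:
        total_units[market] += quantity
        if price > threshold:
            high_units[market] += quantity

    return {m: high_units[m] / total_units[m] for m in total_units}


if __name__ == "__main__":
    for market, share in sorted(high_price_share_by_market(ORDERS).items()):
        print(f"{market}: {share:.1%} of units were higher-priced items")
```

A noticeably larger share in one market than another would be the kind of signal that prompts the second question, whether to re-allocate inventory or adjust prices by geography. In the warehouse itself the same grouping would more likely be expressed as an aggregate query over the sales tables.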

Similar Projects

In this big data project, we will perform an OLAP cube design using the AdventureWorks database. The deliverable for this session will be to design a cube, build and implement it using Kylin, query the cube and even connect familiar tools (like Excel) with our new cube.

Explore Hive usage efficiently in this Hadoop Hive project using various file formats such as JSON, CSV, ORC and AVRO, and compare their relative performance.

This Elasticsearch example deploys the AWS ELK stack to analyse streaming event data. Tools used include Nifi, PySpark, Elasticsearch, Logstash and Kibana for visualisation.

Curriculum For This Mini Project

Importance of Data Engineering
Overview of the E-commerce Business Problem
Solution Design
Data Exploration
Create Views
Migrate Data or ETL with Apache Sqoop
Executing and Troubleshooting a Sqoop Job
Create Views for EDA (Exploratory Data Analysis)
Perform EDA (Exploratory Data Analysis)
Analyse data with Spark
Perform EDA and Troubleshooting
Data Processing with Spark Scala
Scala function to create objects
Building a Map function
Key Value Pairs
Write to HDFS
Troubleshooting Spark script
Business example - Market segmentation
Build and troubleshoot an Oozie script
Oozie dryrun
Oozie Coordinator
Troubleshooting Oozie configuration