Implementing Slowly Changing Dimensions in a Data Warehouse using Hive and Spark

Hive Project: Understand the various types of SCDs and implement these slowly changing dimensions in Hadoop Hive and Spark.

What will you learn

Getting an overview of the project
Understanding data warehousing with Hive
What is a slowly changing dimension (SCD)
Types of slowly changing dimensions
What are Parquet and ORC: similarities, differences, and their uses
Downloading the AdventureWorks dataset
Transferring the data to Hive using Sqoop
Denormalizing the data for analysis
Commands for saving data as Parquet and running Sqoop jobs
Viewing the tables created in Hive using Hue
Understanding the changing dimensions in customer demographics
What are ELT and ETL: similarities, differences, and their uses
Data lake as a storage repository for structured, semi-structured, and unstructured data
Creating Customer tables with SCD Type 2
Transformation for SCD Type 1 on the Credit Card table
Tuning and configuring Hive for SCD
Implementing SCD 2 & 3 in Hive and Spark

Project Description

One of the broadest uses of Hadoop today is building a data warehousing platform on top of a data lake. In building a data warehouse, the traditions left to us by Kimball and Inmon are still very much in play.

While not every one of the legacy rules should be implemented as-is on a big data platform, the issue of slowly changing dimensions remains a front-burner concern.

A slowly changing dimension (SCD) is a warehouse dimension whose attributes are expected to change rarely. However, when they do change, there should be a systematic approach to capturing that change. Examples of SCDs are customer and product information.

In this Hive project, we will look at the various types of SCDs and learn to implement SCDs in Hive and Spark.
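
To make the idea of systematically capturing a change more concrete, here is a minimal sketch in plain Python (not Hive or Spark, and not the project's own code) that contrasts the two most common approaches: Type 1 overwrites the attribute and keeps no history, while Type 2 closes the current row and appends a new version. The `CustomerRow` layout, the `city` attribute, and the `valid_from` / `valid_to` / `is_current` column names are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerRow:
    """One row of a hypothetical customer dimension."""
    customer_id: int          # business key
    city: str                 # tracked attribute (illustrative)
    valid_from: date          # start of this version's validity
    valid_to: Optional[date]  # None means the version is still open
    is_current: bool


def apply_scd1(rows: List[CustomerRow], customer_id: int, new_city: str) -> List[CustomerRow]:
    """Type 1: overwrite the attribute in place; the old value is lost."""
    return [
        replace(r, city=new_city) if r.customer_id == customer_id else r
        for r in rows
    ]


def apply_scd2(rows: List[CustomerRow], customer_id: int, new_city: str,
               change_date: date) -> List[CustomerRow]:
    """Type 2: expire the current row and append a new current version."""
    out: List[CustomerRow] = []
    changed = False
    for r in rows:
        if r.customer_id == customer_id and r.is_current and r.city != new_city:
            # Close the old version instead of overwriting it.
            out.append(replace(r, valid_to=change_date, is_current=False))
            changed = True
        else:
            out.append(r)
    if changed:
        out.append(CustomerRow(customer_id, new_city, change_date, None, True))
    return out


dim = [CustomerRow(1, "Seattle", date(2020, 1, 1), None, True)]
type1 = apply_scd1(dim, 1, "Portland")
type2 = apply_scd2(dim, 1, "Portland", date(2021, 6, 1))

for row in type2:
    print(row)
```

With Type 1 the dimension still has one row per customer; with Type 2 it grows by one row per change, which is what lets queries reconstruct the value that was valid on any past date. In Hive or Spark the same logic would typically be expressed as a join between the existing dimension and the incoming changes rather than a Python loop.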

Curriculum For This Mini Project

Project Overview (05m)
What is Data Warehousing? (03m)
Difference between Parquet and ORC (09m)
What is a Slowly Changing Dimension? (07m)
Working with the AdventureWorks Dataset to Understand SCD (04m)
Copy Data to Hive using Sqoop (02m)
Denormalize Data (12m)
Example to Understand SCD (06m)
Running the Sqoop Job (10m)
Hive Querying to View the Data using Hue (09m)
Understanding the Changing Dimensions in Customer Demographics (06m)
Understanding Different Types of SCDs (18m)
Discussion on ELT vs ETL (05m)
Data Warehouse vs Data Lake (21m)
Data Lakes from a Data Architecture Perspective (06m)
Create Customer Table with SCD Type 2 (08m)
Create Customer Demo Table with SCD Type 4 and CreditCard Table with SCD Type 1 (03m)
Transformations for SCD Type 1 on Credit Card Table (07m)
Hive Configurations for SCD (00m)
Transformations for SCD Type 1 Continued (26m)
Transformations for SCD Type 4 with Example (54m)