One of the broadest uses of Hadoop today is building a data warehousing platform on top of a data lake. And in building a data warehouse, the traditions left to us by Kimball and Inmon are still very much in play.
While not every legacy rule should be implemented as-is in a big data platform, the issue of slowly changing dimensions remains a front-burner concern.
A slowly changing dimension (SCD) is a warehouse dimension that is said to rarely change. However, when it does change, there should be a systematic approach to capturing that change. Examples of SCDs are customer and product information.
In this Hive project, we will look at the various types of SCDs and learn to implement them in Hive and Spark.
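The project itself implements SCDs in Hive and Spark, but the core idea behind the most common variant, SCD Type 2, can be sketched language-agnostically: when a tracked attribute changes, expire the current row and append a new versioned row. Below is a minimal plain-Python illustration of that logic; the row layout, field names, and function name are hypothetical, not taken from the project.

```python
from datetime import date


def apply_scd_type2(dimension, updates, change_date):
    """Sketch of SCD Type 2 logic (illustrative, not the project's code).

    `dimension` is a list of dicts with keys: key, attr, start_date,
    end_date (None marks the current version of a record).
    `updates` maps business key -> new attribute value.
    """
    result = []
    for row in dimension:
        key = row["key"]
        is_current = row["end_date"] is None
        if is_current and key in updates and updates[key] != row["attr"]:
            # Close out the old version of the record...
            result.append(dict(row, end_date=change_date))
            # ...and append a new current version with the changed value.
            result.append({"key": key, "attr": updates[key],
                           "start_date": change_date, "end_date": None})
        else:
            # Unchanged or historical rows pass through untouched.
            result.append(row)
    return result


# Example: a customer moves from NY to CA; history is preserved.
dim = [{"key": 1, "attr": "NY",
        "start_date": date(2020, 1, 1), "end_date": None}]
updated = apply_scd_type2(dim, {1: "CA"}, date(2021, 6, 1))
```

In Hive or Spark SQL the same pattern is typically expressed as a join between the incoming batch and the current dimension rows, producing an "expire" set and an "insert" set.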
In this big data project, we will look at how to mine and make sense of connections in a simple way by building a Spark GraphX algorithm and a network crawler.
In this project, we will run various use cases for analyzing crime datasets using Apache Spark.
Explore efficient Hive usage in this Hadoop Hive project using various file formats such as JSON, CSV, ORC, and Avro, and compare their relative performance.