Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 50+ solved projects with IPython notebooks and datasets.
Add project experience to your LinkedIn/GitHub profiles.
One benefit of Spark is that it extends to many storage platforms beyond Hadoop. As Spark developers, we can therefore read from and write to virtually any popular storage platform while building our data pipeline.
In this Hackerday, we will look at two such database platforms: MongoDB and Cassandra. They belong to different classes of database and suit different use cases. We will discuss these use cases, install both platforms in our lab environment, look at the philosophical difference in how the two databases work, create sample tables, and finally integrate our Spark application to load the UK MOT vehicle testing dataset into them. Once the data is loaded, anyone can run analytical queries on the tables at any time.
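As a rough illustration of the integration step, the sketch below shows how a Spark DataFrame might be written to both stores from PySpark. It is a minimal sketch under assumptions, not the project's actual solution: the helper names are hypothetical, the MongoDB format name and option keys follow the 10.x release of the MongoDB Spark Connector (older releases use different names), and any database, keyspace, table, or collection names you pass in are your own.

```python
from typing import Dict


def mongo_write_options(uri: str, database: str, collection: str) -> Dict[str, str]:
    """Build writer options for the MongoDB Spark Connector.

    Assumption: option keys as used by the 10.x connector.
    """
    return {"connection.uri": uri, "database": database, "collection": collection}


def cassandra_write_options(keyspace: str, table: str) -> Dict[str, str]:
    """Build writer options for the Spark Cassandra Connector."""
    return {"keyspace": keyspace, "table": table}


def _write(df, fmt: str, options: Dict[str, str], mode: str = "append") -> None:
    """Write a DataFrame through the given data source format."""
    writer = df.write.format(fmt).mode(mode)
    for key, value in options.items():
        writer = writer.option(key, value)
    writer.save()


def write_to_stores(df, mongo_opts: Dict[str, str], cassandra_opts: Dict[str, str]) -> None:
    """Append the same DataFrame to MongoDB and then to Cassandra.

    The Cassandra keyspace and table must already exist; MongoDB
    creates the collection on first write.
    """
    _write(df, "mongodb", mongo_opts)
    _write(df, "org.apache.spark.sql.cassandra", cassandra_opts)
```

To use it, you would create a SparkSession with both connector packages on the classpath (for example via `--packages`), set `spark.cassandra.connection.host` to your Cassandra node, read the MOT files into a DataFrame, and pass that DataFrame to `write_to_stores` together with the two option dictionaries.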
Explore Hive usage efficiently in this Hadoop Hive project using various file formats such as JSON, CSV, ORC, and AVRO, and compare their relative performance.
The goal of this Spark project for students is to explore the features of Spark SQL in practice on the latest version of Spark, i.e., Spark 2.0.
Hive Project: Understand the various types of SCDs and implement these slowly changing dimensions in Hadoop Hive and Spark.