Spark SQL provides a way to impose a structured schema on any dataset, regardless of its source or format. Once the data is structured, it can be queried using tools like Hive, Impala, and other Hadoop data warehouse tools.
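As a minimal sketch of imposing structure on raw files, Spark SQL can register a JSON dataset as a view and query it directly. The path, table, and column names below are illustrative assumptions, not part of the project dataset:

```sql
-- Expose raw JSON files as a structured, queryable view.
-- Path and column names are hypothetical.
CREATE TEMPORARY VIEW orders
USING json
OPTIONS (path '/data/raw/orders.json');

-- Once structured, the data can be queried like any relational table.
SELECT customer_id, SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id;
```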
In this Spark project, we will use Spark SQL syntax to process the dataset, perform joins with supplementary data, and make the data available for querying through the Spark SQL Thrift Server. Once the data is provisioned, we will run some interesting queries and walk through performance tuning techniques for Spark SQL.
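A join with supplementary data of the kind described above might look like the following sketch, where the `orders` and `customers` tables and their columns are assumptions for illustration. Caching the joined result keeps it in memory so it can be served quickly to clients connecting through the Thrift Server:

```sql
-- Sketch: enrich the main dataset with a supplementary lookup table,
-- then cache the result for repeated queries. All names are hypothetical.
CACHE TABLE enriched AS
SELECT o.order_id, o.amount, c.region
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```

With the Thrift Server running, a JDBC client such as Beeline can then connect (typically `jdbc:hive2://localhost:10000`) and query the cached table.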
In this Hive project, we will build a Hive data warehouse from a raw dataset stored in HDFS and present the data in a relational structure so that querying it feels natural.
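A common pattern for this kind of warehouse build is to expose the raw HDFS files as an external staging table and then load them into a managed table with proper types. The layout below (delimited text, these particular columns) is an assumption for illustration:

```sql
-- Stage the raw files where they already live in HDFS.
-- File layout and column names are hypothetical.
CREATE EXTERNAL TABLE staging_events (
  event_id   STRING,
  event_time STRING,
  user_id    STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/raw/events';

-- Materialize a typed, relational-style table from the staging data.
CREATE TABLE events AS
SELECT event_id,
       CAST(event_time AS TIMESTAMP) AS event_time,
       user_id
FROM staging_events;
```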
In this Hive project, you will denormalize the JSON data and create Hive scripts that use the ORC file format.
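One way to sketch that denormalization is to read the nested JSON through a JSON SerDe and explode the nested array into a flat ORC table. The schema, paths, and names here are illustrative assumptions about the data, not the project's actual layout:

```sql
-- Read nested JSON via the HCatalog JSON SerDe. Schema is hypothetical.
CREATE EXTERNAL TABLE raw_json (
  usr    STRUCT<id:STRING, name:STRING>,
  orders ARRAY<STRUCT<id:STRING, amount:DOUBLE>>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/data/raw/json';

-- Denormalize: one row per order, stored in columnar ORC format.
CREATE TABLE orders_flat STORED AS ORC AS
SELECT r.usr.id   AS user_id,
       r.usr.name AS user_name,
       o.id       AS order_id,
       o.amount   AS amount
FROM raw_json r
LATERAL VIEW explode(r.orders) t AS o;
```

ORC's columnar layout and built-in indexes generally make analytical queries on the flattened table faster than scanning the raw JSON.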
Learn to design a Hadoop architecture and understand how to ingest and store data in Hadoop using data acquisition tools.