Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 50+ solved projects with IPython notebooks and datasets.
Add project experience to your LinkedIn/GitHub profiles.
Spark 2 is a major yet backward-compatible break from Spark 1.x, not only in its high-level API but also in performance. The module with the most significant new features is Spark SQL.
In this Apache Spark project, we will explore a number of these features in practice.
We will discuss using various datasets, the new unified Spark API, and the optimization features that make Spark SQL the first option to consider when processing structured data.
However, there are times when resorting to the Spark Core RDD API in Spark 2 is unavoidable. We will explore that as well, alongside the new Structured Streaming API, a fault-tolerant stream processing engine built on the Spark SQL engine.
In this big data project, we will talk about Apache Zeppelin. We will write code, write notes, build charts, and share them, all in a single data analytics environment using Hive, Spark and Pig.
In this PySpark project, you will simulate a complex real-world data pipeline based on messaging. This project is deployed using the following tech stack: NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau and AWS QuickSight.
In this Databricks Azure tutorial project, you will use Spark SQL to analyse the MovieLens dataset to provide movie recommendations. As part of this, you will deploy Azure Data Factory and data pipelines, and visualise the analysis.