Spark Project - Learn to Write Spark Applications using Spark 2.0

In this project, we will work through complex scenarios to help Spark developers deal with the issues that arise in real-world use.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love

Dhiraj Tandon

Solution Architect-Cyber Security at ColorTokens

My Interaction was very short but left a positive impression. I enrolled and asked for a refund since I could not find the time. What happened next: They initiated Refund immediately. Their...

Sujit Singh

Data Engineer, SullivanCotter

This has been a motivating experience. This has helped me execute Pig Latin and Hive commands to solve data problems. They take special care in regards to answering any questions and doubts I had...

What will you learn

Pivoting data
Dealing with Structs/Schemas
UDFs and Abstract logic in UDF
Caching and Checkpointing
Clustering, Bucketing, Sorting and Partitioning
Resource Allocation in Spark

Project Description

Spark is the go-to framework for today's big data processing, and many companies are jumping on the Spark bandwagon.
However, Spark is notorious for being easy to get started with but very difficult to master. Mastering Spark requires knowledge not only of its APIs but also of its internals. As a result, many developers who take on a production data use case run into problems that were never discussed during training.

This Hackerday picks apart a couple of these tasks and scenarios, which are rarely discussed in training but can burden developers in practice.

We will look at Spark memory management, cluster resource allocation, clustering, repartitioning, and more.
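
To make the resource allocation topic concrete, here is a minimal Python sketch of a commonly cited rule of thumb for sizing executors. It is not taken from the project material: the function name `size_executors`, the five-cores-per-executor default, the one core and 1 GB reserved per node, and the slot reserved for the driver are all illustrative assumptions. The 10% overhead fraction roughly mirrors Spark's default executor memory overhead, and the returned keys are standard Spark configuration properties.

```python
import math


def size_executors(nodes, cores_per_node, mem_gb_per_node,
                   cores_per_executor=5, overhead_fraction=0.10):
    """Rule-of-thumb executor sizing. All defaults are assumptions, not fixed rules."""
    # Assumption: leave one core and 1 GB per node for the OS and cluster daemons.
    usable_cores = cores_per_node - 1
    usable_mem_gb = mem_gb_per_node - 1

    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for a single executor")

    # Assumption: give up one executor slot cluster-wide for the driver.
    total_executors = executors_per_node * nodes - 1

    # Each executor's share of node memory must cover heap plus off-heap overhead.
    # (Spark also applies a 384 MB minimum overhead, which this sketch ignores.)
    mem_per_slot_gb = usable_mem_gb / executors_per_node
    heap_gb = math.floor(mem_per_slot_gb / (1 + overhead_fraction))

    return {
        "spark.executor.instances": total_executors,
        "spark.executor.cores": cores_per_executor,
        "spark.executor.memory": f"{heap_gb}g",
    }
```

For a hypothetical cluster of 10 nodes with 16 cores and 64 GB each, `size_executors(10, 16, 64)` suggests 29 executors with 5 cores and a 19g heap apiece. Treat such numbers as a starting point to tune against the actual workload.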

The goal of the Hackerday is to make Spark developers better at their craft and to help those just learning Spark quickly appreciate the depth of the framework. The idea is to go beyond simple use cases into complex scenarios and data pipelines so that students understand the issues that come up in the real world.

All the learning for the sessions will be done on Spark 2.

Similar Projects

In this PySpark project, you will simulate a complex real-world data pipeline based on messaging. This project is deployed using the following tech stack - NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau and AWS QuickSight.

In this project, we will discuss insurance forecasting using regression techniques.

In this Hackerday, we will go through the basics of statistics and see how Spark enables us to perform statistical operations, such as descriptive and inferential statistics, over very large datasets.

Curriculum For This Mini Project