Explore features of Spark SQL in practice on Spark 2.0

The goal of this Spark project for students is to explore the features of Spark SQL in practice on Spark 2.0, the latest version of Spark at the time the project was created.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What you will learn

Understanding the roadmap of the project
Downloading and installing Spark on the Cloudera QuickStart VM
The difference between Spark and the Spark shell
Why you should think of Spark SQL before Spark Core
How directed acyclic graphs (DAGs) work under the hood in RDDs
Installing Java Development Kit 8
Configuring Spark 2.0 as a service in the cluster
When to use Spark Core
Copying the dataset to HDFS
Spark SQL and multiple data sources: text files, JSON files, RDBMS sources, NoSQL sources
How to read JSON and CSV files
Reading from a Hive table, a JDBC source and a Parquet file
Understanding the usage of typed and untyped columns
Spark SQL as a SQL-on-Hadoop server
Using Spark SQL as a JDBC server and for structured data processing
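As a rough illustration of the reading topics above, here is a minimal sketch. It assumes Spark 2.x with Scala on the classpath and runs in local mode instead of on the Cloudera VM. It creates a SparkSession and reads JSON, CSV and Parquet through `spark.read`. The object name `ReadSourcesSketch`, the helper names and the tiny airport sample files are hypothetical. The files are written to a temporary directory so the example runs without the course dataset. The JDBC and Hive helpers are defined but not called, because they need an external database or a Hive metastore.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.Properties

import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadSourcesSketch {
  // Hypothetical helper: JDBC read (needs a reachable database and its driver jar).
  def readJdbc(spark: SparkSession, url: String, table: String, props: Properties): DataFrame =
    spark.read.jdbc(url, table, props)

  // Hypothetical helper: Hive table read (needs a session built with enableHiveSupport()).
  def readHive(spark: SparkSession, table: String): DataFrame =
    spark.table(table)

  private def writeText(path: Path, text: String): Unit =
    Files.write(path, text.getBytes(StandardCharsets.UTF_8))

  // Returns the row counts read back from the JSON, CSV and Parquet sources.
  def run(spark: SparkSession): (Long, Long, Long) = {
    val dir = Files.createTempDirectory("spark-sql-sketch")
    val jsonPath = dir.resolve("airports.json")
    val csvPath = dir.resolve("airports.csv")
    // JSON Lines (one object per line) is what spark.read.json expects by default.
    writeText(jsonPath, "{\"code\":\"JFK\",\"city\":\"New York\"}\n{\"code\":\"SFO\",\"city\":\"San Francisco\"}\n")
    writeText(csvPath, "code,city\nJFK,New York\nSFO,San Francisco\n")

    val fromJson = spark.read.json(jsonPath.toString)
    val fromCsv = spark.read
      .option("header", "true")      // first line holds the column names
      .option("inferSchema", "true") // extra pass over the data to guess column types
      .csv(csvPath.toString)

    // Write the CSV data out as Parquet, then read it back.
    val parquetPath = dir.resolve("airports.parquet").toString
    fromCsv.write.parquet(parquetPath)
    val fromParquet = spark.read.parquet(parquetPath)

    fromJson.printSchema()
    fromParquet.show()
    (fromJson.count(), fromCsv.count(), fromParquet.count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-sources-sketch")
      .master("local[*]")
      .getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```

Every source goes through the same `DataFrameReader` and returns a DataFrame, so the code after the read step does not depend on the original format.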

Project Description

Spark 2 is a major yet largely backward-compatible step beyond Spark 1.x, not only in its high-level API but also in performance. The module with the most significant new features is Spark SQL.

In this Apache Spark project, we will explore a number of these features in practice.

Using various datasets, we will discuss the new unified Spark API as well as the optimization features that make Spark SQL the first tool to reach for when processing structured data.
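To make the unified API concrete, here is a minimal sketch assuming Spark 2.x and Scala. It uses a hypothetical `Airport` case class and made-up rows, not the course's airport dataset. The same filter runs three ways: on an untyped DataFrame (`Dataset[Row]`) with a Column expression, on a typed `Dataset[Airport]` with a Scala lambda, and as plain SQL over a temporary view. All three are planned by Spark SQL's Catalyst optimizer, and `explain()` prints the physical plan it chose.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical record type; not the course dataset's schema.
case class Airport(code: String, city: String, elevationFt: Int)

object UnifiedApiSketch {
  // Returns how many high-altitude airports each API finds.
  def run(spark: SparkSession): (Long, Long, Long) = {
    import spark.implicits._ // encoders for case classes and Seq#toDS()

    val airports: Dataset[Airport] = Seq(
      Airport("JFK", "New York", 13),
      Airport("DEN", "Denver", 5434),
      Airport("SFO", "San Francisco", 13)
    ).toDS()

    // Untyped: a DataFrame is Dataset[Row]; "elevationFt" is resolved only at analysis time.
    val df: DataFrame = airports.toDF()
    val viaColumns = df.filter(col("elevationFt") > 1000)

    // Typed: the compiler checks that elevationFt exists and is an Int.
    val viaLambda = airports.filter(a => a.elevationFt > 1000)

    // SQL over a temporary view runs on the same engine.
    df.createOrReplaceTempView("airports")
    val viaSql = spark.sql("SELECT code FROM airports WHERE elevationFt > 1000")

    viaColumns.explain() // prints the physical plan chosen by Catalyst
    (viaColumns.count(), viaLambda.count(), viaSql.count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("unified-api-sketch")
      .master("local[*]")
      .getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```

One trade-off is worth noting. The typed lambda is opaque to Catalyst, so it generally cannot be optimized or pushed down the way the Column expression or the SQL predicate can. In exchange, you get compile-time checking of field names and types.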

However, there are times when resorting to Spark Core (RDDs) is unavoidable in Spark 2. We will explore that as well, alongside the new Structured Streaming API, a fault-tolerant stream processing engine built on the Spark SQL engine.
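Here is a minimal sketch of both ideas, assuming Spark 2.x and Scala in local mode. The object, function and file names and the sample events are hypothetical. `tokenCountsViaRdd` drops down to the RDD API for schema-less text, one kind of case where Spark Core is a natural fit. `streamingCounts` runs a Structured Streaming aggregation over a directory of JSON files using the same `groupBy`/`count` calls as a batch DataFrame. Results go to the in-memory sink, which is intended for testing and debugging rather than production. `processAllAvailable()` is likewise a testing aid.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object RddAndStreamingSketch {
  // Spark Core: count whitespace-separated tokens in unstructured lines with the RDD API.
  def tokenCountsViaRdd(spark: SparkSession, lines: Seq[String]): Map[String, Int] =
    spark.sparkContext
      .parallelize(lines)
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(token => (token, 1))
      .reduceByKey(_ + _)
      .collect()
      .toMap

  // Structured Streaming: a running count per airport over JSON files in a directory.
  def streamingCounts(spark: SparkSession): Map[String, Long] = {
    val inputDir = Files.createTempDirectory("stream-input")
    Files.write(
      inputDir.resolve("events-0.json"),
      "{\"airport\":\"JFK\"}\n{\"airport\":\"SFO\"}\n{\"airport\":\"JFK\"}\n"
        .getBytes(StandardCharsets.UTF_8))

    // File-based streaming sources require the schema up front.
    val schema = StructType(Seq(StructField("airport", StringType)))
    val events = spark.readStream.schema(schema).json(inputDir.toString)

    // Same DataFrame API as batch; the engine maintains the aggregate incrementally.
    val counts = events.groupBy("airport").count()

    val query = counts.writeStream
      .outputMode("complete") // emit the whole result table after each trigger
      .format("memory")       // in-memory sink, queryable as a temporary table
      .queryName("airport_counts")
      .start()
    try {
      query.processAllAvailable() // block until the files present so far are processed
      spark.table("airport_counts").collect()
        .map(row => row.getString(0) -> row.getLong(1))
        .toMap
    } finally query.stop()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-and-streaming-sketch")
      .master("local[*]")
      .config("spark.sql.shuffle.partitions", "4") // keep local aggregation light
      .getOrCreate()
    try {
      println(tokenCountsViaRdd(spark, Seq("spark sql", "spark core")))
      println(streamingCounts(spark))
    } finally spark.stop()
  }
}
```

The streaming query is written almost exactly like a batch aggregation. The differences are `readStream`/`writeStream`, the explicit schema and the output mode.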

Curriculum For This Mini Project

Project Overview (01m)
Manual Installation of Spark 2 on Cloudera QuickStart VM (05m)
Introduction to Spark (10m)
Difference between Spark 2 and Spark Shell (00m)
Spark RDDs and DAGs (11m)
Install JDK 8 (03m)
What is Spark SQL? (07m)
Installing Spark 2.0 (02m)
Configurations to Add Spark 2.0 to the Services in the Cluster (13m)
Download Datasets and Copy to HDFS (03m)
Spark Session (05m)
Read a JSON File (00m)
DataFrame and Dataset[T] in Spark 2 (13m)
Difference between DataFrame and Dataset[T] in Spark 2 (21m)
Read a CSV File (08m)
Read a Hive Table (02m)
Read from JDBC (01m)
Read from a Parquet File (04m)
Why You Should Think of Spark SQL before Spark Core (02m)
Discussion on the Agenda for the Next Session (01m)
Recap of the Previous Session (05m)
Download the Dataset for the Session (04m)
Understanding the Usage of Typed and Untyped Columns (01m)
Usage of Typed Columns Using the Airport Dataset (20m)
Using Spark SQL as a JDBC Server (23m)
When to Use Spark SQL (02m)
Example: Using Spark SQL for Structured Data Processing in the Spark 2 Shell (35m)
Structured Streaming Example (18m)