
Explore features of Spark SQL in practice on Spark 2.0

The goal of this Spark project is for students to explore the features of Spark SQL in practice on the latest version of Spark, i.e. Spark 2.0.

What will you learn

  • What is Spark SQL
  • Why you should think of Spark SQL before Spark Core
  • When you have to use Spark Core
  • Spark SQL and multiple data sources: text files, JSON files, RDBMS sources, NoSQL sources
  • Spark SQL as a SQL-on-Hadoop server
  • Introduction to Spark Structured Streaming

What will you get

  • Access to a recording of the complete project
  • Access to all material related to the project, such as data files and solution files.

Prerequisites

  • Students are expected to have a fair knowledge of Big Data and Hadoop, particularly HDFS, Spark and Hive.
  • An installation of the Cloudera QuickStart VM.
  • Since we will be doing the development in the QuickStart VM, it is essential to have the Scala SDK installed there as well. Instructions on how to set up a Scala SDK and runtime can be found here.
  • In the class, we will install Spark 2 in the Cloudera QuickStart VM. By default, the VM comes pre-installed with Spark 1.6.x.

Project Description

Spark 2 is a major yet backward-compatible break from Spark 1.x, not only in its high-level API but also in performance. The module with the most significant new features is Spark SQL.

In this Apache Spark project, we will explore a number of these features in practice.

Using various datasets, we will discuss the new unified Spark API as well as the optimization features that make Spark SQL the first option to consider for processing structured data.

However, there are times when resorting to Spark Core (RDDs) in Spark 2 is unavoidable. We will explore that as well, alongside the new Structured Streaming API, a fault-tolerant stream processing engine built on the Spark SQL engine.

Instructors

 
Michael

Big Data & Enterprise Software Engineer

I am passionate about software development, databases, data analysis and the Android platform. My native language is Java, but no one has stopped me so far from learning and using Angular and Node.js. Data and data analysis are thrilling, and so are my experiences with SQL on Oracle, Microsoft SQL Server, Postgres and MyS...

Curriculum For This Mini Project

 
  Project Overview
01m
  Manual Installation of Spark 2 on Cloudera Quickstart VM
05m
  Introduction to Spark
10m
  Difference between Spark 2 and Spark Shell
00m
  Spark RDDs and DAG
11m
  Install JDK 8
03m
  What is Spark SQL?
07m
  Installing Spark 2.0
02m
  Configurations to add Spark 2.0 to the services in the Cluster
13m
  Download Datasets and Copy to HDFS
03m
  Spark Session
05m
  Read a JSON File
00m
  Dataframe and Dataset[T] in Spark 2
13m
  Difference between Dataframe and Dataset[T] in Spark 2
21m
  Read a CSV file
08m
  Read a Hive Table
02m
  Read from JDBC
01m
  Read from a Parquet File
04m
  Why you should think of Spark SQL before Spark Core?
02m
  Discussion on the Agenda for Next Session
01m
  Recap of the Previous Session
05m
  Download the Dataset for the Session
04m
  Understanding the usage of Typed and Untyped Columns
01m
  Usage of Typed Columns using Airport Dataset
20m
  Using Spark SQL as a JDBC Server
23m
  When to use Spark SQL
02m
  Using Spark SQL for Structured Data Processing in the Spark 2 Shell: Example
35m
  Structured Streaming Example
18m