Data processing with Spark SQL

In this Apache Spark SQL project, we will go through provisioning data for retrieval using Spark SQL.

What will you learn

  • Spark SQL
  • Defining the DataFrame schema (see the sketch after this list)
  • Saving the final result in different formats
  • Setting up the Spark SQL Thrift Server
  • Performance tuning
  • Benchmarking queries in Hive, Spark SQL, and Impala
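As a preview of the schema and output-format items above, here is a minimal sketch, assuming the Spark 2.x SparkSession API (the recordings use the Cloudera Quickstart VM, which may ship an older Spark where SQLContext plays this role). The file path and column names are placeholders, not files that ship with the project.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType, StructField, StructType}

    val spark = SparkSession.builder()
      .appName("schema-and-formats")
      .master("local[*]")            // local mode for the sketch; the project itself runs on the Quickstart VM
      .getOrCreate()

    // Define the schema explicitly instead of relying on inference
    val songSchema = StructType(Seq(
      StructField("song_id",  StringType,  nullable = false),
      StructField("title",    StringType,  nullable = true),
      StructField("year",     IntegerType, nullable = true),
      StructField("duration", DoubleType,  nullable = true)
    ))

    // "songs.csv" is a placeholder path
    val songs = spark.read
      .option("header", "false")
      .schema(songSchema)
      .csv("songs.csv")

    // Save the same result in different formats
    songs.write.mode("overwrite").parquet("songs_parquet")
    songs.write.mode("overwrite").json("songs_json")
    songs.write.mode("overwrite").option("header", "true").csv("songs_csv")

Supplying the schema up front avoids a second pass over the data for type inference and keeps column types stable across runs.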

What will you get

  • Access to the recording of the complete project
  • Access to all material related to the project, such as data files and solution files

Project Description

Spark SQL provides a way to impose a structured view on any dataset, regardless of its source or format. Once that structure is in place, the data can be queried with tools like Hive, Impala, and other Hadoop data warehouse tools.
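As a small illustration of that idea (the file name and the query below are placeholders, not part of the project material), a DataFrame built from any supported source can be registered as a temporary view and queried with plain SQL:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("sql-over-any-source").master("local[*]").getOrCreate()

    // "movies.json" is a placeholder; a CSV, Parquet, or Hive source would be queried the same way
    val movies = spark.read.json("movies.json")

    // Register the DataFrame as a temporary view so plain SQL can be run against it
    movies.createOrReplaceTempView("movies")
    spark.sql("SELECT genre, COUNT(*) AS n FROM movies GROUP BY genre ORDER BY n DESC").show()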

In this Spark project, we will use Spark SQL syntax to process the dataset, join it with supplementary data, and make the result available for querying through the Spark SQL Thrift Server. Once the data is provisioned, we will run some interesting queries and go through performance tuning techniques for Spark SQL.
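A rough sketch of that flow is below, assuming hypothetical crimes.csv and regions.csv inputs and a crimes_enriched output table; the actual datasets and table names used in the recordings may differ. Writing the joined result with saveAsTable puts it in the Hive metastore, which is what lets the Thrift Server (and Hive or Impala) see it later.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("join-and-publish")
      .enableHiveSupport()          // so saveAsTable registers the result in the Hive metastore
      .getOrCreate()

    // Placeholder inputs: a main dataset and a small supplementary lookup
    val crimes  = spark.read.option("header", "true").csv("crimes.csv")
    val regions = spark.read.option("header", "true").csv("regions.csv")

    // Join the main dataset with the supplementary data on a shared key
    val enriched = crimes.join(regions, Seq("region_id"), "left")

    // Persist as a table that the Spark SQL Thrift Server can later serve
    enriched.write.mode("overwrite").saveAsTable("crimes_enriched")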

Curriculum For This Mini Project

 
  Introduction to the Project (01m)
  Starting Cloudera Quickstart VM (00m)
  Overview of the Datasets used for the Project (04m)
  What is Spark? (17m)
  Introduction to Directed Acyclic Graph (DAG) (03m)
  Introduction to RDDs in Spark (06m)
  RDDs in Action (04m)
  Transformations in RDDs (09m)
  Example of how Spark works (05m)
  Introduction to the Spark Streaming Module (04m)
  Introduction to Spark MLlib (01m)
  Introduction to GraphX (04m)
  Introduction to Spark SQL (04m)
  Example of how Spark SQL works (05m)
  Read a JSON File and Create RDDs (09m)
  How to Define a Schema in Spark SQL? (07m)
  Creating an RDD from the Movie Dataset (03m)
  Converting the RDD into a DataFrame (03m)
  Defining the DataFrame Schema: Building Schemas using the Million Song Dataset (17m)
  Read the Million Song Dataset CSV File (21m)
  Loading Data (05m)
  Working with DataFrames (01h 00m)
  Working with the Crime Dataset (01m)
  Start the Spark Shell and Connect to Hive (08m)
  Hive Querying using the Spark Context (03m)
  Read a File and Save Data as Parquet (08m)
  Load Data (03m)
  Streaming Data (04m)
  Setting up the Spark SQL Thrift Server (09m; see the sketch below)
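To close the loop on the last lesson: the Thrift Server ships with Spark and is started with sbin/start-thriftserver.sh, after which it listens on the standard HiveServer2 port (10000 by default) and can be queried from beeline or any JDBC client. The snippet below is a minimal JDBC sketch against that endpoint; the table name crimes_enriched comes from the earlier join sketch and is an assumption, as are the blank credentials.

    import java.sql.DriverManager

    // Load the Hive JDBC driver (older JVMs need this; newer ones discover it automatically)
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // Connect to the Thrift Server on the default HiveServer2 port; blank credentials are an assumption
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
    val stmt = conn.createStatement()
    val rs   = stmt.executeQuery("SELECT COUNT(*) FROM crimes_enriched")   // table from the earlier join sketch
    while (rs.next()) {
      println(s"row count = ${rs.getLong(1)}")
    }
    rs.close(); stmt.close(); conn.close()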