Building a Data Warehouse using Spark on Hive

In this project we will build a Hive data warehouse from a raw dataset stored in HDFS and present the data in a relational structure so that querying the data will be natural.

What will you learn

  • How to run Hive queries on Spark
  • Hadoop data warehousing with Hive
  • Using the interactive Scala Build Tool (sbt) with Spark
  • Data serialization, with a Kryo serialization example
  • Performance optimization using caching
  • Broadcast variables
  • Writing a Spark RDD to Hive using Spark SQL
  • Exploring the Parquet data storage format and the reasons for choosing it
  • Building Hive external tables on a Parquet dataset
  • Writing queries against datasets using Impala
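
Three of the topics above (Kryo serialization, caching and broadcast variables) can be sketched in a few lines of Scala. This is a minimal illustration and not the project's solution code: it assumes a Spark 2.x or later build with the `SparkSession` API, and the names `MovieRating`, `TuningSketch` and `kryoConf`, as well as the tiny in-memory sample data, are hypothetical.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical record type, used only for this illustration.
final case class MovieRating(movieId: Int, rating: Double)

object TuningSketch {
  // Kryo is enabled through configuration. Registering the classes that
  // travel through shuffles lets Kryo write a short id instead of the
  // full class name with every record.
  def kryoConf(appName: String): SparkConf =
    new SparkConf()
      .setAppName(appName)
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[MovieRating]))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().config(kryoConf("tuning-sketch")).getOrCreate()
    val sc = spark.sparkContext

    // Broadcast variable: a small lookup table shipped once to each executor
    // instead of being serialized with every task. Sample titles only.
    val titles = sc.broadcast(Map(1 -> "Toy Story (1995)", 2 -> "Jumanji (1995)"))

    // Caching: this RDD is reused by two actions below, so it is kept in
    // memory after the first one computes it.
    val ratings = sc
      .parallelize(Seq(MovieRating(1, 4.0), MovieRating(2, 3.5), MovieRating(1, 5.0)))
      .cache()

    val total = ratings.count()
    val named = ratings
      .map(r => (titles.value.getOrElse(r.movieId, "unknown"), r.rating))
      .collect()

    println(s"$total ratings")
    named.foreach(println)

    spark.stop()
  }
}
```

Kryo affects RDD shuffles and serialized caching; DataFrames and Datasets use Spark SQL's own encoders, so the setting matters most for the RDD-based parts of an application.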

What will you get

  • Access to a recording of the complete project
  • Access to all project material, such as data files and solution files

Project Description

This project aims to build a Hive data warehouse from a raw dataset stored in HDFS and present the data in a relational structure so that querying the data is natural. The dataset for this project is the MovieLens open dataset of movie ratings.

The project makes use of some advanced concepts in Spark programming and stores its final output incrementally in Hive tables built using the Parquet data storage format. We will also demonstrate some complex queries on these tables using Hive and Impala. The Spark application will be written in Scala, and the development process will be automated using the Scala Build Tool (sbt).
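
As a rough sketch of the flow described above, the following Scala program parses raw rating lines from HDFS into an RDD, converts it to a DataFrame and appends it to a Parquet-backed Hive table. Several details are assumptions and do not come from the project material: the HDFS path, the `movielens` database and `ratings` table names, the comma-separated `userId,movieId,rating,timestamp` layout of the ratings file, and the use of the Spark 2.x `SparkSession` API with Hive support.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import scala.util.Try

// One row of the ratings file (assumed layout: userId,movieId,rating,timestamp).
final case class Rating(userId: Int, movieId: Int, rating: Double, timestamp: Long)

object RatingsToHive {
  // Parse one line; the header row and malformed lines yield None.
  def parseRating(line: String): Option[Rating] =
    line.split(",") match {
      case Array(u, m, r, t) =>
        Try(Rating(u.trim.toInt, m.trim.toInt, r.trim.toDouble, t.trim.toLong)).toOption
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL read and write tables in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("movielens-warehouse")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Hypothetical HDFS location of the raw file.
    val ratings = spark.sparkContext
      .textFile("hdfs:///data/movielens/ratings.csv")
      .flatMap(line => parseRating(line).toList)
      .toDF()

    spark.sql("CREATE DATABASE IF NOT EXISTS movielens")

    // Append mode adds rows to the table on every run, which is one way
    // to build up the output incrementally.
    ratings.write
      .mode(SaveMode.Append)
      .format("parquet")
      .saveAsTable("movielens.ratings")

    // A sample query against the new table: the ten most-rated movies.
    spark.sql(
      """SELECT movieId, COUNT(*) AS num_ratings, AVG(rating) AS avg_rating
        |FROM movielens.ratings
        |GROUP BY movieId
        |ORDER BY num_ratings DESC
        |LIMIT 10""".stripMargin).show()

    spark.stop()
  }
}
```

With sbt, a program like this would typically be packaged with `sbt package` and launched through `spark-submit`; the exact build settings depend on the Spark and Scala versions of the cluster.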

The data warehouse is built by loading, extracting and transforming the dataset into structures that will provide the basis for data scientists to perform different forms of model discovery.

We will use the following tools in this project:

Instructors

 
Michael

Big Data & Enterprise Software Engineer

I am passionate about software development, databases, data analysis and the Android platform. My native language is Java, but no one has stopped me so far from learning and using Angular and Node.js. Data and data analysis are thrilling, and so are my experiences with SQL on Oracle, Microsoft SQL Server, Postgres and MyS...