Build a big data pipeline with AWS QuickSight, Druid, and Hive

Use an aviation dataset to simulate a complex, real-world, messaging-based big data pipeline with AWS QuickSight, Druid, NiFi, Kafka, and Hive.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What will you learn

End-to-end implementation of a big data pipeline on AWS
Scalable, reliable, and secure data architecture of the kind used by leading big data teams
Detailed explanation of the V's of big data, data pipeline building, and process automation
Real-time streaming data import from an external API using NiFi
Build both batch and streaming data pipelines on AWS from NiFi
Write the data into HDFS (batch) and Kafka (streaming ingestion) using NiFi
Ingest the data into Druid from HDFS (batch ingestion) as well as Kafka (real-time ingestion)
Compare the performance of Druid and Hive
Discuss the limitations and opportunities of Druid and Hive
Create Hive external tables on top of HDFS data
Perform ETL operations widely used in the industry on Hive data and store the results in managed tables
Visualise Hive data with AWS QuickSight to calculate KPIs on the aviation data
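
The Hive steps in the list above (an external table over HDFS data, then ETL results stored in a managed table) can be sketched as follows. This is a minimal Python sketch that only builds HiveQL strings. The table names, column names, HDFS path, and the comma-delimited text layout are assumptions made for illustration; they are not taken from the project's dataset.

```python
from typing import Dict


def external_table_ddl(table: str, columns: Dict[str, str], hdfs_path: str) -> str:
    """Build HiveQL for an external table over comma-delimited text files in HDFS."""
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_path}'"
    )


def managed_table_ctas(target: str, select_sql: str) -> str:
    """Build a CREATE TABLE AS SELECT that stores an ETL result in a managed ORC table."""
    return f"CREATE TABLE {target} STORED AS ORC AS\n{select_sql}"


# Hypothetical schema and path, for illustration only.
flight_columns = {
    "flight_date": "STRING",
    "airline": "STRING",
    "arrival_delay_min": "INT",
}

ddl = external_table_ddl("flights_raw", flight_columns, "/data/aviation/raw")
etl = managed_table_ctas(
    "airline_delay",
    "SELECT airline, AVG(arrival_delay_min) AS avg_delay_min\n"
    "FROM flights_raw\n"
    "GROUP BY airline",
)

print(ddl)
print(etl)
```

Dropping the external table leaves the files in HDFS untouched, while the managed table owns its data, which is why raw landed data and derived ETL results are usually kept in separate tables like this.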

Project Description

In this Big Data project, a senior Big Data Architect demonstrates how to implement a big data pipeline on AWS at scale. You will analyse the aviation dataset with a big data stack of NiFi, Kafka, HDFS, Hive, Druid, and AWS QuickSight to derive metrics from the existing data. The pipeline is built on AWS and serves both batch and real-time streaming ingestion of the data for various consumers according to their needs. The design is highly scalable and follows a setup implemented in a very large organisation.
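
To make the real-time path concrete, here is a hedged sketch of what a Druid Kafka ingestion supervisor spec might look like when assembled in Python. The field layout follows the Kafka supervisor spec format of recent Druid releases (older releases use a different parser section, so check it against your version). The datasource name, topic, broker address, and column names are hypothetical and not taken from the project.

```python
import json
from typing import List


def kafka_supervisor_spec(
    datasource: str,
    topic: str,
    bootstrap_servers: str,
    timestamp_column: str,
    dimensions: List[str],
) -> dict:
    """Assemble a Druid Kafka supervisor spec as a plain dictionary."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                "dimensionsSpec": {"dimensions": dimensions},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "DAY",
                    "queryGranularity": "NONE",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


# Hypothetical values, for illustration only.
spec = kafka_supervisor_spec(
    datasource="aviation_stream",
    topic="aviation-events",
    bootstrap_servers="localhost:9092",
    timestamp_column="event_time",
    dimensions=["airline", "origin", "destination"],
)

print(json.dumps(spec, indent=2))
```

A spec like this is typically submitted as JSON to the Druid Overlord's supervisor endpoint (`/druid/indexer/v1/supervisor`); the batch path from HDFS uses a separate, one-off ingestion task rather than a long-running supervisor.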

Curriculum For This Mini Project

Introduction to building a pipeline using Druid, Hive, and QuickSight
Introduction to Big Data
Introduction to Big Data Pipeline
System Requirements
Data Architecture using NiFi, Kafka, Hive, and Druid
Introduction to Apache NiFi
Apache Kafka vs Apache Flume
Apache Hive optimization techniques
Druid architecture and comparison with Hive and Presto
Exploration of Dataset
Extracting Data using NiFi into HDFS, Kafka, and MySQL
Configuring HDFS and Druid
Ingesting data from HDFS into Druid
Writing data from NiFi into Kafka
Consuming data from Kafka into Druid
Compare query performance in Hive, Druid, and MySQL
Compare query performance using MySQL
Connecting MySQL to AWS QuickSight for Visualization
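
For the query performance comparison steps in the curriculum, one possible approach is a small timing harness that runs the same query against each engine several times and reports the median latency. The sketch below is an assumption about how such a comparison could be set up, not the method used in the project. The three callables are placeholders that do identical trivial work, so the printed numbers say nothing about the real engines; in practice each would wrap a client call to Hive, Druid, or MySQL.

```python
import statistics
import time
from typing import Callable, Dict


def median_latency(run_query: Callable[[], object], repeats: int = 5) -> float:
    """Run the callable `repeats` times and return the median wall-clock seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(engines: Dict[str, Callable[[], object]], repeats: int = 5) -> Dict[str, float]:
    """Measure every engine with the same number of repeats."""
    return {name: median_latency(fn, repeats) for name, fn in engines.items()}


def placeholder_query() -> int:
    """Stand-in for a real query; replace with a client call per engine."""
    return sum(range(10_000))


# Placeholder callables: all three do the same work, so results are not meaningful.
results = compare(
    {
        "hive": placeholder_query,
        "druid": placeholder_query,
        "mysql": placeholder_query,
    },
    repeats=3,
)

for engine, seconds in results.items():
    print(f"{engine}: {seconds * 1000:.3f} ms (median of 3 runs)")
```

Using the median rather than the mean keeps a single slow run (for example, a cold cache on the first execution) from dominating the comparison.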