Build a big data pipeline with AWS QuickSight, Druid, and Hive


Use an aviation dataset to simulate a complex, real-world, messaging-based big data pipeline built with AWS QuickSight, Druid, NiFi, Kafka, and Hive.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love


Swati Patra

Systems Advisor, IBM

I have 11 years of experience and work with IBM. My domain is Travel, Hospitality and Banking - both sectors process lots of data. The way the projects were set up and the mentors' explanation was...


Lead Consultant, ITC Infotech

The project orientation is very much unique and it helps to understand the real time scenarios most of the industries are dealing with. And there is no limit, one can go through as many projects...

What you will learn

End-to-end implementation of a Big Data pipeline on AWS
Scalable, reliable, secure data architecture as followed by top-notch Big Data leaders
Detailed explanation of the V's of Big Data, of building data pipelines, and of automating the processes
Real-time streaming data import from an external API using NiFi
Build both batch and streaming data pipelines on AWS from NiFi
Write the data into HDFS (batch) and Kafka (streaming ingestion) using NiFi
Ingest the data into Druid from HDFS (batch ingestion) as well as from Kafka (real time)
Compare the performance of Druid and Hive
Discuss limitations and opportunities with Druid and Hive
Hive external table creation on top of HDFS data
Performing ETLs widely used in the industry on top of Hive data and storing the results in managed tables
Visualising Hive data using AWS QuickSight to calculate some of the KPIs in the aviation data
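The Hive steps in the list above (an external table over the data NiFi lands in HDFS, then an ETL into a managed table that feeds the KPIs) can be sketched as HiveQL. The Python below is a minimal sketch that only builds the statements; the database name `aviation`, the column list, the HDFS path and the 15-minute on-time threshold are hypothetical assumptions, not the project's actual schema.

```python
"""Build HiveQL for the external-table -> managed-table ETL pattern.

All names here (database, tables, columns, HDFS path) are hypothetical
placeholders chosen for illustration.
"""
from typing import List, Tuple

# Hypothetical aviation columns (name, Hive type) for a CSV file landed in HDFS.
AVIATION_COLUMNS: List[Tuple[str, str]] = [
    ("flight_date", "STRING"),
    ("carrier", "STRING"),
    ("origin", "STRING"),
    ("dest", "STRING"),
    ("dep_delay", "INT"),
    ("arr_delay", "INT"),
]


def external_table_ddl(db: str, table: str, columns: List[Tuple[str, str]],
                       hdfs_location: str) -> str:
    """External table: Hive tracks only metadata; dropping it keeps the HDFS files."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {db}.{table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"  # skip the CSV header row
    )


def managed_kpi_table_sql(db: str, source: str, target: str) -> List[str]:
    """Managed ORC table holding per-carrier KPIs computed from the external table."""
    create = (
        f"CREATE TABLE IF NOT EXISTS {db}.{target} (\n"
        "  carrier STRING,\n  flights BIGINT,\n"
        "  avg_arr_delay DOUBLE,\n  on_time_pct DOUBLE\n)\n"
        "STORED AS ORC"
    )
    # Assumed KPI definition: a flight is "on time" if it arrives < 15 minutes late.
    etl = (
        f"INSERT OVERWRITE TABLE {db}.{target}\n"
        "SELECT carrier,\n"
        "       COUNT(*) AS flights,\n"
        "       AVG(arr_delay) AS avg_arr_delay,\n"
        "       100.0 * SUM(CASE WHEN arr_delay < 15 THEN 1 ELSE 0 END) / COUNT(*) AS on_time_pct\n"
        f"FROM {db}.{source}\n"
        "WHERE arr_delay IS NOT NULL\n"
        "GROUP BY carrier"
    )
    return [create, etl]


if __name__ == "__main__":
    print(external_table_ddl("aviation", "flights_raw", AVIATION_COLUMNS,
                             "/data/aviation/raw") + ";")
    for stmt in managed_kpi_table_sql("aviation", "flights_raw", "carrier_kpis"):
        print(stmt + ";")
```

The raw landing zone is an external table so that dropping it leaves the files written by NiFi in place, while the curated KPI output lives in a managed ORC table owned by Hive, which is the kind of table a visualisation tool would then read from.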

Project Description

In this Big Data project, a senior Big Data Architect will demonstrate how to implement a Big Data pipeline on AWS at scale, using the Aviation dataset. You will analyse the aviation data with a highly competitive big data technology stack (NiFi, Kafka, HDFS, Hive, Druid, and AWS QuickSight) to derive metrics from the existing data. The pipeline is built on AWS to serve both batch and real-time streaming ingestion, so that different consumers can use the data according to their needs. The project is highly scalable and is implemented as it would be in a very large organisation.
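For the two ingestion paths described above, Druid accepts a native batch task (`index_parallel`) for files already in HDFS and a Kafka supervisor for the streaming topic. The sketch below builds both specs, assuming the `druid-hdfs-storage` and `druid-kafka-indexing-service` extensions are loaded; the datasource name, topic, timestamp column, dimensions, metrics and Overlord URL are hypothetical. Both specs share one `dataSchema`, so batch and real-time data land with the same dimensions and metrics.

```python
"""Build Druid ingestion specs for batch (HDFS) and streaming (Kafka) paths.

Datasource, column names, topic and URLs are hypothetical placeholders.
"""
import json
from typing import Any, Dict, List
from urllib.request import Request

ENDPOINTS = {
    "task": "/druid/indexer/v1/task",              # one-off batch ingestion task
    "supervisor": "/druid/indexer/v1/supervisor",  # long-running Kafka supervisor
}


def data_schema(datasource: str) -> Dict[str, Any]:
    """Shared schema: hypothetical 'event_time' timestamp and aviation dimensions."""
    return {
        "dataSource": datasource,
        "timestampSpec": {"column": "event_time", "format": "auto"},
        "dimensionsSpec": {"dimensions": ["carrier", "origin", "dest"]},
        "metricsSpec": [
            {"type": "count", "name": "flights"},
            {"type": "longSum", "name": "arr_delay_sum", "fieldName": "arr_delay"},
        ],
        # Roll rows up to the hour; segments are partitioned by day.
        "granularitySpec": {"type": "uniform", "segmentGranularity": "DAY",
                            "queryGranularity": "HOUR", "rollup": True},
    }


def hdfs_batch_task(datasource: str, hdfs_paths: List[str]) -> Dict[str, Any]:
    """Native parallel batch task reading CSV files (with a header row) from HDFS."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": data_schema(datasource),
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "hdfs", "paths": hdfs_paths},
                "inputFormat": {"type": "csv", "findColumnsFromHeader": True},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


def kafka_supervisor(datasource: str, topic: str, bootstrap: str) -> Dict[str, Any]:
    """Kafka supervisor consuming JSON events in real time."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": data_schema(datasource),
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap},
                "taskCount": 1,
                "replicas": 1,
                "taskDuration": "PT1H",
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


def submission_request(overlord_url: str, payload: Dict[str, Any], kind: str) -> Request:
    """Build (but do not send) the POST request that submits a spec to the Overlord."""
    return Request(
        overlord_url.rstrip("/") + ENDPOINTS[kind],
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    batch = hdfs_batch_task("aviation_flights", ["hdfs://namenode:8020/data/aviation/raw/"])
    stream = kafka_supervisor("aviation_flights", "aviation-events", "localhost:9092")
    print(json.dumps(stream, indent=2))
    print(submission_request("http://localhost:8090", batch, "task").full_url)
```

The requests are only built here, not sent; passing them to `urllib.request.urlopen` against a running Overlord would start the batch task and the supervisor respectively.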

Similar Projects

In this big data project, we'll work with Apache Airflow and write a scheduled workflow that downloads data from the Wikipedia archives, uploads it to S3, processes it in Hive and finally analyses it in Zeppelin notebooks.

Build a fully working, scalable, reliable and secure AWS EMR data pipeline from scratch that supports every data stage, from data collection to data analysis and visualization.

In this Hadoop project, you will use a sample application log file from an application server to demonstrate a scaled-down server log processing pipeline.

Curriculum For This Mini Project

Introduction to building a pipeline using Druid, Hive, and QuickSight
Introduction to Big Data
Introduction to Big Data Pipeline
System Requirements
Data Architecture using NiFi, Kafka, Hive, and Druid
Introduction to Apache NiFi
Apache Kafka vs Apache Flume
Apache Hive optimization techniques
Druid architecture and comparison with Hive and Presto
Exploration of Dataset
Extracting Data using NiFi into HDFS, Kafka, and MySQL
Configuring HDFS and Druid
Ingesting data from HDFS into Druid
Writing data from NiFi into Kafka
Consume data from Kafka to Druid
Compare query performance in Hive, Druid, and MySQL
Compare query performance using MySQL
Connecting MySQL to AWS QuickSight for Visualization
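Comparing query performance across Hive, Druid and MySQL is only meaningful if each engine is timed with the same harness and the same logical query. The sketch below is engine-agnostic: each engine is wrapped in a zero-argument callable. An in-memory SQLite table stands in for the real engines so the example runs with the Python standard library alone; wrapping an actual Hive, Druid SQL or MySQL client in such a callable is an assumption left to the reader.

```python
"""Engine-agnostic timing harness for comparing the same query across engines.

SQLite is only a stand-in here; it is not one of the engines in the project.
"""
import sqlite3
import statistics
import time
from typing import Callable, Dict, List


def time_query(run: Callable[[], object], repeats: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Time a zero-argument query callable; warm-up runs are executed but not recorded."""
    for _ in range(warmup):
        run()
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return {"min_s": min(samples), "median_s": statistics.median(samples), "max_s": max(samples)}


def compare(engines: Dict[str, Callable[[], object]], repeats: int = 5) -> Dict[str, Dict[str, float]]:
    """Run the same harness for every engine and collect its timing summary."""
    return {name: time_query(run, repeats=repeats) for name, run in engines.items()}


if __name__ == "__main__":
    # Stand-in engine: a small in-memory table with a hypothetical flights schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flights (carrier TEXT, arr_delay INTEGER)")
    conn.executemany("INSERT INTO flights VALUES (?, ?)",
                     [("AA" if i % 2 else "DL", i % 60) for i in range(10_000)])
    kpi_sql = "SELECT carrier, AVG(arr_delay) FROM flights GROUP BY carrier"
    results = compare({"sqlite (stand-in)": lambda: conn.execute(kpi_sql).fetchall()})
    for engine, stats in results.items():
        print(engine, {k: round(v, 6) for k, v in stats.items()})
```

Reporting the median over several runs, after a discarded warm-up, reduces the influence of cold caches and connection setup, which would otherwise distort a one-off comparison between engines.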