Event Data Analysis using AWS ELK Stack

This Elasticsearch example deploys the AWS ELK stack to analyse streaming event data. The tools used are NiFi, PySpark, Elasticsearch, Logstash, and Kibana for visualisation.


What You Will Learn

Extract complex real-time streaming JSON data, parse it into CSV, and store it in HDFS using NiFi
Read the data from HDFS with PySpark and analyse it further using PySpark SQL
Write the processed data back to HDFS so that it can be ingested into Elasticsearch
Use Logstash to ingest data into Elasticsearch
Discussion of the options for ingesting data into Elasticsearch in large-scale distributed environments
Discussion of Ws regarding the architecture and tools/services used
Analysis: text search and querying of the data in the Kibana UI once it is indexed into Elasticsearch
Visualisation of metrics using Kibana
Creating dashboards in Kibana
Dataflow orchestration using cron jobs (explanation of Ws regarding orchestration compared to Oozie/Airflow for this use case)
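
To make the first item concrete: in the project, NiFi parses the nested JSON events into CSV. The transformation itself can be sketched in plain Python. This is only an illustration of the flattening step, not the NiFi flow used in the project, and the field names (`collision_id`, `borough`, `location`) are assumptions chosen for the example.

```python
import csv
import io
import json


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into a single-level dict with joined keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def events_to_csv(json_text):
    """Convert a JSON array of event objects into CSV text."""
    rows = [flatten(event) for event in json.loads(json_text)]
    # Union of all keys in first-seen order, so events with missing
    # fields still fit the same header.
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample events; the real API payload may look different.
sample = json.dumps([
    {"collision_id": "1", "borough": "BROOKLYN",
     "location": {"latitude": "40.68", "longitude": "-73.97"}},
    {"collision_id": "2",
     "location": {"latitude": "40.75", "longitude": "-73.99"}},
])
csv_text = events_to_csv(sample)
print(csv_text)
```

Events that lack a field (here, the second event has no `borough`) get an empty cell, which keeps every row aligned with the header.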

Project Description

In this pre-built big data industry project, we extract real-time streaming event data from the New York City accidents dataset API. We then process the data on AWS to extract KPIs and metrics, which are eventually pushed to Elasticsearch for text-based search and analysis, with visualisation in Kibana.
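
As a rough illustration of what extracting a KPI and preparing it for Elasticsearch can look like, the sketch below counts accidents per borough and formats the result as newline-delimited JSON in the shape accepted by the Elasticsearch `_bulk` endpoint. The project itself uses PySpark for the analysis and Logstash for ingestion, so this is a stand-in for the idea rather than the project's code. The index name `nyc-accident-kpis` and the field names are assumptions.

```python
import json
from collections import Counter


def accidents_per_borough(events):
    """Count events per borough; missing boroughs are grouped as UNKNOWN."""
    return Counter(event.get("borough") or "UNKNOWN" for event in events)


def to_bulk_payload(counts, index_name):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    Each record is an action line followed by a document line, and the
    body must end with a newline.
    """
    lines = []
    for borough, total in sorted(counts.items()):
        lines.append(json.dumps({"index": {"_index": index_name, "_id": borough}}))
        lines.append(json.dumps({"borough": borough, "accident_count": total}))
    return "\n".join(lines) + "\n"


# Hypothetical events; only the field used by the KPI is shown.
events = [
    {"borough": "BROOKLYN"},
    {"borough": "QUEENS"},
    {"borough": "BROOKLYN"},
    {"borough": None},
]
counts = accidents_per_borough(events)
payload = to_bulk_payload(counts, "nyc-accident-kpis")
print(payload)
```

Using the borough as the document `_id` means that re-running the job overwrites the previous count instead of creating duplicate documents.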

This is an end-to-end pipeline for event data on the cloud, covering data extraction, cleaning, transformation, exploratory analysis, visualisation, and data flow orchestration.
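
Because cron only starts commands on a schedule and does not track dependencies between them, a cron-driven pipeline usually needs a wrapper that runs the stages in order and stops when one fails. The following is a minimal sketch of such a wrapper, assuming each stage can be started as a command. The step names and commands are placeholders, not the project's actual scripts.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each (name, command) step in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"step '{name}' failed with exit code {result.returncode}")
            break
        completed.append(name)
    return completed


# Placeholder commands: each one stands in for a real stage such as a
# Spark job or a Logstash run, which this sketch does not try to model.
demo_steps = [
    ("extract", [sys.executable, "-c", "print('extract done')"]),
    ("transform", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ("ingest", [sys.executable, "-c", "print('ingest done')"]),
]
finished = run_pipeline(demo_steps)
print(finished)
```

In this demo the `transform` placeholder exits with a non-zero code on purpose, so `ingest` never runs. A scheduler such as Oozie or Airflow provides this kind of dependency handling, plus retries, as built-in features.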

Curriculum For This Mini Project

Introduction To Event Data Pipelines
System Requirements And Dataset Overview
Solution Architecture
Introduction To Apache NiFi
Introduction To Spark
Introduction To AWS ELK Stack Elasticsearch
Introduction To Kibana
Introduction To Logstash
Extract From NiFi And Parse
Store In HDFS
Running Logstash
Processing In PySpark - Data Exploration
Processing In PySpark - Data Analysis
Writing Analysis Results To HDFS
Ingesting Data With Logstash Into Elasticsearch
Custom SQL In Kibana
Kibana Visualization - Lens UI
Kibana Charts - 1
Kibana Charts - 2
Data Flow Orchestration With Crontab