Event Data Analysis using AWS ELK Stack

This Elasticsearch example deploys the AWS ELK stack to analyse streaming event data. The tools used are NiFi, PySpark, Elasticsearch, Logstash and Kibana (for visualisation).

What You Will Learn

Extract complex real-time streaming JSON data with NiFi, parse it into CSV and store it in HDFS
Read the data from HDFS with PySpark for further analysis using PySpark SQL
Write the processed data back to HDFS so it can be ingested into Elasticsearch
Use Logstash to ingest the data into Elasticsearch
Discuss the options for ingesting data into Elasticsearch in large-scale distributed environments
Discuss the Ws behind the architecture and the tools/services used
Analyse the data with text search and queries in the Kibana UI once it is indexed in Elasticsearch
Visualise metrics with Kibana
Create dashboards in Kibana
Orchestrate the data flow with cron jobs (including the Ws of choosing cron over Oozie/Airflow for this use case)
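In the project itself, NiFi handles the JSON-to-CSV step. As a rough illustration of what "parsing complex JSON into CSV" involves, the Python sketch below flattens nested event records into one CSV row each. The input format (one JSON object per line) and the field names in the usage example are assumptions, not taken from the project.

```python
import csv
import io
import json
from typing import Any, Dict, Iterable, List


def flatten(record: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts into a single level, joining keys with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list):
            # Lists have no natural CSV column, so keep them as a JSON string.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def json_events_to_csv(lines: Iterable[str]) -> str:
    """Convert newline-delimited JSON events into CSV text with a header row."""
    rows: List[Dict[str, Any]] = [flatten(json.loads(line)) for line in lines if line.strip()]
    # Union of all keys in first-seen order, so sparse records still line up.
    header: List[str] = []
    for row in rows:
        for key in row:
            if key not in header:
                header.append(key)
    buf = io.StringIO()
    # restval="" fills columns that a given record does not have.
    writer = csv.DictWriter(buf, fieldnames=header, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For example, with hypothetical records such as `{"collision_id": 1, "location": {"latitude": "40.7"}}` and `{"collision_id": 2, "borough": "BROOKLYN"}`, the nested `location.latitude` becomes a `location_latitude` column and each record leaves blank any column it lacks.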

Project Description

In this pre-built big data industry project, we extract real-time streaming event data from the New York City accidents dataset API. We then process the data on AWS to extract KPIs and metrics, which are eventually pushed to Elasticsearch for text-based search and for analysis with Kibana visualisations.
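The project pushes data into Elasticsearch with Logstash, whose Elasticsearch output sends documents in batches through the `_bulk` API. To show what that request body looks like, here is a minimal sketch that builds the newline-delimited JSON (NDJSON) payload: one action/metadata line followed by one source line per document, with a trailing newline. The index name `nyc_accidents` and the `collision_id` ID field used below are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Optional


def to_bulk_body(docs: Iterable[Dict[str, Any]], index: str, id_field: Optional[str] = None) -> str:
    """Build an NDJSON body for the Elasticsearch _bulk API.

    Each document becomes two lines: an "index" action line and the document
    itself. The Bulk API requires the body to end with a newline.
    """
    lines = []
    for doc in docs:
        meta: Dict[str, Any] = {"_index": index}
        if id_field is not None and id_field in doc:
            # A stable _id means re-running ingestion overwrites documents
            # instead of creating duplicates.
            meta["_id"] = str(doc[id_field])
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

The resulting string would be sent as `POST /_bulk` with the `Content-Type: application/x-ndjson` header. Setting `_id` from a natural key is a design choice that keeps repeated pipeline runs idempotent.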

This is an end-to-end pipeline that covers every stage of handling event data on the cloud: data extraction, cleaning, transformation, exploratory analysis, visualisation and data flow orchestration.
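The transformation and analysis stages are done with PySpark SQL in the project. The sketch below uses Python's built-in `sqlite3` purely as a stand-in SQL engine so it runs without a Spark cluster. It computes per-borough KPIs (collision count and persons injured) with a `GROUP BY`. The column names `crash_date`, `borough` and `number_of_persons_injured` are assumptions about the dataset schema.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Per-borough KPIs; rows with a missing borough are excluded.
KPI_SQL = """
SELECT borough,
       COUNT(*)                       AS collisions,
       SUM(number_of_persons_injured) AS persons_injured
FROM accidents
WHERE borough IS NOT NULL AND borough <> ''
GROUP BY borough
ORDER BY collisions DESC, borough
"""


def borough_kpis(rows: Iterable[Tuple[str, str, int]]) -> List[Tuple[str, int, int]]:
    """Aggregate (crash_date, borough, injured) rows into per-borough KPIs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE accidents "
            "(crash_date TEXT, borough TEXT, number_of_persons_injured INTEGER)"
        )
        conn.executemany("INSERT INTO accidents VALUES (?, ?, ?)", rows)
        return conn.execute(KPI_SQL).fetchall()
    finally:
        conn.close()
```

In PySpark, the same query text could be run with `spark.sql(KPI_SQL)` after registering the DataFrame read from HDFS with `df.createOrReplaceTempView("accidents")`.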

Similar Projects

PySpark Project: Get a handle on using Python with Spark through this hands-on data processing Spark Python tutorial.

In this big data project, we will talk about Apache Zeppelin. We will write code, write notes, build charts and share all in one single data analytics environment using Hive, Spark and Pig.

In this big data project, we will embark on real-time data collection and aggregation from a simulated real-time system using Spark Streaming.

Curriculum For This Mini Project

Introduction To Event Data Pipelines
System Requirements And Dataset Overview
Solution Architecture
Introduction To Apache Nifi
Introduction To Spark
Introduction To The AWS ELK Stack: Elasticsearch
Introduction To Kibana
Introduction To Logstash
Extract From NiFi And Parse
Store In HDFS
Running Logstash
Processing In PySpark - Data Exploration
Processing In PySpark - Data Analysis
Writing Analysis Results To HDFS
Ingesting Data With Logstash Into Elasticsearch
Custom SQL In Kibana
Kibana Visualization - Lens UI
Kibana Charts - 1
Kibana Charts - 2
Data Flow Orchestration With Crontab
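With crontab orchestration, cron only triggers a command on a schedule. Ordering the pipeline steps and stopping on failure is left to whatever that command runs. Below is a minimal sketch of such a driver, assuming each step is an external command. The crontab line in the comment and any real step commands are hypothetical.

```python
import subprocess
import sys
from typing import List, Sequence

# Hypothetical crontab entry running the driver at minute 0 of every hour:
#   0 * * * * /usr/bin/python3 /opt/pipeline/run_pipeline.py >> /var/log/pipeline.log 2>&1


def run_pipeline(steps: Sequence[Sequence[str]]) -> List[str]:
    """Run each step's command in order and stop at the first failing step.

    Returns the commands (joined as strings) that completed successfully.
    """
    completed: List[str] = []
    for cmd in steps:
        result = subprocess.run(list(cmd), capture_output=True, text=True)
        if result.returncode != 0:
            # Later steps consume earlier output, so do not continue.
            print(f"step failed ({result.returncode}): {' '.join(cmd)}", file=sys.stderr)
            break
        completed.append(" ".join(cmd))
    return completed
```

A real step list might look like `[["spark-submit", "analysis.py"], ["logstash", "-f", "accidents.conf"]]`. Both script names are hypothetical placeholders. Unlike Oozie or Airflow, this approach provides no retries, dependency graph or run history, which is the trade-off cron makes for simplicity.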