Event Data Analysis using AWS ELK Stack

This project deploys the ELK stack on AWS to analyse streaming event data. Tools used include NiFi, PySpark, Elasticsearch, Logstash, and Kibana for visualisation.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love

Arvind Sodhi

VP - Data Architect, CDO at Deutsche Bank

I have extensive experience in data management and data processing. Over the past few years I saw the data management technology transition into the Big Data ecosystem and I needed to follow suit. I...

Nathan Elbert

Senior Data Scientist at Tiger Analytics

This was great. The use of Jupyter was great. Prior to learning Python I was a self-taught SQL user with advanced skills. I hold a Bachelors in Finance and have 5 years of business experience... I...

What will you learn

Extract and parse complex real-time streaming JSON data into CSV and store it in HDFS using NiFi
Extract the data from HDFS using PySpark for further analysis with PySpark SQL
Write the processed data back to HDFS so it can be ingested into Elasticsearch
Use Logstash to ingest data into Elasticsearch
Discuss the various options for ingesting data into Elasticsearch in large-scale distributed environments
Discuss the rationale behind the architecture and the tools/services used
Analyse the data once it is indexed into Elasticsearch, using text search and queries in the Kibana UI
Visualise metrics using Kibana
Create dashboards in Kibana
Orchestrate the dataflow using cron jobs (with a discussion of why cron was chosen over Oozie/Airflow for this use case)
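The NiFi flow in the first step extracts nested JSON events and flattens them into CSV rows before landing them in HDFS. A minimal Python sketch of that flattening logic, for illustration only (field names such as `crash_date` and `borough` are assumptions, not the exact dataset schema):

```python
import csv
import io
import json

def flatten_event(event: dict) -> dict:
    """Flatten one nested JSON event into a flat dict suitable for a CSV row."""
    flat = {}
    for key, value in event.items():
        if isinstance(value, dict):
            # Promote nested fields with a dotted prefix, e.g. "location.latitude".
            for sub_key, sub_value in value.items():
                flat[f"{key}.{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat

def events_to_csv(raw_json: str) -> str:
    """Parse a JSON array of events and render it as CSV text."""
    events = [flatten_event(e) for e in json.loads(raw_json)]
    fieldnames = sorted({k for e in events for k in e})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(events)
    return buf.getvalue()

# Hypothetical event shaped like a NYC accident record.
raw = ('[{"crash_date": "2021-04-14", "borough": "BROOKLYN", '
       '"location": {"latitude": "40.66", "longitude": "-73.95"}}]')
print(events_to_csv(raw))
```

In the actual project this transformation is configured inside NiFi processors rather than written by hand; the sketch only shows the shape of the JSON-to-CSV step.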

Project Description

In this pre-built big data industry project, we extract real-time streaming event data from the New York City accidents dataset API. We then process the data on AWS to extract KPIs and metrics, which are eventually pushed to Elasticsearch for text-based search and analysis using Kibana visualisation.
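For the text-based search step, queries can be issued from Kibana's Dev Tools console using the standard Elasticsearch query DSL. A sketch, where the index name `nyc-accidents` and the field names `contributing_factor` and `crash_date` are illustrative assumptions:

```json
GET nyc-accidents/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "contributing_factor": "driver inattention" } }
      ],
      "filter": [
        { "range": { "crash_date": { "gte": "now-30d/d" } } }
      ]
    }
  }
}
```

The `match` clause performs analysed full-text search, while the `range` filter narrows results without affecting relevance scoring.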

This is an end-to-end project pipeline covering data extraction, cleaning, transformation, exploratory analysis, visualisation, and data flow orchestration of event data on the cloud.
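The Logstash leg of the pipeline can be sketched as a pipeline config that reads the processed CSV output and indexes it into Elasticsearch. The file paths, host, and index name here are assumptions for illustration, not the project's exact settings:

```
input {
  file {
    # Assumed path: CSV results exported from HDFS to the local filesystem.
    path => "/data/export/accident_kpis/*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    # Use the first line of each file as column names.
    autodetect_column_names => true
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "nyc-accidents"
  }
}
```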

Similar Projects

In this project, we use complex scenarios to prepare Spark developers for the issues that arise in the real world.

In this Databricks Azure project, you will use Spark and Parquet file formats to analyse the Yelp reviews dataset. As part of this, you will deploy Azure Data Factory and data pipelines and visualise the analysis.

In this project, we will be building and querying an OLAP Cube for Flight Delays on the Hadoop platform.

Curriculum For This Mini Project

Introduction To Event Data Pipelines
System Requirements And Dataset Overview
Solution Architecture
Introduction To Apache NiFi
Introduction To Spark
Introduction To AWS ELK Stack - Elasticsearch
Introduction To Kibana
Introduction To Logstash
Extract From NiFi And Parse
Store In HDFS
Running Logstash
Processing In PySpark - Data Exploration
Processing In PySpark - Data Analysis
Writing Analysis Results To HDFS
Ingesting Data With Logstash Into Elasticsearch
Custom SQL In Kibana
Kibana Visualization - Lens UI
Kibana Charts - 1
Kibana Charts - 2
Data Flow Orchestration With Crontab
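The crontab-based orchestration covered in the final lesson can be sketched as a pair of scheduled entries; the script paths and schedule below are illustrative assumptions:

```
# Run the PySpark KPI job at the top of every hour.
0 * * * * /opt/pipeline/run_pyspark_job.sh >> /var/log/pipeline/pyspark.log 2>&1
# Run the Logstash ingestion 15 minutes later, once the KPI files have landed.
15 * * * * /opt/pipeline/run_logstash_ingest.sh >> /var/log/pipeline/logstash.log 2>&1
```

Unlike Oozie or Airflow, cron has no dependency tracking, so the second job simply starts at a fixed offset and must tolerate a late or failed upstream run.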