Event Data Analysis using AWS ELK Stack

This Elasticsearch example deploys the AWS ELK stack to analyse streaming event data. Tools used include NiFi, PySpark, Elasticsearch, Logstash and Kibana for visualisation.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love

Shailesh Kurdekar

Solutions Architect at Capital One

I have worked for more than 15 years in Java and J2EE and have recently developed an interest in Big Data technologies and Machine Learning due to a big need at my workspace. I was referred here by a...

Arvind Sodhi

VP - Data Architect, CDO at Deutsche Bank

I have extensive experience in data management and data processing. Over the past few years I saw the data management technology transition into the Big Data ecosystem and I needed to follow suit. I...

What will you learn

Extract and parse complex real-time streaming JSON data into CSV, storing it in HDFS using NiFi
Extract the data from HDFS using PySpark for further analysis with PySparkSQL
Write the processed data back to HDFS to ingest the data into Elasticsearch
Use Logstash to ingest data into Elasticsearch
Discussion of various data ingestion options for Elasticsearch in large-scale distributed environments
Discussion of the whys and hows of the architecture and tools/services used
Analysis - text search and querying of data indexed in Elasticsearch using the Kibana UI
Visualisation of metrics using Kibana
Creating dashboards in Kibana
Dataflow orchestration using cron jobs (with a discussion of why cron fits this use case compared to Oozie/Airflow)
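To make the first step above concrete, here is a minimal plain-Python sketch of the JSON-to-CSV flattening that NiFi performs in the project (the field names `crash_date`, `borough` and `number_of_persons_injured` are illustrative assumptions, not the actual schema of the NYC accidents feed):

```python
import csv
import io
import json

def events_to_csv(json_events: str) -> str:
    """Flatten a JSON array of accident events into CSV text.

    Column names here are hypothetical; the real NYC accidents
    API may use different field names.
    """
    events = json.loads(json_events)
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["crash_date", "borough", "number_of_persons_injured"]
    )
    writer.writeheader()
    for event in events:
        # Keep only the columns we care about; default missing fields to "".
        writer.writerow({k: event.get(k, "") for k in writer.fieldnames})
    return buf.getvalue()

sample = json.dumps([
    {"crash_date": "2021-04-14", "borough": "BROOKLYN",
     "number_of_persons_injured": 1, "extra_field": "dropped"},
])
print(events_to_csv(sample))
```

In the actual pipeline this flattening is done declaratively inside NiFi (e.g. with record-oriented processors) rather than in hand-written Python; the sketch only shows the shape of the transformation.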

Project Description

In this pre-built big data industry project, we extract real-time streaming event data from the New York City accidents dataset API. We then process the data on AWS to extract KPIs and metrics, which are eventually pushed to Elasticsearch for text-based search and analysis using Kibana visualisations.

This is an end-to-end project pipeline covering data extraction, cleaning, transformation, exploratory analysis, visualisation, and data flow orchestration of event data on the cloud.
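As a rough illustration of the KPI-extraction step, here is the kind of aggregation the project performs with PySparkSQL, sketched in plain Python (the field names and the injuries-per-borough metric are assumptions for illustration, not the project's exact KPIs):

```python
from collections import defaultdict

def injuries_per_borough(events):
    """Aggregate total injuries per borough - one example of a KPI
    the pipeline could push to Elasticsearch.

    Roughly equivalent to the Spark SQL:
      SELECT borough, SUM(number_of_persons_injured)
      FROM accidents GROUP BY borough
    """
    totals = defaultdict(int)
    for event in events:
        totals[event.get("borough", "UNKNOWN")] += int(
            event.get("number_of_persons_injured", 0)
        )
    return dict(totals)

events = [
    {"borough": "QUEENS", "number_of_persons_injured": 2},
    {"borough": "QUEENS", "number_of_persons_injured": 1},
    {"borough": "BRONX", "number_of_persons_injured": 0},
]
print(injuries_per_borough(events))  # {'QUEENS': 3, 'BRONX': 0}
```

In the project itself this runs as a distributed PySpark job over data in HDFS; the snippet only shows the aggregation logic.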

Similar Projects

In this NoSQL project, we will use two NoSQL databases (HBase and MongoDB) to store Yelp business attributes and learn how to retrieve this data for processing or querying.

In this project, we will work through various use cases for analysing crime datasets using Apache Spark.

Build a fully working, scalable, reliable and secure AWS EMR complex data pipeline from scratch that supports all data stages, from data collection to data analysis and visualisation.

Curriculum For This Mini Project

Introduction To Event Data Pipelines
System Requirements And Dataset Overview
Solution Architecture
Introduction To Apache NiFi
Introduction To Spark
Introduction To The AWS ELK Stack - Elasticsearch
Introduction To Kibana
Introduction To Logstash
Extract From NiFi And Parse
Store In HDFS
Running Logstash
Processing In PySpark - Data Exploration
Processing In PySpark - Data Analysis
Writing Analysis Results To HDFS
Ingesting Data With Logstash Into Elasticsearch
Custom SQL In Kibana
Kibana Visualization - Lens UI
Kibana Charts - 1
Kibana Charts - 2
Data Flow Orchestration With Crontab
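The text-search and querying steps in the curriculum above ultimately issue Elasticsearch Query DSL requests, whether from Kibana's UI or Dev Tools console. A minimal sketch of building a full-text match query body as a plain Python dict (the index and field names are assumptions for illustration):

```python
def match_query(field: str, text: str, size: int = 10) -> dict:
    """Build an Elasticsearch full-text match query body.

    This is the JSON you could send as `GET accidents/_search`
    from Kibana Dev Tools; the field name is hypothetical.
    """
    return {
        "size": size,
        "query": {"match": {field: text}},
    }

body = match_query("contributing_factor", "driver inattention")
print(body)
# {'size': 10, 'query': {'match': {'contributing_factor': 'driver inattention'}}}
```

The `match` query analyses the search text and scores documents by relevance, which is what makes free-text search over the indexed accident records possible from Kibana.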