Event Data Analysis using AWS ELK Stack

This Elasticsearch example deploys the AWS ELK stack to analyse streaming event data. The tools used include NiFi, PySpark, Elasticsearch, and Logstash, with Kibana for visualisation.

What you will learn

Extracting complex real-time streaming JSON data, parsing it into CSV, and storing it in HDFS using NiFi
Extracting the data from HDFS using PySpark for further analysis with PySpark SQL
Writing the processed data back to HDFS so that it can be ingested into Elasticsearch
Using Logstash to ingest data into Elasticsearch
Discussion of the options for ingesting data into Elasticsearch in large-scale distributed environments
Discussion of Ws regarding the architecture and the tools/services used
Analysis: text search and querying of the data in the Kibana UI once it is indexed into Elasticsearch
Visualisation of metrics using Kibana
Creating dashboards in Kibana
Dataflow orchestration using cron jobs (with an explanation of Ws regarding orchestration compared to Oozie/Airflow for this use case)
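
To make the first step of the list concrete, here is a minimal sketch of flattening nested JSON events into CSV rows. In the project this parsing is done inside NiFi; the plain-Python version below only illustrates the idea, and the helper names (`flatten`, `events_to_csv`) and the sample field names are assumptions, not part of the project code.

```python
import csv
import io
import json


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into a single-level dict with joined keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def events_to_csv(json_lines):
    """Convert an iterable of JSON strings into CSV text with a header row."""
    rows = [flatten(json.loads(line)) for line in json_lines]
    # Union of keys in first-seen order, so sparse events share one header.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample events; the field names are illustrative only.
sample = [
    '{"id": 1, "borough": "QUEENS", "location": {"lat": 40.7, "lon": -73.8}}',
    '{"id": 2, "borough": "BRONX", "injured": 2}',
]
csv_text = events_to_csv(sample)
print(csv_text)
```

Fields missing from an event are written as empty cells, which keeps every row aligned with the shared header before the file is handed to the next stage.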

Project Description

In this pre-built big data industry project, we extract real-time streaming event data from the New York City accidents dataset API. We then process the data on AWS to extract KPIs and metrics, which are eventually pushed to Elasticsearch for text-based search and analysis using Kibana visualisations.
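
As a rough illustration of what "extracting KPIs and metrics" can mean for accident events, the sketch below counts accidents and sums injuries per borough. The project does this kind of aggregation in PySpark; this plain-Python stand-in, the function name `kpis_by_borough`, and the `borough` and `injured` field names are assumptions made for the example.

```python
from collections import defaultdict


def kpis_by_borough(records):
    """Aggregate accident count and total injured per borough."""
    totals = defaultdict(lambda: {"accidents": 0, "injured": 0})
    for rec in records:
        # Normalise the grouping key; blank or missing boroughs go to UNKNOWN.
        borough = (rec.get("borough") or "").strip().upper() or "UNKNOWN"
        totals[borough]["accidents"] += 1
        # Treat a missing or empty injured count as zero.
        totals[borough]["injured"] += int(rec.get("injured") or 0)
    return dict(totals)


# Hypothetical records; real events carry many more fields.
records = [
    {"borough": "queens", "injured": "2"},
    {"borough": "QUEENS", "injured": 0},
    {"borough": "", "injured": 1},
    {"borough": "Bronx"},
]
kpis = kpis_by_borough(records)
for borough, metrics in sorted(kpis.items()):
    print(borough, metrics)
```

Normalising the grouping key before counting matters more than the aggregation itself: without it, "queens" and "QUEENS" would be reported as separate boroughs.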

This is an end-to-end project pipeline for event data on the cloud, covering data extraction, cleaning, transformation, exploratory analysis, visualisation, and data flow orchestration.
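
The last hop of this pipeline, loading processed records into Elasticsearch, is handled by Logstash in the project. As one alternative ingestion route, the sketch below builds a newline-delimited JSON body in the shape accepted by Elasticsearch's `_bulk` endpoint (an action line followed by the document, with a trailing newline). The index name `accidents`, the `id` field, and the helper `build_bulk_body` are assumptions for illustration; sending the body over HTTP is left out.

```python
import json


def build_bulk_body(index_name, docs, id_field=None):
    """Build an NDJSON payload of index actions for the _bulk endpoint."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_name}}
        # Reuse a document field as the Elasticsearch _id when requested.
        if id_field is not None and id_field in doc:
            action["index"]["_id"] = str(doc[id_field])
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical processed records.
docs = [
    {"id": 1, "borough": "QUEENS", "injured": 2},
    {"id": 2, "borough": "BRONX", "injured": 0},
]
bulk_body = build_bulk_body("accidents", docs, id_field="id")
print(bulk_body)
```

Setting an explicit `_id` makes re-running the load overwrite existing documents instead of creating duplicates, which is useful when the job is triggered repeatedly by a scheduler.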

Curriculum For This Mini Project

Introduction To Event Data Pipelines (12m)
System Requirements And Dataset Overview (03m)
Solution Architecture (02m)
Introduction To Apache NiFi (06m)
Introduction To Spark (02m)
Introduction To AWS ELK Stack Elasticsearch (08m)
Introduction To Kibana (10m)
Introduction To Logstash (05m)
Extract From NiFi And Parse (11m)
Store In HDFS (03m)
Running Logstash (10m)
Processing In PySpark - Data Exploration (10m)
Processing In PySpark - Data Analysis (07m)
Writing Analysis Results To HDFS (02m)
Ingesting Data With Logstash Into Elasticsearch (03m)
Custom SQL In Kibana (10m)
Kibana Visualization - Lens UI (05m)
Kibana Charts - 1 (06m)
Kibana Charts - 2 (06m)
Data Flow Orchestration With Crontab (05m)