Each project comes with 2-5 hours of micro-videos explaining the solution.
Code & Dataset
Get access to 50+ solved projects with iPython notebooks and datasets.
Add project experience to your LinkedIn/GitHub profiles.
Understanding the problem statement
Understanding what real-time data processing is
Architecture and data flow in a big data project
Basic EDA of the dataset and understanding the required format of the output
Understanding the tools required for a big data project
Kafka's role as the messenger and the use of ZooKeeper
Setting up a virtual environment on your computer and connecting Kafka, Spark, HBase, and Hadoop
Creating and running a data simulation demo
Creating and using your own ZooKeeper
Testing HBase and streaming directly to HBase using Spark Shell
Initiating Spark Streaming to fetch data
Analyzing the data in Spark Streaming using the grouping method to fetch insights
Visualizing the dashboard after Kafka sends the messages and watching the dashboard change in real time
Visualizing the final output using Pie Charts
Understanding other alternatives for real-time data analytics, such as Apache Hadoop and Spark RDD
Understanding Kafka consumer, how it works and creating parallel threads for the Kafka consumer
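To make the idea of parallel consumer threads concrete, here is a minimal sketch that uses only the Python standard library. It is not the project's code: in-memory queues stand in for Kafka partitions, and the names `consume`, `run_consumers`, and `make_partition`, as well as the `(user_id, lat, lon)` message shape, are assumptions chosen for illustration. Each thread drains one partition and the results are grouped per user.

```python
import queue
import threading
from collections import Counter


def consume(partition, counts, lock):
    """Drain one in-memory 'partition' and count messages per user."""
    while True:
        message = partition.get()
        if message is None:  # sentinel: no more messages in this partition
            break
        user_id, _lat, _lon = message  # hypothetical message shape
        with lock:  # the counter is shared by all consumer threads
            counts[user_id] += 1


def run_consumers(partitions):
    """Start one consumer thread per partition and wait for all of them."""
    counts = Counter()
    lock = threading.Lock()
    threads = [
        threading.Thread(target=consume, args=(p, counts, lock))
        for p in partitions
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


def make_partition(messages):
    """Build a queue that plays the role of a single topic partition."""
    q = queue.Queue()
    for m in messages:
        q.put(m)
    q.put(None)  # end-of-stream marker for this sketch only
    return q


partitions = [
    make_partition([("u1", 39.98, 116.31), ("u2", 39.99, 116.32)]),
    make_partition([("u1", 39.97, 116.30)]),
]
totals = run_consumers(partitions)
```

A real Kafka consumer would poll a broker instead of reading a local queue, but the pattern is the same: one worker per partition, with a lock (or a per-thread result that is merged afterwards) protecting the shared aggregate.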
In this Spark project, we will collect and aggregate data in real time from a simulated real-time system.
The dataset for the project, which will simulate our sensor data delivery, comes from the Microsoft Research Asia GeoLife project. According to the paper, the dataset recorded a broad range of users' outdoor movements, including not only life routines such as going home and going to work but also entertainment and sports activities such as shopping, sightseeing, dining, hiking, and cycling. This trajectory dataset can be used in many research fields, such as mobility pattern mining, user activity recognition, location-based social networks, location privacy, and location recommendation.
As part of this big data project, we will use the data to provide real-time aggregates of the movements along a number of dimensions, such as effective distance, duration, trajectories, and more. All streamed data will be stored in the NoSQL database HBase.
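As a rough illustration of such aggregates, the sketch below computes a distance and a duration for one trajectory in plain Python. It rests on assumptions not stated in the project description: each point is a `(timestamp, latitude, longitude)` tuple, distance is taken as the sum of great-circle (haversine) distances between consecutive points, and the function names are hypothetical. The project's own definition of "effective distance" may differ.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def summarize(points):
    """Aggregate one trajectory given as time-sorted (timestamp, lat, lon)."""
    distance = sum(
        haversine_km(a[1], a[2], b[1], b[2])
        for a, b in zip(points, points[1:])
    )
    duration = (points[-1][0] - points[0][0]).total_seconds() if points else 0.0
    return {"distance_km": distance, "duration_s": duration, "points": len(points)}


# Two made-up points one degree of longitude apart on the equator.
trajectory = [
    (datetime(2008, 10, 23, 2, 53, 0), 0.0, 0.0),
    (datetime(2008, 10, 23, 2, 54, 0), 0.0, 1.0),
]
summary = summarize(trajectory)
```

In a streaming setting the same per-trajectory summary would be updated incrementally as new points arrive, rather than recomputed over the full list.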