Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 50+ solved projects with IPython notebooks and datasets.
Add project experience to your LinkedIn/GitHub profiles.
I came to the platform with no experience and now I am knowledgeable in Machine Learning with Python. No easy thing I must say, the sessions are challenging and go to the depths. I looked at graduate...
I have worked for more than 15 years in Java and J2EE and have recently developed an interest in Big Data technologies and Machine learning due to a big need at my workspace. I was referred here by a...
In our previous Spark project, Real-Time Log Processing using Spark Streaming Architecture, we built on an earlier log-processing topic by using the speed layer of the lambda architecture. We processed log entries from an application in real time using Spark Streaming and stored the final data in an HBase table.
In this Kafka project, we will pursue the same objectives using another set of real-time technologies. The idea is to compare the two approaches to real-time data processing, which will soon become mainstream in various industries.
We will use Kafka for the streaming architecture, in a microservice style.
The major highlight of this big data project is that students will compare the Spark Streaming approach with the Kafka-only approach. This is a great session for developers and analysts as much as architects.
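To give a feel for what that comparison involves, here is a minimal sketch in plain Python (no Spark or Kafka required). It contrasts micro-batch updates, the model Spark Streaming uses, with record-at-a-time updates, the model Kafka Streams uses. The log entries, the function names, and the count-based batching are all assumptions made for illustration; Spark Streaming actually batches by time interval, and none of this is the project's own code.

```python
from collections import Counter
from itertools import islice

# Hypothetical log entries: (timestamp_seconds, http_status)
LOG_ENTRIES = [
    (0, 200), (1, 404), (2, 200), (3, 500), (4, 200), (5, 404),
]


def record_at_a_time(entries):
    """Emit updated running counts after every single record."""
    counts = Counter()
    for _, status in entries:
        counts[status] += 1
        yield dict(counts)


def micro_batch(entries, batch_size):
    """Emit updated running counts once per batch of records.

    Batching by record count is a simplification; a real micro-batch
    engine groups records by a time interval instead.
    """
    counts = Counter()
    it = iter(entries)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        counts.update(status for _, status in batch)
        yield dict(counts)


per_record = list(record_at_a_time(LOG_ENTRIES))
per_batch = list(micro_batch(LOG_ENTRIES, batch_size=3))

print(len(per_record), "updates (record-at-a-time)")
print(len(per_batch), "updates (micro-batch)")
print("same final counts:", per_record[-1] == per_batch[-1])
```

Both styles arrive at the same final counts; what differs is how often downstream consumers see a fresh result, which is one of the trade-offs worth weighing when comparing the two approaches.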
Note: The Cloudera QuickStart VM does not include Kafka. We intend to work around that, so come prepared to install Kafka in the Cloudera QuickStart VM.
In this big data project, we will see how data ingestion and loading are done with the Kafka Connect APIs, while transformation is done with the Kafka Streams API.
In this big data project, we will embark on real-time data collection and aggregation from a simulated real-time system using Spark Streaming.
The goal of this Hadoop project is to apply some data engineering principles to the Yelp dataset in the areas of processing, storage, and retrieval.