Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 50+ solved projects with iPython notebooks and datasets.
Add project experience to your Linkedin/Github profiles.
Data engineering is the science of acquiring, aggregating or collection, processing, and storage of data either in batch or in real-time as well as providing the variety of means of serving these data to other users which could include a data scientist. It involves software engineering practices on big data.
The goal of this big data project is apply data engineering principles to the Yelp Dataset in the areas of processing, storage, and retrieval. We will not include data ingestion since we are already downloading the data from the yelp challenge website.
Spark Project - Discuss real-time monitoring of taxis in a city. The real-time data streaming will be simulated using Flume. The ingestion will be done using Spark Streaming.
In this hive project , we will build a Hive data warehouse from a raw dataset stored in HDFS and present the data in a relational structure so that querying the data will be natural.
The goal of this apache kafka project is to process log entries from applications in real-time using Kafka for the streaming architecture in a microservice sense.