Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 50+ solved projects with iPython notebooks and datasets.
Add project experience to your Linkedin/Github profiles.
I think that they are fantastic. I attended Yale and Stanford and have worked at Honeywell,Oracle, and Arthur Andersen(Accenture) in the US. I have taken Big Data and Hadoop,NoSQL, Spark, Hadoop... Read More
I have extensive experience in data management and data processing. Over the past few years I saw the data management technology transition into the Big Data ecosystem and I needed to follow suit. I... Read More
In our last Hackerday, we demonstrated how OLAP analysis with real-time queries can be archived using Apache Kylin.
In this Hackerday, we will look at another angle to it - streaming dataset.
First of all, Apache Kylin (kylin.apache.org) is a Distributed Analytics Engine that provides SQL interface and multidimensional analysis (OLAP) on the large dataset using MapReduce or Spark. This means that I can answer classical MDX questions in the Hadoop platform with a decent amount of latency. Apache Kylin has recorded brilliant performance of delivering results in sub-seconds response to analytical or aggregation queries.
This time, we are doing the same over a streaming dataset. Our dataset will be a simulated stream of a log file using Kafka and we intend to build a Kylin cube over the streaming dataset. At the end of the class, we will be able to write analytical queries will Kylin receive newer data from the Kafka topic.
In this project, we will show how to build an ETL pipeline on streaming datasets using Kafka.
Hadoop Projects for Beginners -Learn data ingestion from a source using Apache Flume and Kafka to make a real-time decision on incoming data.
The goal of this IoT project is to build an argument for generalized streaming architecture for reactive data ingestion based on a microservice architecture.