In previous Hackerday sessions, we introduced how to bring OLAP to extremely large datasets with Apache Kylin. For those unfamiliar with it, Kylin (kylin.apache.org) is a distributed analytics engine that provides a SQL interface and multidimensional analysis (OLAP) on large datasets using MapReduce or Spark. This means you can answer classic aggregate queries over billions of records on the Hadoop platform with low latency.
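Kylin keeps query latency low by precomputing aggregates into a cube ahead of time, so an aggregate query becomes a lookup rather than a scan. The following is a minimal conceptual sketch of that idea in plain Python, not Kylin's API: it precomputes `SUM(measure)` for every combination of dimensions (each combination is one "cuboid") and answers a GROUP BY by selecting the matching cuboid. The function names, column names, and sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Precompute SUM(measure) for every subset of dimensions (one cuboid per subset)."""
    cube = {}
    for r in range(len(dimensions) + 1):
        for dims in combinations(dimensions, r):
            agg = defaultdict(float)
            for row in rows:
                # Group key is the tuple of this row's values for the cuboid's dimensions
                key = tuple(row[d] for d in dims)
                agg[key] += row[measure]
            cube[dims] = dict(agg)
    return cube


def query(cube, group_by, dimensions):
    """Answer a GROUP BY by looking up the precomputed cuboid (no scan of raw rows)."""
    # Normalize the requested dimensions to the order used when the cube was built
    ordered = tuple(d for d in dimensions if d in group_by)
    return cube[ordered]


# Hypothetical sample rows; a real cube would be built from the fact table
rows = [
    {"carrier": "AA", "origin": "JFK", "delay": 10},
    {"carrier": "AA", "origin": "LAX", "delay": 5},
    {"carrier": "UA", "origin": "JFK", "delay": 20},
]
dimensions = ("carrier", "origin")
cube = build_cube(rows, dimensions, "delay")

by_carrier = query(cube, {"carrier"}, dimensions)  # like GROUP BY carrier
grand_total = query(cube, set(), dimensions)       # like a query with no GROUP BY
```

With two dimensions the sketch stores four cuboids (none, `carrier`, `origin`, both). The number of cuboids grows as 2^n with the number of dimensions, which is why cube design and pruning matter so much in practice.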
In this Hackerday, we will design an OLAP cube using the flight on-time dataset. Since Kylin was introduced in earlier sessions, this session focuses on more advanced features such as incremental builds, along with performance-tuning tips and design considerations. We will also discuss the Spark engine and how to build different types of models.
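In Kylin, an incremental build adds a new cube segment covering a new time range instead of rebuilding the whole cube, and queries combine results across segments. The sketch below illustrates that idea in plain Python; it is an assumption-laden model, not Kylin's implementation or API. The class, the segment IDs such as `"2008-01"`, and the sample rows are all hypothetical.

```python
from collections import Counter


def build_segment(rows, dims, measure):
    """Aggregate only the rows belonging to one time range into a segment."""
    seg = Counter()
    for row in rows:
        seg[tuple(row[d] for d in dims)] += row[measure]
    return seg


class IncrementalCube:
    """Hypothetical cube made of independently built, time-partitioned segments."""

    def __init__(self, dims, measure):
        self.dims = dims
        self.measure = measure
        self.segments = {}  # segment id (e.g. a month) -> pre-aggregated Counter

    def append_segment(self, segment_id, rows):
        # Only the new range is processed; existing segments are left untouched.
        # Re-running with an existing id replaces (refreshes) just that segment.
        self.segments[segment_id] = build_segment(rows, self.dims, self.measure)

    def query(self):
        # Combine the pre-aggregated results of every segment at query time
        total = Counter()
        for seg in self.segments.values():
            total.update(seg)  # Counter.update adds values key by key
        return dict(total)


inc_cube = IncrementalCube(("carrier",), "delay")
inc_cube.append_segment("2008-01", [
    {"carrier": "AA", "delay": 10},
    {"carrier": "UA", "delay": 20},
])
# Later, only the new month's data is built
inc_cube.append_segment("2008-02", [{"carrier": "AA", "delay": 5}])
result = inc_cube.query()
```

The trade-off this illustrates: many small segments make each build cheap but push more merging work onto queries, which is why segments are periodically merged in practice.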
In this project, we will look at two database platforms, MongoDB and Cassandra, and examine the philosophical differences in how these databases work and how they perform analytical queries.
In this NoSQL project, we will use two NoSQL databases (HBase and MongoDB) to store Yelp business attributes and learn how to retrieve this data for processing or querying.
In this PySpark project, you will simulate a complex, real-world, messaging-based data pipeline. The project is deployed using the following tech stack: NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau, and AWS QuickSight.