Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 50+ solved projects with iPython notebooks and datasets.
Add project experience to your LinkedIn/GitHub profiles.
I have 11 years of experience and work with IBM. My domains are Travel, Hospitality and Banking - all sectors that process lots of data. The way the projects were set up and the mentors' explanation was...
I have extensive experience in data management and data processing. Over the past few years I saw data management technology transition into the Big Data ecosystem and I needed to follow suit. I...
The hype around SQL-on-Hadoop has died down, and people now want more from these SQL-on-Hadoop engines: real-time queries, support for various file formats, support for user-defined functions, and support for a variety of client connectivity options.
In this Hackerday, we will take a look at four different SQL-on-Hadoop engines - Hive, Phoenix, Impala, and Presto. While Hive should behave much as we expect, we want to see what it takes to adopt the other SQL-on-Hadoop engines in our big data infrastructure.
After this Hackerday session, you should be able to make an informed choice among these engines and extend them into your own data processing infrastructure.
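One concrete way to compare engines like these is to run the same ANSI SQL through each one's DB-API driver and time the round trip. The sketch below is illustrative, not part of the Hackerday materials: it uses Python's standard-library sqlite3 as a stand-in connection so it runs anywhere, and the commented `pyhive` connections (hostnames and ports are assumptions) show how the same harness would point at real Hive or Presto endpoints.

```python
import sqlite3
import time

def time_query(conn, sql):
    """Run one query on a DB-API connection and return (rows, elapsed_seconds)."""
    start = time.perf_counter()
    cur = conn.cursor()
    cur.execute(sql)
    rows = cur.fetchall()
    return rows, time.perf_counter() - start

# Stand-in engine: an in-memory SQLite database. Against a real cluster you
# would build this dict with connections to each engine instead, e.g.:
#   from pyhive import hive, presto   # assumed driver package
#   engines = {"hive": hive.connect("hive-host", 10000),
#              "presto": presto.connect("presto-host", 8080)}
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("EU", 10.0), ("EU", 5.0), ("US", 7.5)])
engines = {"sqlite-standin": conn}

# One query, identical for every engine, so timings are comparable.
SQL = "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"

results = {}
for name, c in engines.items():
    rows, elapsed = time_query(c, SQL)
    results[name] = rows
    print(f"{name}: {rows} in {elapsed:.4f}s")
```

Because every engine named above speaks SQL over a DB-API-style interface, the harness itself never changes; only the connection objects do.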
In this PySpark project, you will simulate a complex real-world data pipeline based on messaging. This project is deployed using the following tech stack - NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau and AWS QuickSight.
The goal of this IoT project is to make the case for a generalized, microservice-based streaming architecture for reactive data ingestion.
Analyze clickstream data of a website using Hadoop Hive to increase sales by optimizing every aspect of the customer experience on the website from the first mouse click to the last.
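In Hive terms, clickstream analysis like this often reduces to grouping page-view events by session and measuring how many sessions survive each step of the funnel from first click to purchase. The pure-Python sketch below mirrors that logic on a few made-up events (the session ids, page names, and funnel stages are illustrative assumptions, not the project's actual schema) so the shape of the computation is clear before writing the equivalent HiveQL.

```python
from collections import defaultdict

# Illustrative clickstream events: (session_id, page). Not real project data.
events = [
    ("s1", "home"), ("s1", "product"), ("s1", "checkout"),
    ("s2", "home"), ("s2", "product"),
    ("s3", "home"),
]

# Group pages by session, as "GROUP BY session_id" would in Hive.
sessions = defaultdict(set)
for session_id, page in events:
    sessions[session_id].add(page)

# Funnel: count the sessions that reach each stage.
funnel = {
    stage: sum(1 for pages in sessions.values() if stage in pages)
    for stage in ("home", "product", "checkout")
}

# Conversion rate from the first click to checkout.
conversion = funnel["checkout"] / funnel["home"]
print(funnel, f"conversion={conversion:.2f}")
```

At Hadoop scale the same aggregation would be expressed as a HiveQL `GROUP BY` over the clickstream table rather than an in-memory dictionary, but the per-stage counts and conversion ratio are computed the same way.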