Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 50+ solved projects with IPython notebooks and datasets.
Add project experience to your LinkedIn/GitHub profiles.
Hive and the Hive metastore are so ubiquitous in big data engineering that using them efficiently is a factor in the success of many projects. Whether Hive is integrated with Spark or used as an ETL tool, many projects fail or succeed as they grow in scale and complexity because of decisions made early on.
In this big data project on Hive, we will explore how to use Hive efficiently. The project follows an exploratory pattern rather than a project-building pattern: the goal is to explore Hive in uncommon ways on the way to mastery.
We will use different datasets in these sessions and explore different Hadoop file formats such as text, CSV, JSON, ORC, Parquet, Avro, and SequenceFile. We will also look at compression and different codecs, and at how each format performs when integrated with either Spark or Impala.
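To make the format comparison concrete, here is a minimal sketch of how one might generate a Hive CREATE TABLE AS SELECT statement per storage format, so the same data can be written out several ways and compared. It is an illustration under assumptions, not the project's own code: the table name `reviews`, the helper `ctas_for_format`, and the choice of SNAPPY are hypothetical, and the JSON variant assumes the HCatalog JsonSerDe jar is available to Hive.

```python
# Sketch: build one HiveQL CTAS statement per storage format.
# All names here (reviews, ctas_for_format) are hypothetical examples.

# Storage clause for each format named in the text.
# CSV and JSON are text files read through a SerDe; the JSON SerDe
# assumes hive-hcatalog-core is on Hive's classpath.
STORAGE_CLAUSES = {
    "text": "STORED AS TEXTFILE",
    "csv": "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' "
           "STORED AS TEXTFILE",
    "json": "ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' "
            "STORED AS TEXTFILE",
    "orc": "STORED AS ORC",
    "parquet": "STORED AS PARQUET",
    "avro": "STORED AS AVRO",
    "sequencefile": "STORED AS SEQUENCEFILE",
}

# Table properties that select a compression codec. Only ORC and Parquet
# are covered here; other formats are configured through session settings.
COMPRESSION_PROPERTY = {
    "orc": "orc.compress",
    "parquet": "parquet.compression",
}


def ctas_for_format(source_table, fmt, codec=None):
    """Return a CTAS statement that copies source_table into format fmt."""
    clause = STORAGE_CLAUSES[fmt]
    target = f"{source_table}_{fmt}"
    props = ""
    if codec and fmt in COMPRESSION_PROPERTY:
        props = f" TBLPROPERTIES ('{COMPRESSION_PROPERTY[fmt]}'='{codec}')"
    return (
        f"CREATE TABLE {target} {clause}{props} "
        f"AS SELECT * FROM {source_table}"
    )


# One statement per format, ready to paste into a Hive session.
statements = [
    ctas_for_format("reviews", fmt, codec="SNAPPY") for fmt in STORAGE_CLAUSES
]
for statement in statements:
    print(statement + ";")
```

Running the generated statements against the same source table gives one copy of the data per format, which can then be compared on size on HDFS and on query time from Hive, Spark, or Impala.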
The idea is to explore enough that we can make a reasoned argument about what to do, or not to do, in any given big data scenario.
In this big data project, we will continue from a previous Hive project, "Data engineering on Yelp Datasets using Hadoop tools", and do the entire data processing using Spark.
In this PySpark project, you will simulate a complex real-world data pipeline based on messaging. This project is deployed using the following tech stack - NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau and AWS QuickSight.
In this Hive project, we will build a Hive data warehouse from a raw dataset stored in HDFS and present the data in a relational structure so that querying the data will be natural.