Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 50+ solved projects with IPython notebooks and datasets.
Add project experience to your LinkedIn/GitHub profiles.
I have extensive experience in data management and data processing. Over the past few years I saw the data management technology transition into the Big Data ecosystem and I needed to follow suit. I...
The project orientation is very much unique and it helps to understand the real time scenarios most of the industries are dealing with. And there is no limit, one can go through as many projects...
This big data Hadoop project aims to be the best possible offline evaluation of a music recommendation system. Any type of algorithm can be used: collaborative filtering, content-based methods, web crawling. Because it relies on the Million Song Dataset, the data for this big data project is completely open: almost everything is known and possibly available.
What is the task in a few words? You have:
and you must predict the missing half. How much easier can it get?
The most straightforward approach to this task is pure collaborative filtering, but remember that there is a wealth of information available to you through the Million Song Dataset. To download the Million Song Dataset, visit labrosa.ee.columbia.edu/millionsong/. Go ahead, explore!
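To make the collaborative filtering idea concrete, here is a minimal sketch in plain Python. It is not the project's solution: the user and song IDs are made-up toy data rather than Million Song Dataset records, the helper names (`item_similarities`, `recommend`, `precision_at_k`) are hypothetical, and precision at k is an assumed stand-in for whatever metric the evaluation actually uses. The sketch builds song-to-song cosine similarities from full listening histories, then uses the visible half of one user's history to rank candidates for the hidden half.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Toy listening histories (hypothetical IDs, not real Million Song Dataset data).
full_histories = {
    "u1": {"s1", "s2", "s3"},
    "u2": {"s1", "s2", "s4"},
    "u3": {"s2", "s3", "s4"},
    "u4": {"s1", "s3"},
}


def item_similarities(histories):
    """Cosine similarity between songs, based on how many users played both."""
    play_counts = defaultdict(int)   # song -> number of users who played it
    co_counts = defaultdict(int)     # (song_a, song_b) -> users who played both
    for songs in histories.values():
        for song in songs:
            play_counts[song] += 1
        for a, b in combinations(sorted(songs), 2):
            co_counts[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), both in co_counts.items():
        score = both / sqrt(play_counts[a] * play_counts[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims


def recommend(visible, sims, k=2):
    """Rank unseen songs by their summed similarity to the visible songs."""
    scores = defaultdict(float)
    for song in visible:
        for other, score in sims.get(song, {}).items():
            if other not in visible:
                scores[other] += score
    # Highest score first; ties broken by song ID for a stable order.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return ranked[:k]


def precision_at_k(recommended, hidden):
    """Fraction of recommended songs that appear in the hidden half."""
    if not recommended:
        return 0.0
    return sum(1 for s in recommended if s in hidden) / len(recommended)


sims = item_similarities(full_histories)

# A test user whose history is split into a visible and a hidden half.
visible_half = {"s1", "s2"}
hidden_half = {"s3"}

recs = recommend(visible_half, sims, k=2)
score = precision_at_k(recs, hidden_half)
print(recs, score)
```

On data at the scale of the real dataset, the same co-occurrence counting would typically be expressed as a distributed job (for example in Spark) rather than as in-memory dictionaries.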
In this big data project, we will continue from a previous Hive project, "Data engineering on Yelp Datasets using Hadoop tools", and do the entire data processing using Spark.
Hive project: understand the various types of SCDs and implement these slowly changing dimensions in Hadoop Hive and Spark.
In this project, we will evaluate and demonstrate how to handle unstructured data using Spark.