Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 50+ solved projects with IPython notebooks and datasets.
Add project experience to your LinkedIn/GitHub profiles.
This big data Hadoop project aims to be the best possible offline evaluation of a music recommendation system. Any type of algorithm can be used: collaborative filtering, content-based methods, web crawling, and so on. Because the project relies on the Million Song Dataset, its data is completely open: almost everything is known and possibly available.
What is the task in a few words? You have:
and you must predict the missing half. How much easier can it get?
The most straightforward approach to this task is pure collaborative filtering, but remember that a wealth of additional information is available to you through the Million Song Dataset. To download the Million Song Dataset, visit labrosa.ee.columbia.edu/millionsong/. Go ahead, explore!
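As a rough illustration of what "pure collaborative filtering" and "offline evaluation" can look like for this task, here is a minimal sketch in plain Python. It is not the project's actual solution: the toy listening histories, the song and user IDs, the helper names (`build_cooccurrence`, `recommend`, `precision_at_k`), and the choice of precision-at-k as the metric are all assumptions made for the example. The idea is to count how often songs co-occur in full listening histories, score unseen songs for a user from the visible half of their history, and compare the top recommendations against the hidden half.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(histories):
    """Count how often each pair of songs appears in the same user's history."""
    cooc = defaultdict(lambda: defaultdict(int))
    for songs in histories.values():
        for a, b in combinations(sorted(set(songs)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def recommend(visible_songs, cooc, k=2):
    """Rank songs the user has not played by co-occurrence with the visible half."""
    seen = set(visible_songs)
    scores = defaultdict(int)
    for song in seen:
        for other, count in cooc.get(song, {}).items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken by song ID so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, _ in ranked[:k]]


def precision_at_k(recommended, hidden):
    """Fraction of recommended songs that appear in the hidden half."""
    if not recommended:
        return 0.0
    hits = sum(1 for song in recommended if song in hidden)
    return hits / len(recommended)


# Hypothetical toy data: full histories for training users.
train_histories = {
    "u1": ["s1", "s2", "s3"],
    "u2": ["s1", "s2", "s4"],
    "u3": ["s2", "s3", "s4"],
    "u4": ["s1", "s3"],
}

# Hypothetical test user: one half of the history is visible, the other hidden.
visible_half = ["s1", "s2"]
hidden_half = {"s3", "s5"}

cooc = build_cooccurrence(train_histories)
recs = recommend(visible_half, cooc, k=2)
score = precision_at_k(recs, hidden_half)
print(recs, score)
```

On real data the co-occurrence table would be far too large for an in-memory dictionary, which is where a distributed framework comes in; the scoring logic itself stays the same. Note that a song such as `s5`, which never appears in the training histories, can never be recommended by this approach, which is one reason to look at the content metadata in the Million Song Dataset.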