Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 50+ solved projects with iPython notebooks and datasets.
Add project experience to your Linkedin/Github profiles.
Recently I became interested in Hadoop as I think its a great platform for storing and analyzing large structured and unstructured data sets. The experts did a great job not only explaining the... Read More
I think that they are fantastic. I attended Yale and Stanford and have worked at Honeywell,Oracle, and Arthur Andersen(Accenture) in the US. I have taken Big Data and Hadoop,NoSQL, Spark, Hadoop... Read More
This big data hadoop project aims at being the best possible offline evaluation of a music recommendation system. Any type of algorithm can be used: collaborative filtering, content-based methods, web crawling. By relying on the Million Song Dataset, the data for this big data project is completely open: almost everything is known and possibly available.
What is the task in a few words? You have:
and you must predict the missing half. How much easier can it get?
The most straightforward approach to this task is pure collaborative filtering, but remember that there is a wealth of information available to you through the Million Song Dataset. For Million Song Dataset Download, click this link - labrosa.ee.columbia.edu/millionsong/. Go ahead, explore!
In this hive project, you will work on denormalizing the JSON data and create HIVE scripts with ORC file format.
Hive Project- Understand the various types of SCDs and implement these slowly changing dimesnsion in Hadoop Hive and Spark.
Analyze clickstream data of a website using Hadoop Hive to increase sales by optimizing every aspect of the customer experience on the website from the first mouse click to the last.