Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 50+ solved projects with IPython notebooks and datasets.
Add project experience to your LinkedIn/GitHub profiles.
I have worked for more than 15 years in Java and J2EE and have recently developed an interest in Big Data technologies and machine learning due to a big need at my workplace. I was referred here by a...
I think that they are fantastic. I attended Yale and Stanford and have worked at Honeywell, Oracle, and Arthur Andersen (Accenture) in the US. I have taken Big Data and Hadoop, NoSQL, Spark, Hadoop...
Continuing the data engineering series on the Yelp dataset, we have so far established several concepts, from data warehousing to graph analysis. Well done.
But in today's world, not all data is best stored on HDFS. Some requirements and scenarios call for data storage that offers very low latency while still handling large datasets. This is where NoSQL databases come in.
In this NoSQL project, we will use two NoSQL databases (HBase and MongoDB) to store Yelp business attributes and learn how to retrieve this data for processing or querying. We will also show where these stores offer an advantage over HDFS alone and how to join their data with data stored in HDFS in real time.
Because MongoDB is not available in the Cloudera QuickStart VM, we recommend installing MongoDB on the host machine and setting up a host network interface between the host and the VM for this big data project.
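To make the storage idea concrete, here is a minimal sketch in plain Python of how one Yelp business record might be shaped for each store and then joined to other records by key. It is an illustration under stated assumptions, not the project's actual solution: the sample record, the `attr` column family, and the helper names `to_hbase_row`, `to_mongo_document`, and `enrich` are all hypothetical. No database is contacted, so the code runs as is.

```python
import json

# Hypothetical sample record, shaped like an entry in the Yelp business file.
sample_business = {
    "business_id": "b1",
    "name": "Example Cafe",
    "attributes": {"WiFi": "free", "GoodForKids": True, "Parking": {"street": True}},
}


def to_hbase_row(business):
    """Flatten a business into (row_key, cells) in HBase's wide-column shape.

    Each attribute becomes one cell named "<family>:<qualifier>". The column
    family name "attr" is an assumption made for this sketch.
    """
    row_key = business["business_id"].encode("utf-8")
    cells = {}
    for name, value in (business.get("attributes") or {}).items():
        # HBase stores raw bytes, so non-string values are serialized as JSON text.
        text = value if isinstance(value, str) else json.dumps(value, sort_keys=True)
        cells[f"attr:{name}".encode("utf-8")] = text.encode("utf-8")
    return row_key, cells


def to_mongo_document(business):
    """Keep the nested attributes as is and use business_id as the document _id."""
    return {
        "_id": business["business_id"],
        "attributes": business.get("attributes") or {},
    }


def enrich(records, attribute_store):
    """Join records (for example, reviews read from HDFS) to attributes by key.

    attribute_store is a plain dict standing in for a low-latency key lookup
    against HBase or MongoDB.
    """
    for record in records:
        attributes = attribute_store.get(record["business_id"], {})
        yield {**record, "attributes": attributes}


row_key, cells = to_hbase_row(sample_business)
document = to_mongo_document(sample_business)
store = {document["_id"]: document["attributes"]}
reviews = [{"business_id": "b1", "stars": 5}, {"business_id": "zz", "stars": 1}]
enriched = list(enrich(reviews, store))
```

With a real client, the `(row_key, cells)` pair has the shape that a call such as happybase's `Table.put` accepts, and the document could be passed to pymongo's `Collection.insert_one`. Which client libraries the project actually uses is not stated here, so treat those as one possible choice.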
Learn to design Hadoop Architecture and understand how to store data using data acquisition tools in Hadoop.
In this Hive project, we will build a Hive data warehouse from a raw dataset stored in HDFS and present the data in a relational structure so that querying the data will be natural.
In this Hadoop project, learn about the features in Hive that allow us to perform analytical queries over large datasets.