Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 50+ solved projects with IPython notebooks and datasets.
Add project experience to your LinkedIn/GitHub profiles.
I have worked for more than 15 years in Java and J2EE and have recently developed an interest in Big Data technologies and machine learning due to a big need at my workplace. I was referred here by a...
Initially, I was unaware of how this would cater to my career needs. But then I came across the reviews given on the website. I went through many of them and found them all positive. I would...
The internet has grown from being a connection of web pages to a connection of people and even things. Famous companies around the world have made their name and their money by accelerating this connection and communication.
In this big data project, we will look at how to mine and make sense of connections using a simple example: GitHub. GitHub has evolved from being just source version control software into a social coding platform. That social component has increased its relevance in the midst of competition. We can, therefore, apply this lesson in our own business by not only providing goods or services but also continually exploring the connections among customers.
This exploration is what this Spark GraphX project is all about: we will mine the connections between people around some GitHub projects and run some well-known graph algorithms on the resulting network. Note that this class will be a little code-intensive.
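As a rough illustration of what running a well-known graph algorithm on a people network can look like, here is a minimal sketch of PageRank in plain Python. It is not the project's code: the project uses Spark GraphX, the choice of PageRank is an assumption (the description does not name the algorithms covered), and the `follows` edges and user names below are hypothetical sample data.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is shared evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node splits its current rank equally among the nodes it points to.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical "follows" relationships between users (illustrative only).
follows = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("dave", "bob"),
    ("bob", "alice"),
    ("dave", "carol"),
]

ranks = pagerank(follows)
for user, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{user}: {score:.4f}")
```

In this toy network the user followed by the most people ends up with the highest score. GraphX ships its own distributed PageRank, so on a real cluster one would normally call that rather than hand-roll the iteration.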
In this Spark project, we will continue building the data warehouse from the previous project, Yelp Data Processing Using Spark And Hive Part 1, and will do further data processing to develop diverse data products.
Explore efficient Hive usage in this Hadoop Hive project using various file formats such as JSON, CSV, ORC, and Avro, and compare their relative performance.
Learn to design Hadoop Architecture and understand how to store data using data acquisition tools in Hadoop.