Design a Network Crawler by Mining GitHub Social Profiles

In this big data project, we will look at a simple way to mine and make sense of connections by building a network crawler and a Spark GraphX algorithm.

What you will learn

  • Designing your GitHub network of people
  • Building the network model in HBase
  • Running your network crawler
  • Using Spark to analyze the network
  • Running graph algorithms with GraphFrames or Spark GraphX
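
To make "Running your network crawler" a little more concrete, here is a minimal breadth-first crawler sketch. It is not the course's code. It assumes a hypothetical callback `fetch_followers(user)` that returns the logins following `user`; in practice this could wrap GitHub's REST "list followers of a user" endpoint, but authentication, pagination and rate limiting are deliberately left out. The crawler returns (follower, followed) edge pairs, which is the kind of edge list you could then store in HBase or load into Spark. The names `crawl` and `fetch_followers` are illustrative.

```python
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple


def crawl(seeds: Iterable[str],
          fetch_followers: Callable[[str], List[str]],
          max_depth: int = 2) -> List[Tuple[str, str]]:
    """Breadth-first crawl of a follower network.

    fetch_followers is a hypothetical callback returning the logins that
    follow a given user (e.g. a wrapper around a GitHub API call).
    Returns (follower, followed) edges. A user's followers are fetched
    only while that user's distance from a seed is below max_depth.
    """
    seed_list = list(seeds)
    visited: Set[str] = set(seed_list)
    queue = deque((user, 0) for user in seed_list)
    edges: List[Tuple[str, str]] = []
    while queue:
        user, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth limit reached: do not fetch this user's followers
        for follower in fetch_followers(user):
            edges.append((follower, user))
            if follower not in visited:
                visited.add(follower)
                queue.append((follower, depth + 1))
    return edges


if __name__ == "__main__":
    # Hypothetical in-memory follower data standing in for real API responses.
    followers = {"alice": ["bob", "carol"], "bob": ["carol"], "carol": ["alice"]}
    for edge in crawl(["alice"], lambda u: followers.get(u, []), max_depth=2):
        print(edge)
```

The `visited` set keeps each user from being expanded twice, while every observed follow relationship is still recorded as an edge, so cycles such as alice → carol → alice do not make the crawl loop forever.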

Project Description

The internet has grown from a network of web pages into a network of people, and even things. Well-known companies around the world have made their name, and their money, by accelerating this connection and communication.

In this big data project, we will look at a simple way to mine and make sense of connections, using GitHub as the example. GitHub has evolved from what was originally just source code version control software into a social coding platform, and that social component has kept it relevant in the midst of competition. We can apply the same lesson to our own businesses: besides providing goods or services, we should keep exploring the connections among our customers.

This exploration is what this Spark GraphX project is about: we will mine the connections between people around some GitHub projects and run some well-known graph algorithms on the resulting network. Note that this class is fairly code-intensive.
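
PageRank is one well-known graph algorithm that could be run on such a network, and both Spark GraphX and GraphFrames ship an implementation. As a hedged illustration of what it computes (not the course's code, and not Spark's implementation), here is a plain-Python power-iteration sketch over directed (follower, followed) edges. It uses the textbook formulation with damping 0.85 and ranks normalized to sum to 1; Spark's variants may scale ranks differently, so absolute values need not match.

```python
from typing import Dict, List, Tuple


def pagerank(edges: List[Tuple[str, str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over directed (source, target) edges.

    Ranks sum to 1. Rank held by nodes without out-links (dangling
    nodes) is spread evenly across all nodes on every iteration.
    """
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    n = len(nodes)
    out_links: Dict[str, List[str]] = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        # Teleport term plus the evenly redistributed dangling mass.
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                # Each node splits its damped rank equally among its out-links.
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical (follower, followed) edges: an edge points at the user being followed.
    edges = [("bob", "alice"), ("carol", "alice"), ("carol", "bob"), ("alice", "carol")]
    for user, score in sorted(pagerank(edges).items(), key=lambda kv: -kv[1]):
        print(f"{user}: {score:.3f}")
```

Because edges point from follower to followed, a user followed by other highly ranked users accumulates a high score, which is one way to spot influential people around a project.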

Curriculum For This Mini Project

  • 3-Feb-2018: 02h 28m
  • 4-Feb-2018: 03h 11m