
Design a Network Crawler by Mining Github Social Profiles

In this big data project, we will look at how to mine and make sense of connections in a simple way by building a Spark GraphX Algorithm and a Network Crawler.


What will you learn

  • Designing your Github network of persons
  • Building the network model in HBase
  • Running your network crawler
  • Using Spark to analyze the network
  • Running graph algorithms using GraphFrame or Spark GraphX
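
As a rough illustration of the crawling step, here is a minimal sketch of a breadth-first network crawler in plain Python. It does not call the real Github API or HBase: `SAMPLE_FOLLOWERS`, `fetch_followers` and `crawl` are hypothetical names, and the follower data is made up so that the example can run on its own.

```python
from collections import deque

# Hypothetical sample data standing in for responses from a profile API.
SAMPLE_FOLLOWERS = {
    "alice": ["bob", "carol"],
    "bob": ["carol", "dave"],
    "carol": ["alice"],
    "dave": ["erin"],
    "erin": [],
}


def fetch_followers(user):
    """Return the followers of `user` (stub; a real crawler would call an API here)."""
    return SAMPLE_FOLLOWERS.get(user, [])


def crawl(seed, max_depth, fetch=fetch_followers):
    """Breadth-first crawl from `seed`, returning (users, edges).

    An edge (a, b) means "a follows b".
    """
    visited = {seed}
    edges = []
    queue = deque([(seed, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand users sitting at the depth limit
        for follower in fetch(user):
            edges.append((follower, user))
            if follower not in visited:
                visited.add(follower)
                queue.append((follower, depth + 1))
    return visited, edges


users, edges = crawl("alice", max_depth=2)
print(sorted(users))
print(edges)
```

The depth limit keeps the crawl bounded, and the resulting edge list is the kind of structure that could later be stored as a network model and loaded into a graph library.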

What will you get

  • Access to recording of the complete project
  • Access to all project material, such as data files and solution files


Prerequisites

  • General knowledge of Big Data, Hadoop and Spark is required
  • Knowledge of NoSQL (HBase) is helpful but not compulsory
  • Download and setup of the Cloudera Quickstart VM

Project Description

The internet has grown from a network of web pages into a network of people and even things. Well-known companies around the world have made their name and their money by accelerating this connection and communication.

In this big data project, we will look at how to mine and make sense of connections in a simple way, using Github as the example. Github has evolved from being just a source version control service into a social coding platform, and that social component has increased its relevance in the face of competition. We can therefore apply this lesson in our own business by not only providing goods or services but also exploring the connections among customers.

This exploration is what this Spark GraphX project is all about: we will mine the connections between people around some Github projects and run some well-known graph algorithms on the resulting network. Note that this class will be a little code-intensive.
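
To give a feel for the kind of graph algorithm involved, here is a small PageRank sketch in plain Python. Spark GraphX ships its own PageRank implementation; this block is not that API, only a simplified stand-in under stated assumptions. The `pagerank` function, the `FOLLOWS` edge list and the user names are all hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Iterative PageRank over a list of directed (src, dst) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    count = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical "follows" edges: (follower, followed).
FOLLOWS = [
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "alice"),
    ("alice", "bob"),
]

ranks = pagerank(FOLLOWS)
for user in sorted(ranks, key=ranks.get, reverse=True):
    print(user, round(ranks[user], 4))
```

In this toy network the most-followed user ends up with the highest rank, which is the intuition behind using PageRank to find influential people in a social graph.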



Big Data & Enterprise Software Engineer

I am passionate about software development, databases, data analysis and the Android platform. My native language is Java, but no one has stopped me so far from learning and using Angular and Node.js. Data and data analysis are thrilling, and so are my experiences with SQL on Oracle, Microsoft SQL Server, Postgres and MyS...

Curriculum For This Mini Project

02h 28m
03h 11m