Design a Network Crawler by Mining Github Social Profiles

In this big data project, we will look at how to mine and make sense of connections in a simple way by building a network crawler and running Spark GraphX algorithms.

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your Linkedin/Github profiles.

Customer Love

Camille St. Omer

Artificial Intelligence Researcher, Quora 'Most Viewed Writer' in 'Data Mining'

I came to the platform with no experience and now I am knowledgeable in Machine Learning with Python. No easy thing I must say, the sessions are challenging and go to the depths. I looked at graduate...

Arvind Sodhi

VP - Data Architect, CDO at Deutsche Bank

I have extensive experience in data management and data processing. Over the past few years I saw the data management technology transition into the Big Data ecosystem and I needed to follow suit. I...

What will you learn

Designing your Github network of persons
Building the network model in HBase
Running your network crawler
Using Spark to analyze the network
Running graph algorithms using GraphFrames or Spark GraphX
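
To make the crawler step in the list above more concrete, here is a minimal sketch of a breadth-first crawl that turns follower relationships into graph edges. It is only an illustration under stated assumptions: `SAMPLE_FOLLOWERS` and `fetch_followers` are hypothetical stand-ins for real Github API calls, and the edges are kept in a Python list rather than written to HBase as the project does.

```python
from collections import deque

# Hypothetical follower data standing in for responses from the Github API.
SAMPLE_FOLLOWERS = {
    "alice": ["bob", "carol"],
    "bob": ["carol", "dave"],
    "carol": ["alice"],
    "dave": [],
}


def fetch_followers(user):
    """Hypothetical stand-in for a network call: users who follow `user`."""
    return SAMPLE_FOLLOWERS.get(user, [])


def crawl(seed, max_depth=2):
    """Breadth-first crawl from `seed`, returning (follower, followed) edges."""
    seen = {seed}
    edges = []
    queue = deque([(seed, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth >= max_depth:
            # Do not expand users that sit at the depth limit.
            continue
        for follower in fetch_followers(user):
            edges.append((follower, user))
            if follower not in seen:
                seen.add(follower)
                queue.append((follower, depth + 1))
    return edges


edges = crawl("alice", max_depth=2)
```

The depth limit keeps the crawl bounded, which matters on a real social graph where the number of reachable profiles grows quickly with each hop.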

Project Description

The internet has grown from being a connection of web pages to a connection of people and even things. Famous companies around the world have made their name and their money by accelerating this connection and communication.

In this big data project, we will look at how to mine and make sense of connections using a simple example: Github. Github has evolved from being just a source version control tool into a social coding platform. That social component has increased its relevance in the face of competition. We can apply the same lesson in our own business by not only providing goods or services but also exploring the connections among customers.

This exploration is what this Spark GraphX project is all about: we will mine the connections between people around some Github projects and run some well-known graph algorithms on the resulting network. Note that this class will be a little code-intensive.
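
As one example of a well-known graph algorithm that could be run on such a network, the sketch below computes PageRank by plain power iteration over a list of (follower, followed) edges. It is not the GraphX or GraphFrames API, only a small pure-Python illustration of the idea; the sample edges, the user names, and the parameter defaults are assumptions made for this example.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical sample network: each pair reads "first user follows second user".
sample_edges = [
    ("bob", "alice"),
    ("carol", "alice"),
    ("carol", "bob"),
    ("dave", "bob"),
    ("alice", "carol"),
]
ranks = pagerank(sample_edges)
most_followed = max(ranks, key=ranks.get)
```

Users who are followed by other well-ranked users end up with higher scores, which is why this kind of ranking is a common first look at influence in a follower network.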

Similar Projects

In this Hadoop project, learn about the features in Hive that allow us to perform analytical queries over large datasets.

In this Hive project, you will design a data warehouse for e-commerce environments.

The goal of this Spark project is to analyse the level and strength of interactions between different areas of a telecom provider's coverage in the city of Milan.

Curriculum For This Mini Project

02h 28m
03h 11m