Bitcoin Data Mining on AWS Free Tier

Bitcoin Data Mining on AWS: learn how to use the AWS cloud to build a data pipeline and analyse Bitcoin data.

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love

James Peebles

Data Analytics Leader, IQVIA

This is one of the best of investments you can make with regards to career progression and growth in technological knowledge. I was pointed in this direction by a mentor in the IT world who I highly...

Shailesh Kurdekar

Solutions Architect at Capital One

I have worked for more than 15 years in Java and J2EE and have recently developed an interest in Big Data technologies and Machine learning due to a big need at my workspace. I was referred here by a...

What will you learn

Understanding what a data warehouse, Hive, and Spark are.
Understanding how to build a data warehouse using Hive and Spark.
The system requirements for the project.
Understanding how to install Hadoop on AWS EC2.
Understanding how to install Spark on AWS EC2.
The data architecture for building a data warehouse using Hive and Spark.
The challenges with Hive, its optimization, and how it compares with Presto and Druid.
Understanding what Apache Spark is.
How to visualize the data using AWS QuickSight.
Understanding how to extract the data using a Python API.
Uploading the data from an EC2 instance to HDFS.
Performing PySpark analysis with Kryo serialization.
Data analysis using PySpark.
How to create a table using Hive on an AWS EC2 instance.
Visualizing the data using AWS QuickSight.

Project Description

Why Big Data?

Real-time streaming data is captured at regular intervals from millions of IoT devices such as sensors, along with clickstreams, logs from device APIs, and historical data from SQL databases. To store such huge volumes of data arriving with high velocity and varying veracity, we need an efficient, scalable storage system that is distributed across different nodes, either locally or in the cloud. This is where Hadoop comes in. Its core can be divided into two parts: storage and processing. Storage is handled by HDFS, and processing is done using MapReduce.
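To make the processing half more concrete, here is a minimal sketch of the MapReduce idea in plain Python. It is not Hadoop code: the three functions only imitate the map, shuffle, and reduce phases on a single machine, and the records, field names, and prices are made-up sample values used purely for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (key, 1) pair for every record, keyed by its date."""
    for record in records:
        yield record["date"], 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Made-up sample records (hypothetical field names and values).
records = [
    {"date": "2020-01-01", "price": 7200.0},
    {"date": "2020-01-01", "price": 7210.5},
    {"date": "2020-01-02", "price": 6985.0},
]

counts = reduce_phase(shuffle_phase(map_phase(records)))
print(counts)  # {'2020-01-01': 2, '2020-01-02': 1}
```

On a real cluster the map and reduce functions run on different nodes and the shuffle moves data between them over the network; the sketch only shows the shape of the computation.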

Data Pipeline:

A data pipeline is a system for moving data from one system to another. The data may or may not be transformed along the way, and it may be processed in real time (streaming) or in batches. A data pipeline covers everything from extracting or capturing data using various tools, storing the raw data, cleaning and validating it, and transforming it into a query-ready format, through to visualising KPIs, including orchestration of all of these steps.
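As a rough illustration of these stages, the sketch below chains extract, validate, transform, and load steps as plain Python functions. Everything in it is an assumption made for the example: the function names, the record layout, and the in-memory list standing in for a real storage layer.

```python
def extract():
    """Stand-in for an API call or file read; returns raw records."""
    return [
        {"timestamp": "2020-01-01T00:00:00", "price": "7200.0"},
        {"timestamp": "2020-01-01T01:00:00", "price": ""},  # missing price
        {"timestamp": "2020-01-01T02:00:00", "price": "7210.5"},
    ]


def validate(rows):
    """Keep only rows that have both a timestamp and a price."""
    return [r for r in rows if r.get("timestamp") and r.get("price")]


def transform(rows):
    """Convert the price from text to a number so it can be queried."""
    return [{"timestamp": r["timestamp"], "price": float(r["price"])} for r in rows]


def load(rows, sink):
    """Append rows to the sink (a list here, a warehouse table in practice)."""
    sink.extend(rows)
    return len(rows)


def run_pipeline(sink):
    """Orchestrate the stages in order and report how many rows were loaded."""
    return load(transform(validate(extract())), sink)


warehouse = []
loaded = run_pipeline(warehouse)
print(loaded)  # 2
```

In a production pipeline the orchestration step would typically be handled by a scheduler rather than a single function call, but the ordering of the stages is the same.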

What are we going to do?

We are going to extract data from APIs using Python, parse it, save it locally on an EC2 instance, and then upload it to HDFS. We will then read the data from HDFS using PySpark and perform the analysis, using Kryo serialization and Spark optimisation techniques. An external table will be created on Hive/Presto, and finally we will use AWS QuickSight to visualize the data.
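The first steps (parse the API response, save it as CSV, upload it to HDFS) might look roughly like the sketch below. The payload shape, field names, file paths, and HDFS directory are all hypothetical, since the project page does not name the API it uses. The sketch only builds the `hdfs dfs` command lines and does not run them, so it works on a machine without Hadoop installed.

```python
import csv
import io
import json

# Hypothetical API response; the real API and its fields are not specified.
SAMPLE_RESPONSE = json.dumps({
    "prices": [
        {"time": "2020-01-01T00:00:00Z", "usd": 7200.0},
        {"time": "2020-01-01T01:00:00Z", "usd": 7210.5},
    ]
})


def parse_prices(raw_json):
    """Turn the JSON payload into (time, price) tuples."""
    payload = json.loads(raw_json)
    return [(p["time"], float(p["usd"])) for p in payload["prices"]]


def to_csv(rows):
    """Serialize rows as comma-separated text, one record per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def hdfs_upload_commands(local_path, hdfs_dir):
    """Build the shell commands that create the target dir and upload the file."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir],
    ]


rows = parse_prices(SAMPLE_RESPONSE)
csv_text = to_csv(rows)
# Both paths below are placeholders chosen for the example.
commands = hdfs_upload_commands("/home/ec2-user/bitcoin_prices.csv", "/user/hadoop/bitcoin")
```

On the EC2 instance, each command list could be passed to `subprocess.run(cmd, check=True)` after writing `csv_text` to the local file.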

Similar Projects

In this big data project, we'll work with Apache Airflow and write a scheduled workflow that downloads data from Wikipedia archives, uploads it to S3, processes it in Hive, and finally analyzes it in Zeppelin notebooks.

In this Databricks Azure tutorial project, you will use Spark SQL to analyse the MovieLens dataset and provide movie recommendations. As part of this, you will deploy Azure Data Factory and data pipelines, and visualise the analysis.

In this Hadoop project, we continue the series on data engineering by discussing and implementing various ways to solve the Hadoop small file problem.

Curriculum For This Mini Project

Introduction to data warehouses, Hive, and Spark
Project overview: building a data warehouse using Hive and Spark
System requirements for the project
Installation of Hadoop
Installation of Spark
Data architecture for building a data warehouse using Hive and Spark
Hive explanation and comparison
Spark explanation
Why use AWS QuickSight for data visualization
Extracting the data using a Python API
How to upload the data from an EC2 instance to HDFS
How to perform PySpark analysis and Kryo serialization
Analyzing the data using PySpark, Part 1
Analyzing the data using PySpark, Part 2
Creating a table using Hive
Data visualization using AWS QuickSight
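
For the Kryo serialization and Hive table steps in the curriculum, the sketch below shows one possible setup, not necessarily the one used in the videos. It assembles a `spark-submit` command line that switches Spark to Kryo serialization and generates the HiveQL for an external table over CSV files in HDFS. The script name, table name, column names, buffer size, and HDFS location are assumptions chosen for illustration; `spark.serializer` and `spark.kryoserializer.buffer.max` are standard Spark configuration keys.

```python
# Spark settings that enable Kryo; the buffer size is an illustrative choice.
KRYO_CONF = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryoserializer.buffer.max": "128m",
}


def spark_submit_args(script, conf):
    """Build a spark-submit command line with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args + [script]


def external_table_ddl(table, columns, location):
    """Generate HiveQL for an external table over comma-delimited text files."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Hypothetical script, table, columns, and HDFS path.
submit = spark_submit_args("analysis.py", KRYO_CONF)
ddl = external_table_ddl(
    "bitcoin_prices",
    [("price_time", "STRING"), ("price_usd", "DOUBLE")],
    "/user/hadoop/bitcoin",
)
print(ddl)
```

Because the table is external, dropping it in Hive removes only the table metadata and leaves the files in HDFS in place.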