Integrating Spark and NoSQL Database for Data Analysis

In this project, we will look at two database platforms, MongoDB and Cassandra, and examine the philosophical differences in how these databases work and how they perform analytical queries.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What will you learn

Introduction to NoSQL Document store - MongoDB
Introduction to NoSQL Wide column store - Cassandra
Use cases for storing Spark data in NoSQL databases
Spark I/O connectors
Querying our NoSQL databases

Project Description

One benefit of Spark is that it extends to a wide range of storage platforms beyond Hadoop. As Spark developers, this means we can read from and write to virtually any popular storage platform while building our data pipeline.
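As a rough sketch of what this extensibility looks like in practice, the same DataFrame write call can target different stores by changing only the connector format and its options. The helper names `writer_settings` and `save` are hypothetical. The Cassandra format string and its `keyspace`/`table` options follow the Spark Cassandra Connector. The `mongodb` format name assumes version 10.x of the MongoDB Spark Connector (older versions use a different name). Connection details, such as the Cassandra host or the MongoDB URI, are assumed to be set in the Spark session configuration.

```python
from typing import Any, Dict


def writer_settings(store: str, database: str, table: str) -> Dict[str, Any]:
    """Return the connector format and options for a target store.

    Hypothetical helper: it only builds settings and does not need Spark.
    """
    if store == "cassandra":
        # Spark Cassandra Connector: a table lives inside a keyspace.
        return {
            "format": "org.apache.spark.sql.cassandra",
            "options": {"keyspace": database, "table": table},
        }
    if store == "mongodb":
        # Assumes MongoDB Spark Connector 10.x: a collection lives inside a database.
        return {
            "format": "mongodb",
            "options": {"database": database, "collection": table},
        }
    raise ValueError(f"unsupported store: {store}")


def save(df, settings: Dict[str, Any], mode: str = "append") -> None:
    """Write a Spark DataFrame using the standard DataFrameWriter chain."""
    (
        df.write.format(settings["format"])
        .options(**settings["options"])
        .mode(mode)
        .save()
    )
```

With a real Spark session and the connector packages on the classpath, `save(df, writer_settings("cassandra", "mot", "test_results"))` would append the DataFrame to that table. The keyspace and table names here are placeholders.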
In this Hackerday, we will look at two such database platforms - MongoDB and Cassandra. These belong to two different classes of database, each suited to different use cases. We will discuss these use cases, install both platforms in our lab environment, look at the philosophical differences in how these databases work, create sample tables and finally integrate our Spark application to load the UK MOT vehicle testing dataset into them. Once the data is loaded, anyone can run analytical queries on the tables at any time.
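To give a feel for the difference between the two models, here is a toy in-memory sketch, not how either engine is implemented. A document store groups related data into one nested document, while a wide column store keeps rows inside a partition, ordered by a clustering column. The field names (`vehicle_id`, `test_date`, `result`, `mileage`) and the sample values are hypothetical and are not taken from the actual MOT dataset.

```python
from bisect import insort
from collections import defaultdict

# Hypothetical test results, one flat record per test.
records = [
    {"vehicle_id": "V1", "test_date": "2016-03-04", "result": "F", "mileage": 51000},
    {"vehicle_id": "V1", "test_date": "2015-03-01", "result": "P", "mileage": 42000},
    {"vehicle_id": "V2", "test_date": "2016-01-10", "result": "P", "mileage": 12000},
]


def as_documents(rows):
    """Document-style: one nested document per vehicle, tests embedded."""
    docs = {}
    for r in rows:
        doc = docs.setdefault(r["vehicle_id"], {"_id": r["vehicle_id"], "tests": []})
        doc["tests"].append({k: v for k, v in r.items() if k != "vehicle_id"})
    return list(docs.values())


def as_wide_rows(rows):
    """Wide-column-style: partition by vehicle, rows kept sorted by test date."""
    partitions = defaultdict(list)
    for r in rows:
        # insort keeps each partition ordered by the first tuple element,
        # which plays the role of a clustering column here.
        insort(partitions[r["vehicle_id"]], (r["test_date"], r["result"], r["mileage"]))
    return dict(partitions)


documents = as_documents(records)
wide = as_wide_rows(records)
```

In the document form, fetching a vehicle returns its whole test history in one object. In the wide-row form, a lookup by partition key returns rows already ordered by date, which suits range queries within one vehicle.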

Curriculum For This Mini Project