Integrating Spark and NoSQL Database for Data Analysis

In this project, we will look at two database platforms, MongoDB and Cassandra, examine the philosophical differences in how they work, and run analytical queries against them.

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 102+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What will you learn

Introduction to NoSQL Document store - MongoDB
Introduction to NoSQL Wide column store - Cassandra
Use cases for storing Spark data in NoSQL databases
Spark I/O connectors
Querying our NoSQL databases

Project Description

One benefit of Spark is that it extends to many storage platforms beyond Hadoop. This means that, as Spark developers, we can read from and write to virtually any popular storage platform while building our data pipeline.
In this Hackerday, we will look at two such database platforms, MongoDB and Cassandra. They belong to two different classes of database (a document store and a wide-column store), and each suits different use cases. We will discuss these use cases, install both platforms in our lab environment, look at the philosophical differences in how the two databases work, create sample tables, and finally integrate our Spark application to load the UK MOT vehicle testing dataset into them. Once the data is loaded, the tables can be queried analytically at any time.
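
As a rough illustration of the load step described above, the sketch below shows how a Spark DataFrame might be written to both stores and read back for a simple aggregate. It is not the project's actual solution. It assumes the MongoDB Spark Connector 10.x (data source name "mongodb") and the Spark Cassandra Connector are on the Spark classpath, that the dataset is a delimited text file with a header row, and that the Cassandra keyspace and table already exist with columns matching the file. All function names, URIs, database names and column names here are hypothetical.

```python
# Sketch only: every name below (functions, URIs, keyspace, table,
# collection, column) is a hypothetical placeholder, not from the project.

def mongo_options(uri, database, collection):
    """Options for the MongoDB Spark Connector 10.x ("mongodb" source)."""
    return {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
    }


def cassandra_options(keyspace, table):
    """Options for the Spark Cassandra Connector DataFrame source."""
    return {"keyspace": keyspace, "table": table}


def load_mot_results(spark, path, mongo_opts, cassandra_opts, sep=","):
    """Read the delimited file once and append it to both stores.

    `spark` is an existing SparkSession created with both connectors
    available; the file layout (header row, delimiter) is an assumption.
    """
    df = spark.read.csv(path, header=True, inferSchema=True, sep=sep)

    # MongoDB creates the collection implicitly on the first write.
    df.write.format("mongodb").mode("append").options(**mongo_opts).save()

    # Cassandra expects the keyspace and table to exist beforehand,
    # with column names matching the DataFrame.
    (df.write.format("org.apache.spark.sql.cassandra")
        .mode("append")
        .options(**cassandra_opts)
        .save())
    return df


def count_by_column(spark, source_format, options, column):
    """Read a table back through its connector and count rows per value."""
    df = spark.read.format(source_format).options(**options).load()
    return df.groupBy(column).count()
```

With a live session, a call such as count_by_column(spark, "mongodb", mongo_options("mongodb://localhost:27017", "mot", "results"), "make") would return a DataFrame of row counts per vehicle make, assuming a column of that name exists. The same function works against Cassandra by passing "org.apache.spark.sql.cassandra" and the Cassandra options instead, which is the point of Spark's connector model: the pipeline code stays the same while the storage backend changes.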

Curriculum For This Mini Project