Spark integration and analysis with NoSQL Databases 2 - Cassandra

In this project, we will look at Cassandra: what it is suited for, especially in a Hadoop environment, how to integrate it with Spark, and how to install it in our lab environment.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your Linkedin/Github profiles.

Customer Love


SUBHABRATA BISWAS

Lead Consultant, ITC Infotech

The project orientation is very much unique and it helps to understand the real time scenarios most of the industries are dealing with. And there is no limit, one can go through as many projects... Read More

Mike Vogt

Information Architect at Bank of America

I have had a very positive experience. The platform is very rich in resources, and the expert was thoroughly knowledgeable on the subject matter - real world hands-on experience. I wish I had this... Read More

What will you learn

Exploratory look at Cassandra
Data modelling in Cassandra
Use cases of Cassandra in the enterprise
Spark integration using our dataset
Materialized Views
Comparing analytical queries in MongoDB and Cassandra
Spark Datasources

Project Description

In the last hackerday, we looked at NoSQL databases and their roles in today's enterprise. We talked about design choices with respect to document-oriented and wide-column databases, and concluded with a hands-on exploration of MongoDB, its integration with Spark, and writing analytical queries using MongoDB's query structures.
As we also noted, Spark has the benefit of being extensible to a large number of storage platforms beyond Hadoop. This means that as Spark developers, we can read from and write to virtually any popular storage platform while building our data pipelines.
In this hackerday, we will conclude that session by taking a look at Cassandra. We will look at what it is suited for, especially in a Hadoop environment; how to integrate it with Spark; how to install it in our lab environment; and how to model the UK MOT vehicle testing dataset that we used with MongoDB in the first part. Once the data is loaded, anyone can perform analytical queries on the tables at any time.
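As a rough sketch of what this integration looks like in practice, the snippet below reads a Cassandra table into a Spark DataFrame through the DataStax spark-cassandra-connector and runs an analytical query over it. The keyspace, table, and column names are illustrative assumptions, not the actual MOT schema used in the session, and the code assumes a Cassandra cluster is running and reachable.

```python
# Sketch: reading a Cassandra table into Spark and querying it with the
# DataStax spark-cassandra-connector. Requires a running Cassandra cluster;
# the keyspace/table/column names ("mot", "test_results", "make", "result")
# are illustrative assumptions, not the actual MOT dataset schema.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cassandra-mot-analysis")
    # Assumes the connector package is on the classpath, e.g. launched with:
    # --packages com.datastax.spark:spark-cassandra-connector_2.12:3.4.1
    .config("spark.cassandra.connection.host", "127.0.0.1")
    .getOrCreate()
)

# Load the Cassandra table as a DataFrame via the Cassandra data source.
tests = (
    spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(keyspace="mot", table="test_results")
    .load()
)

# Example analytical query: pass rate by vehicle make.
tests.createOrReplaceTempView("test_results")
spark.sql("""
    SELECT make,
           AVG(CASE WHEN result = 'P' THEN 1.0 ELSE 0.0 END) AS pass_rate
    FROM test_results
    GROUP BY make
    ORDER BY pass_rate DESC
""").show()
```

Writing results back is symmetric: `df.write.format("org.apache.spark.sql.cassandra").options(keyspace=..., table=...).mode("append").save()`.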

Similar Projects

Hive Project: Understand the various types of SCDs and implement these slowly changing dimensions in Hadoop Hive and Spark.

In this Databricks Azure project, you will use Spark & Parquet file formats to analyse the Yelp reviews dataset. As part of this you will deploy Azure data factory, data pipelines and visualise the analysis.

In this Hackerday, we will go through the basics of statistics and see how Spark enables us to perform statistical operations, such as descriptive and inferential statistics, over very large datasets.

Curriculum For This Mini Project