Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 102+ solved projects with IPython notebooks and datasets.
Add project experience to your LinkedIn/GitHub profiles.
In this PySpark project, you will simulate a complex real-world data pipeline based on messaging. This project is deployed using the following tech stack - NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau and AWS QuickSight.
PySpark Project: Get a handle on using Python with Spark through this hands-on Spark Python data processing tutorial.
In this Databricks Azure project, you will use Spark and the Parquet file format to analyse the Yelp reviews dataset. As part of this, you will deploy Azure Data Factory and data pipelines, and visualise the analysis.
In this Hive project, you will design a data warehouse for e-commerce environments.
This Elasticsearch example deploys the AWS ELK stack to analyse streaming event data. Tools used include NiFi, PySpark, Elasticsearch, Logstash and Kibana for visualisation.
Hive Project: Understand the various types of slowly changing dimensions (SCDs) and implement them in Hadoop Hive and Spark.
The goal of this Spark project is to analyze business reviews from the Yelp dataset and ingest the final output of data processing into Elasticsearch. You will also use the visualisation tool in the ELK stack to build various kinds of ad-hoc reports from the data.
The goal of this Spark project for students is to explore the features of Spark SQL in practice on the latest version of Spark, i.e. Spark 2.0.
In this NoSQL project, we will use two NoSQL databases (HBase and MongoDB) to store Yelp business attributes and learn how to retrieve this data for processing or querying.
In this Spark project, we will measure how much NFP (non-farm payroll) releases have moved markets in the past.
In this project, we will be building and querying an OLAP Cube for Flight Delays on the Hadoop platform.
In this project, we will look at running various use cases in the analysis of crime data sets using Apache Spark.
In this project, we will look at Cassandra: what it is suited for, especially in a Hadoop environment, how to integrate it with Spark, and how to install it in our lab environment.
In this project, we will look at insurance forecasting using regression techniques.
In this project, we will evaluate and demonstrate how to handle unstructured data using Spark.
In this Hackerday, we will go through the basics of statistics and see how Spark enables us to perform statistical operations, such as descriptive and inferential statistics, over very large datasets.
In this project, we will look at two database platforms, MongoDB and Cassandra, and examine the philosophical differences in how these databases work and perform analytical queries.
In this project, we will work through complex scenarios to help Spark developers deal better with the issues that arise in the real world.
We are all living in a world of Big Data, a world where tons of GBs of data are generated every single day. A click here, a click there, a few algorithms running over it all in the backend, and the products you just browsed on an e-commerce website show up as an ad in your social media feed. How does all that work? If you are curious to know the answer, learning about Apache Hadoop and Apache Spark projects will do the job. These are two popular frameworks widely used to handle big data and perform data analytics on it.