Chicago Crime Data Analysis on Apache Spark

In this project, we will look at running various use cases in the analysis of crime data sets using Apache Spark.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What you will learn

Spark's DataFrame vs. Dataset
Type-safe UDFs in Spark
Rollup functions in Spark
Windowing functions in Spark
Running your Spark code in Apache Zeppelin

Project Description

In this Hackerday, we will look at running various use cases in the analysis of crime datasets using Apache Spark.
This is a back-to-basics Hackerday session, aimed at those who have never written a Spark application or who are new to writing Spark applications in Scala. We will explore Spark SQL UDFs as well as roll-up and windowing functions.
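
As a rough illustration of the three features named above, here is a minimal sketch in Scala, assuming Spark 2.x and a script-style environment such as spark-shell or a Zeppelin paragraph. The `Crime` case class, its fields, and the sample rows are made up for this example and are not the actual schema of the Chicago crime dataset.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, count, rank, udf}

// Hypothetical record shape; the real crime dataset has many more columns.
final case class Crime(primaryType: String, district: String, year: Int)

// Reuses the existing session in spark-shell or Zeppelin, or starts a local one.
val spark = SparkSession.builder()
  .appName("crime-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Made-up rows, for illustration only. toDS() gives a typed Dataset[Crime].
val crimes = Seq(
  Crime("THEFT", "011", 2015),
  Crime(" theft", "011", 2016),
  Crime("BATTERY", "011", 2016),
  Crime("THEFT", "007", 2016)
).toDS()

// Spark SQL UDF: normalise the crime type so "THEFT" and " theft" match.
// withColumn returns an untyped DataFrame.
val normalize = udf((s: String) => s.trim.toLowerCase)
val cleaned = crimes.withColumn("crimeType", normalize(col("primaryType")))

// Roll-up: counts per (district, crimeType), subtotals per district,
// and a grand total, all in one result.
val rolled = cleaned
  .rollup("district", "crimeType")
  .agg(count("*").as("n"))

// Windowing: rank crime types by frequency within each district.
val perDistrict = Window.partitionBy("district").orderBy(col("n").desc)
val ranked = cleaned
  .groupBy("district", "crimeType")
  .agg(count("*").as("n"))
  .withColumn("rank", rank().over(perDistrict))

rolled.show()
ranked.show()
```

In the roll-up output, subtotal and grand-total rows carry `null` in the columns that were aggregated away. `SparkSession` exists only in Spark 2.x; on Spark 1.x the entry point would be `SQLContext` instead.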

Finally, we will run our application on Apache Zeppelin so that we can share it with our friends. We will try to run some of our code on both the 1.x and 2.x versions of Spark. However, we recommend that you start moving completely to Spark 2.x.

Similar Projects

PySpark Project: Get a handle on using Python with Spark through this hands-on Spark data processing tutorial in Python.

In this PySpark project, you will simulate a complex real-world data pipeline based on messaging. This project is deployed using the following tech stack: NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau and AWS QuickSight.

In this Spark project, we will measure how much NFP has moved markets in the past.

Curriculum For This Mini Project