Chicago Crime Data Analysis on Apache Spark

Chicago Crime Data Analysis on Apache Spark

In this project, we will look at running various use cases in the analysis of crime data sets using Apache Spark.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your Linkedin/Github profiles.

Customer Love

Read All Reviews

Nathan Elbert

Senior Data Scientist at Tiger Analytics

This was great. The use of Jupyter was great. Prior to learning Python I was a self taught SQL user with advanced skills. I hold a Bachelors in Finance and have 5 years of business experience.. I... Read More

Camille St. Omer

Artificial Intelligence Researcher, Quora 'Most Viewed Writer in 'Data Mining'

I came to the platform with no experience and now I am knowledgeable in Machine Learning with Python. No easy thing I must say, the sessions are challenging and go to the depths. I looked at graduate... Read More

What will you learn

Spark's DataFrame vs Dataset
Type-safe UDF in Spark
Rollup functions in Spark
Windowing functions in Spark
Running your spark code in Apache Zeppelin

Project Description

In this Hackerday, we will look at running various use cases in the analysis of crime datasets using Apache Spark.
This is a back-to-basics Hackerday session that is going to be very expository for those who have never written spark application or are new to writing spark application using Scala. We will explore the Spark SQL UDF and as well as roll-up and windowing functions.

We will also do a final submission of our application on Apache Zeppelin to submit our application to our friends. We will try to run some of our code in both 1.x and 2.x versions of Spark. However, you are recommended to start moving completely to Spark 2.x.

Similar Projects

In this project, we will look at Cassandra and how it is suited for especially in a hadoop environment, how to integrate it with spark, installation in our lab environment.

In this PySpark project, you will simulate a complex real-world data pipeline based on messaging. This project is deployed using the following tech stack - NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau and AWS QuickSight.

Hive Project- Understand the various types of SCDs and implement these slowly changing dimesnsion in Hadoop Hive and Spark.

Curriculum For This Mini Project