The goal of this Spark project is to analyze business reviews from the Yelp dataset and ingest the final output of the data processing into Elasticsearch. You will also use Kibana, the visualization tool in the ELK stack, to build various kinds of ad-hoc reports from the data.
This project covers Apache Spark, with the main focus on one of its components, Spark SQL. We will look at how Spark and Spark SQL work, their internal functioning, and their capabilities and advantages over other data processing tools. We will take up one business problem in the area of supply chain. Our tech stack for this project will be Databricks and Spark 3.0. We will use Spark SQL to explore the business data and generate insights from it that should help us frame a solution to the business problem.
Build your own image similarity application using Python to search and find images of products that are similar to any given product. You will implement the K-Nearest Neighbor algorithm to find products with maximum similarity.
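As a rough illustration of the K-Nearest Neighbor idea mentioned above, here is a minimal brute-force sketch. It assumes each product image has already been reduced to a numeric feature vector; the catalog names, the three-element vectors, and the choice of cosine similarity as the metric are all hypothetical and used only for illustration, since the project description does not specify them.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def k_nearest(query, catalog, k=3):
    """Return the k catalog items most similar to the query, best match first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical feature vectors; in practice these would come from an
# image feature extractor rather than being written by hand.
catalog = {
    "red_sneaker": [0.9, 0.1, 0.0],
    "crimson_trainer": [0.8, 0.2, 0.1],
    "blue_boot": [0.1, 0.2, 0.9],
    "green_sandal": [0.1, 0.9, 0.2],
}

query = [1.0, 0.1, 0.0]  # features of the product we are searching with
neighbors = k_nearest(query, catalog, k=2)
for name, score in neighbors:
    print(f"{name}: {score:.3f}")
```

This version compares the query against every product, which is fine for a small catalog. A larger catalog would likely need an indexed or approximate nearest-neighbor search instead.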