Continuing the series on "Data engineering using Yelp dataset": we have built our data warehouse to an appreciable stage, and users can now run whatever queries they want. Well done.
But not every query is easy for users to read or write, and not every query is easy for the query engine to execute. Some queries involve so many self-joins that they become either inefficient for the system or confusing for the writer.
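To make the self-join problem concrete, here is a minimal sketch in plain Python. It is not code from this series: the friendship pairs and the function names `build_adjacency` and `users_within_hops` are hypothetical, made up for illustration. In a relational model, each additional hop through a friendship table adds another self-join to the query; in a graph model, going one hop further is just one more traversal step over the same structure.

```python
from collections import deque

# Hypothetical friendship pairs (user_id, friend_id); not real Yelp data.
FRIENDSHIPS = [("u1", "u2"), ("u2", "u3"), ("u3", "u4"), ("u1", "u5")]


def build_adjacency(edges):
    """Build an undirected adjacency map from (user, friend) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def users_within_hops(adjacency, start, max_hops):
    """Breadth-first traversal from `start`.

    Returns a dict mapping each reachable user to its hop distance,
    limited to `max_hops` and excluding `start` itself.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if distance[user] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in adjacency.get(user, ()):
            if friend not in distance:
                distance[friend] = distance[user] + 1
                queue.append(friend)
    del distance[start]
    return distance


adjacency = build_adjacency(FRIENDSHIPS)
# Friends and friends-of-friends of u1; raising max_hops needs no new query shape.
print(users_within_hops(adjacency, "u1", 2))
```

Changing `max_hops` from 2 to 3 only changes an argument here, whereas the equivalent relational query would need one more self-join.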
So in this hackerday, we are going to do network analysis using a graph database. The goal is to find patterns in how a user's social network affects business reviews and ratings. This alone could be an outstanding data product built from the Yelp dataset.
We will use the open source graph database Neo4j, together with Spark, to analyze the social network of users and determine whether it has any effect on how ratings or reviews are given.