A while back, we processed web server access logs using Spark and Hive. However, that was batch processing, so within the lambda architecture it covered only the batch and serving layers.
In this hackerday, we go one step further by bringing processing to the speed layer of the lambda architecture, which opens up more capabilities. Examples include monitoring application performance in real time, measuring in real time how comfortably users interact with applications, and raising real-time alerts in case of a security breach.
These capabilities will be explored using Spark Streaming in a streaming architecture.
Note: The Cloudera QuickStart VM does not include Kafka. As stated in our objective, we will make the case for using Kafka, but our implementation will not use it. Instead, we will integrate the log agent directly with Spark Streaming.
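To give a feel for the kind of work the speed layer does, here is a minimal sketch in plain Python, not the project's actual Spark code. It assumes the access logs follow the Common Log Format, and the names `parse_line`, `summarize_batch`, and `denied_threshold` are hypothetical, introduced only for illustration. In a Spark Streaming job, logic like this would typically be applied to each micro-batch of log lines as it arrives.

```python
import re
from collections import Counter

# Assumed input: Common Log Format, e.g.
# 10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)'
)


def parse_line(line):
    """Parse one access log line into a dict, or return None if it is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    host, timestamp, method, path, status, _size = match.groups()
    return {
        "host": host,
        "time": timestamp,
        "method": method,
        "path": path,
        "status": int(status),
    }


def summarize_batch(lines, denied_threshold=3):
    """Summarize one micro-batch of raw log lines.

    Returns a Counter of HTTP status codes and a sorted list of hosts that
    received at least `denied_threshold` 401/403 responses in this batch
    (a hypothetical, deliberately simple stand-in for a security alert rule).
    """
    records = [r for r in map(parse_line, lines) if r is not None]
    status_counts = Counter(r["status"] for r in records)
    denied_per_host = Counter(
        r["host"] for r in records if r["status"] in (401, 403)
    )
    suspects = sorted(
        host for host, count in denied_per_host.items()
        if count >= denied_threshold
    )
    return status_counts, suspects
```

Calling `summarize_batch` on the lines collected during each short interval yields per-interval status counts for monitoring and a list of hosts worth alerting on. The threshold and the choice of 401/403 as the signal are assumptions, to be tuned for a real deployment.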
Stay up to date with technology trends by working on projects
Live online coding sessions led by industry experts
Build 2-4 projects a month, each lasting 6 hours and designed to teach you advanced concepts
Code in groups and connect with your community