Each project comes with 2-5 hours of micro-videos explaining the solution.
Code & Dataset
Get access to 50+ solved projects with IPython notebooks and datasets.
Add project experience to your LinkedIn/GitHub profiles.
Understanding the problem statement
What are log files and the different types of log files
How to process log files and the importance of processing them
What are a referrer and a user agent
What are the contents and uses of a log file
Why Flume and how it works, the Flume agent and its role
Processing and ingestion of log data using Flume
Processing data with MapReduce, and using Spark for data processing
Downloading the dataset and installing Scala on the QuickStart VM
What is a DoS attack, and performing and preventing one
Using Apache Kafka for processing complex files
What is Oozie, and using it to coordinate tasks and understand data flow
What is the Lambda Architecture and its use in batch and streaming processing
Dividing data into the Batch Layer, Speed Layer, and Serving Layer
Implementing and Troubleshooting Flume Agent
Accumulating data in and executing queries on Hive tables
Using Impala for the low-latency query of processed log data
Coordinating the data processing pipeline with Oozie
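The topics above all start from raw log lines. As a minimal sketch of the parsing step (assuming the common Apache "combined" log format; real formats vary by server configuration, and the field names here are illustrative), a single entry can be split into its fields, including the referrer and user agent, with a regular expression:

```python
import re

# Apache "combined" log format: client IP, identity, user, timestamp,
# request line, status code, response size, referrer, user agent.
# This layout is an assumption; check your server's LogFormat directive.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

sample = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
          '"GET /index.html HTTP/1.1" 200 2326 '
          '"http://example.com/start" "Mozilla/5.0"')
entry = parse_line(sample)
```

In the full pipeline this parsing would run inside a Spark or MapReduce job over files Flume has landed in HDFS, but the per-line logic is the same.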
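The DoS-attack topic in the list above comes down to spotting abnormal request rates per client. A minimal sketch of that detection idea (the threshold and the notion of a fixed time window are illustrative assumptions, not values from the project):

```python
from collections import Counter

def flag_dos(ips, threshold=100):
    """Flag source IPs whose request count in one time window exceeds
    the threshold. The threshold is an illustrative assumption; a real
    deployment would tune it against normal traffic."""
    counts = Counter(ips)
    return sorted(ip for ip, n in counts.items() if n > threshold)

# One benign client plus one flooding IP in the same window.
window = ["10.0.0.1"] * 5 + ["198.51.100.9"] * 500
flagged = flag_dos(window)  # only the flooding IP is flagged
```

At scale the same per-IP counting would be expressed as a Spark aggregation over the ingested logs rather than an in-memory Counter.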
Storing, processing, and mining data from web server logs has become mainstream for many companies today. Industry giants have used this engineering, and the accompanying science of machine learning, to extract information that has helped with ad targeting, improved search, application optimization, and general improvements in the user experience.
In this Hadoop project, we will use a sample application log file from an application server to demonstrate a scaled-down server log processing pipeline. Going from ingestion to insight usually requires Hadoop-ecosystem tools such as Flume, Pig, Spark, Hive/Impala, Kafka, and Oozie, with HDFS for storage; we will look at the pipeline both holistically and at each specific stage.
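The Lambda split this pipeline follows can be sketched in miniature: a batch layer recomputes views over all historical logs, a speed layer keeps incremental counts for events the batch has not yet seen, and the serving layer merges both at query time. The names and data shapes below are illustrative, not from the project's code:

```python
from collections import Counter

def batch_view(historical_logs):
    """Batch layer: recompute hit counts per URL from the full history
    (in the real pipeline, a Hive/Spark job over HDFS)."""
    return Counter(entry["url"] for entry in historical_logs)

class SpeedLayer:
    """Speed layer: incrementally count events not yet in the batch view
    (in the real pipeline, a streaming consumer of Kafka)."""
    def __init__(self):
        self.counts = Counter()

    def ingest(self, entry):
        self.counts[entry["url"]] += 1

def serve(batch, speed, url):
    """Serving layer: merge batch and real-time views at query time."""
    return batch[url] + speed.counts[url]

history = [{"url": "/index.html"}, {"url": "/index.html"}, {"url": "/about"}]
batch = batch_view(history)
speed = SpeedLayer()
speed.ingest({"url": "/index.html"})
total = serve(batch, speed, "/index.html")  # 2 from batch + 1 from speed
```

The design point is that the batch layer can be slow but exact, while the speed layer stays cheap and approximate; the serving layer hides the split from queries, which is the role Impala plays over the processed data here.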