DeZyre - live hands on training

Tutorials

Learn how you can build Big Data Projects


Apache Pig Tutorial Example: Web Log Server Analytics

This case study contains examples of Apache Pig commands that query and analyse web server logs. The logs used in this example were generated by various web servers and contain time-stamped details of the requested links, IP addresses, request types, server responses and other data. The same data set is used in the MapReduce case studies, and this case study illustrates how much simpler processing and analysis are with Apache Pig than with hand-written Hadoop MapReduce.

The analysis in this case study reveals the visits of a specific user, visits per unit of time and failed requests. It is carried out by executing queries on the web server log data set.
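In the case study these analyses are written in Pig Latin, using operators such as LOAD, FILTER, GROUP and COUNT. To show what they compute, here is a minimal Python sketch of the same three results. It assumes the logs are in Apache Common Log Format; the real data set's field layout may differ, and the sample lines and IP addresses are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: Apache Common Log Format, e.g.
# 10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse(line):
    """Parse one log line into a dict, or return None for malformed lines."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["ts"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return rec

def analyse(lines, user_ip):
    """Return (paths visited by user_ip, requests per hour, failed requests)."""
    records = [r for r in map(parse, lines) if r is not None]  # like FILTER ... IS NOT NULL
    user_visits = [r["path"] for r in records if r["ip"] == user_ip]
    per_hour = Counter(r["ts"].strftime("%Y-%m-%d %H:00") for r in records)  # GROUP + COUNT
    failed = [r for r in records if r["status"] >= 400]  # 4xx and 5xx responses
    return user_visits, per_hour, failed

SAMPLE = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:57:01 -0700] "GET /missing.png HTTP/1.0" 404 0',
    '10.0.0.1 - - [10/Oct/2000:14:02:11 -0700] "POST /login HTTP/1.0" 500 -',
    'not a log line',
]

visits, per_hour, failed = analyse(SAMPLE, "10.0.0.1")
```

Treating every status code of 400 or above as a failure is a choice made here for illustration. A different definition of "failed request" only changes the filter condition.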

Impala Case Study: Web Traffic

Storing Internet traffic data and processing it to extract useful insights can help us understand customer behavior and serve users better. The volume of data generated over networks is growing exponentially, but existing data warehouse systems do not scale cheaply while maintaining high performance. Instead of relying on costly warehouse systems, we can use commodity hardware and distributed processing to serve customers at any scale. Even if the volume of data grows tenfold, Hadoop can absorb it: we only need to add more nodes to the cluster, because adding commodity storage nodes is cheaper than buying more powerful processors.
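The "just add nodes" argument can be made concrete with simple capacity arithmetic. The sketch below assumes the HDFS default replication factor of 3 and an arbitrary 25% of each node's disk kept free for temporary and intermediate data. Both the headroom figure and the 24 TB per-node disk size in the example are illustrative assumptions, not recommendations.

```python
import math

def nodes_needed(data_tb, node_disk_tb, replication=3, headroom=0.25):
    """Estimate how many DataNodes are needed to store data_tb of logical data.

    replication: copies HDFS keeps of each block (3 is the HDFS default).
    headroom:    assumed fraction of each node's disk left free for
                 temporary/intermediate data.
    """
    raw_tb = data_tb * replication                 # physical bytes actually stored
    usable_per_node = node_disk_tb * (1 - headroom)
    return math.ceil(raw_tb / usable_per_node)

today = nodes_needed(100, 24)       # 100 TB of traffic data on 24 TB nodes
tenfold = nodes_needed(1000, 24)    # the same cluster after a tenfold data increase
```

Here 100 TB needs 17 nodes and 1,000 TB needs 167. Storage capacity grows linearly with node count, which is what makes scaling out on commodity hardware predictable.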

Impala Case Study: Flight Data Analysis

In this use case we deal with airport information system data, which tells us about flight delays, the reasons flights get delayed, times in different formats, and source and destination details including diverted routes. The data set is large and keeps growing, and processing it repeatedly is time-consuming. Visualization tools need to fetch the data in real time, and the graphs or charts built on top of it need to be updated quickly.
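A typical query behind such a chart is an aggregation, for example the average departure delay per delay reason. In Impala this would be expressed as a `SELECT ... AVG(...) ... GROUP BY` query. The Python sketch below computes the same kind of summary over a handful of records so the logic is easy to follow. The `Flight` fields are hypothetical and do not reflect the actual schema of the airport data set.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Flight:
    # Hypothetical fields; the real data set's columns may differ.
    origin: str
    dest: str
    dep_delay_min: int
    delay_reason: str   # e.g. "weather", "carrier", "none"
    diverted: bool

def delay_summary(flights):
    """Average departure delay (minutes) per reason, counting delayed flights only."""
    totals = defaultdict(lambda: [0, 0])   # reason -> [sum of minutes, number of flights]
    for f in flights:
        if f.dep_delay_min > 0:
            totals[f.delay_reason][0] += f.dep_delay_min
            totals[f.delay_reason][1] += 1
    return {reason: s / n for reason, (s, n) in totals.items()}

def diverted_routes(flights):
    """Set of (origin, dest) pairs that had at least one diverted flight."""
    return {(f.origin, f.dest) for f in flights if f.diverted}

SAMPLE = [
    Flight("ORD", "JFK", 30, "weather", False),
    Flight("ORD", "SFO", 10, "carrier", False),
    Flight("DFW", "JFK", 50, "weather", True),
    Flight("ATL", "JFK", 0, "none", False),
]
```

With this sample, `delay_summary(SAMPLE)` averages the two weather delays to 40.0 minutes and reports 10.0 for the single carrier delay; the on-time flight is excluded.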

Hadoop Impala Tutorial

Impala is an open-source, massively parallel processing (MPP) query engine that runs on clustered systems such as Apache Hadoop. It was built based on Google’s Dremel paper. It is an interactive, SQL-like query engine that runs on top of the Hadoop Distributed File System (HDFS), which it uses as its underlying storage.
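"Massively parallel" means a query is split into fragments that run on many nodes at once. In Impala, a coordinator `impalad` sends fragments to the `impalad` daemons on the nodes holding the data, and each daemon works on its local blocks before the coordinator merges the partial results. The Python sketch below is a simplified model of that scatter/gather pattern for a `COUNT(*) ... GROUP BY key` query. Threads stand in for cluster nodes and the in-memory partitions stand in for HDFS blocks; it is not how Impala is implemented internally.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def local_aggregate(rows):
    """Runs on each 'node': partial COUNT(*) GROUP BY key over its own rows."""
    return Counter(key for key, _value in rows)

def coordinator_query(partitions):
    """Scatter the fragment to every partition in parallel, then merge the partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_aggregate, partitions))
    result = Counter()
    for partial in partials:
        result.update(partial)   # Counter.update adds counts rather than replacing them
    return dict(result)

# Each inner list plays the role of the data blocks stored on one node.
PARTITIONS = [
    [("a", 1), ("b", 2)],
    [("a", 3)],
    [("b", 4), ("c", 5)],
]
```

Only the small per-node partial counts travel to the coordinator, not the raw rows, which is why this pattern scales with the number of nodes.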

Apache Hive Tutorial: Tables

There are two types of tables in Hive: internal (managed) and external. This case study describes creating those tables, loading data, partitioning, querying and dropping tables using weather data.
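The practical difference between the two table types shows up when a table is dropped. Dropping an internal (managed) table deletes both its metadata and its data files, while dropping an external table removes only the metadata and leaves the files in place. The toy Python model below illustrates this, together with Hive's `column=value` partition directory layout. It is not a Hive client. It assumes the default warehouse directory `/user/hive/warehouse`, and the table names, paths and the `part-00000` file name are hypothetical.

```python
class ToyMetastore:
    """In-memory model of Hive table metadata plus a fake HDFS (illustration only)."""

    def __init__(self):
        self.tables = {}   # table name -> (HDFS location, is_external)
        self.hdfs = {}     # simulated HDFS: file path -> list of rows

    def create_table(self, name, location=None, external=False):
        # Managed tables default to a directory under the warehouse dir.
        if location is None:
            location = f"/user/hive/warehouse/{name}"
        self.tables[name] = (location, external)

    def load_partition(self, name, column, value, rows):
        """Write rows into the partition directory <location>/<column>=<value>/."""
        location, _ = self.tables[name]
        path = f"{location}/{column}={value}/part-00000"
        self.hdfs.setdefault(path, []).extend(rows)
        return path

    def drop_table(self, name):
        location, external = self.tables.pop(name)   # metadata is always removed
        if not external:
            # Managed table: the data files are deleted as well.
            for path in [p for p in self.hdfs if p.startswith(location + "/")]:
                del self.hdfs[path]

ms = ToyMetastore()
ms.create_table("weather")                                     # internal (managed)
ms.create_table("weather_ext", "/data/weather", external=True)  # external
managed_file = ms.load_partition("weather", "year", 2019, [("NYC", 21.5)])
external_file = ms.load_partition("weather_ext", "year", 2019, [("NYC", 21.5)])
ms.drop_table("weather")
ms.drop_table("weather_ext")
```

After both drops, the managed table's partition file is gone, while `/data/weather/year=2019/part-00000` survives and could be attached to a new external table.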

Flume Hadoop Tutorial: Twitter Data Extraction

In this case study, a Flume agent is configured to retrieve data from Twitter. Twitter is a huge source of data about people's opinions and preferences, which can be used to analyse public opinion about, or reviews of, a specific topic or product. Various types of analysis can be done based on the tweet text and location.
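As one example of such an analysis, the sketch below counts tweets that mention a keyword, grouped by the user's stated location. It assumes the collected data has one tweet JSON object per line with Twitter's `text` and `user.location` fields. How tweets are actually serialized depends on the Flume source and sink used (some Twitter sources emit Avro instead), and the sample records are made up.

```python
import json
from collections import Counter

def mentions_by_location(json_lines, keyword):
    """Count tweets whose text contains keyword (case-insensitive), per user location."""
    counts = Counter()
    kw = keyword.lower()
    for line in json_lines:
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue   # skip partial or corrupt records
        if kw in tweet.get("text", "").lower():
            location = (tweet.get("user") or {}).get("location") or "unknown"
            counts[location] += 1
    return counts

SAMPLE = [
    '{"text": "Loving the new phone", "user": {"location": "Dallas"}}',
    '{"text": "My phone battery died again", "user": {"location": "Chicago"}}',
    '{"text": "PHONE upgrade day", "user": {"location": "Dallas"}}',
    '{"text": "Nothing to see here", "user": {"location": "Dallas"}}',
    '{"text": "phone", "user": {}}',
    '{broken',
]
```

Substring matching is deliberately crude. A real opinion analysis would add tokenization and sentiment scoring on top of the same grouping step.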

Flume Hadoop Tutorial: Website Log Aggregation

This case study focuses on a multi-hop Flume agent setup that aggregates logs from various web servers so that they can be analysed with Hadoop. Consider a scenario with multiple servers in various locations, served from different data centers. The objective is to distribute the log files based on device type and to store a backup of all logs.
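In Flume, routing by device type is usually done on the aggregating agent's source with a multiplexing channel selector (`selector.type = multiplexing`, plus `selector.header`, `selector.mapping.<value>` and `selector.default`). Each mapping can list several channels, so a backup channel can be added to every route. Hops between agents are typically connected by an Avro sink on one agent feeding an Avro source on the next. The Python sketch below models only the selector's routing decision. The header name `device` and the channel names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Event:
    headers: dict
    body: bytes

class MultiplexingSelector:
    """Rough model of Flume's multiplexing channel selector (not the real API)."""

    def __init__(self, header, mapping, default):
        self.header = header     # event header whose value decides the route
        self.mapping = mapping   # header value -> list of channel names
        self.default = default   # channels for missing or unmapped values

    def channels_for(self, event):
        return self.mapping.get(event.headers.get(self.header), self.default)

def route(events, selector):
    """Deliver each event to every channel chosen for it."""
    channels = {}
    for event in events:
        for name in selector.channels_for(event):
            channels.setdefault(name, []).append(event)
    return channels

# Every route also includes backup_ch, so all logs end up in the backup.
selector = MultiplexingSelector(
    header="device",
    mapping={"mobile": ["mobile_ch", "backup_ch"], "desktop": ["desktop_ch", "backup_ch"]},
    default=["other_ch", "backup_ch"],
)
EVENTS = [
    Event({"device": "mobile"}, b"GET /m/home"),
    Event({"device": "desktop"}, b"GET /home"),
    Event({"device": "tablet"}, b"GET /t/home"),
]
routed = route(EVENTS, selector)
```

Unmapped device types such as `tablet` fall through to the default channels, so no event is lost and the backup still receives a copy.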

Hadoop Sqoop Tutorial: Example Data Export

Let us assume we have a business application that uses a Netezza database for data storage. Assume we have imported the data from Netezza tables and processed it in Hadoop to benefit from distributed processing. Once the data has been processed in Hadoop, we need to load it back into Netezza so that it is available to downstream jobs.
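The export step is a `sqoop export` run that reads the processed files from an HDFS directory and writes them into an existing Netezza table. The sketch below builds that command line in Python so that its parts can be inspected. The host, database, table, paths and user are hypothetical. Port 5480 is Netezza's usual default, and the field delimiter must match whatever format the Hadoop job actually wrote.

```python
import shlex

def sqoop_export_cmd(host, database, table, export_dir, user, password_file,
                     delimiter=",", mappers=4):
    """Build a `sqoop export` command that pushes HDFS files back into Netezza."""
    return [
        "sqoop", "export",
        "--connect", f"jdbc:netezza://{host}:5480/{database}",
        "--username", user,
        "--password-file", password_file,      # keeps the password off the command line
        "--table", table,                      # target table must already exist
        "--export-dir", export_dir,            # HDFS directory holding the processed output
        "--input-fields-terminated-by", delimiter,
        "--num-mappers", str(mappers),         # parallel export tasks
    ]

cmd = sqoop_export_cmd(
    host="nz-host", database="salesdb", table="DAILY_SUMMARY",
    export_dir="/user/etl/output/daily_summary",
    user="etl_user", password_file="/user/etl/.nz_password",
)
print(shlex.join(cmd))
# To actually run it on a node with Sqoop installed:
# import subprocess; subprocess.run(cmd, check=True)
```

Building the arguments as a list rather than a single string avoids shell-quoting problems when the command is launched from a scheduler or script.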

Hadoop Sqoop Tutorial: Example of Data Aggregation

Suppose we have an online application that uses a MySQL database to store users' information and their activities. As the number of visitors to the site increases, the data grows proportionally. Processing very large volumes of data in an RDBMS is a bottleneck, and beyond a certain size it is not feasible at all. This is where distributed systems help: we first bring the data into the distributed system and then process it there. The data transfer itself also needs to be fast.
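`sqoop import` makes the transfer fast by running several mappers in parallel. With `--split-by <column>` and `--num-mappers N`, Sqoop looks up the minimum and maximum of the split column and gives each mapper a contiguous sub-range to fetch. The Python sketch below approximates that integer splitting and the per-mapper queries it implies. Sqoop's own splitter differs in details such as boundary handling, and the table and column names are hypothetical.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive integer range [lo, hi] into contiguous, near-equal ranges."""
    if hi < lo:
        return []
    span = hi - lo + 1
    num = min(num_mappers, span)          # never more ranges than distinct values
    size, extra = divmod(span, num)
    ranges, start = [], lo
    for i in range(num):
        end = start + size - 1 + (1 if i < extra else 0)   # first `extra` ranges get one more
        ranges.append((start, end))
        start = end + 1
    return ranges

def split_queries(table, column, lo, hi, num_mappers):
    """One bounded SELECT per mapper, each reading a disjoint slice of the table."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {a} AND {column} <= {b}"
        for a, b in split_ranges(lo, hi, num_mappers)
    ]

queries = split_queries("user_activity", "id", 1, 10, 4)
```

Even splits only give balanced mappers when the split column's values are evenly distributed. A skewed column leaves some mappers doing most of the work, so an auto-increment primary key is a common choice.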

Apache ZooKeeper Tutorial: Example of Watch Notification

Having an effective configuration management system is important, and so is keeping track of changes to znodes. One way to track changes is to get a notification for every change made to a znode: a client can set a watch on a znode, and any change to that znode then triggers the watch and notifies the client. ZooKeeper's definition of a watch says that “a watch event is one-time trigger, sent to the client that set the watch, which occurs when the data for which the watch was set changes”. Because the trigger fires only once, a client that wants to keep following a znode must set the watch again after each notification.
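The one-time behavior is easiest to see in a small model. The Python sketch below is an in-memory stand-in for a ZooKeeper ensemble, not a real client, and its method names are hypothetical. A watch registered through `get_data` fires on the next `set_data` and is then discarded. The second helper shows the usual pattern of re-reading the znode and re-arming the watch inside the callback.

```python
from collections import defaultdict

class ToyZnodeStore:
    """In-memory model of ZooKeeper's one-time data watches (illustration only)."""

    def __init__(self):
        self.data = {}
        self.watches = defaultdict(list)   # path -> watchers waiting for the next change

    def create(self, path, value):
        self.data[path] = value

    def get_data(self, path, watcher=None):
        if watcher is not None:
            self.watches[path].append(watcher)
        return self.data[path]

    def set_data(self, path, value):
        self.data[path] = value
        # One-time trigger: remove the watchers before firing them.
        for watcher in self.watches.pop(path, []):
            watcher("NodeDataChanged", path)

def make_rearming_watcher(store, log):
    """Watcher that records the new value and sets the watch again."""
    def watcher(event_type, path):
        log.append(store.get_data(path, watcher))   # re-read and re-register
    return watcher

store = ToyZnodeStore()
store.create("/config/db", b"v1")

one_shot_events = []
store.get_data("/config/db", lambda etype, path: one_shot_events.append((etype, path)))

rearm_log = []
store.get_data("/config/db", make_rearming_watcher(store, rearm_log))

store.set_data("/config/db", b"v2")   # both watchers fire
store.set_data("/config/db", b"v3")   # only the re-armed watcher fires
```

After two updates, the plain watcher has been notified once, while the re-arming watcher has seen both `b"v2"` and `b"v3"`. Changes that happen between a notification and the re-registration can still be missed, which is why clients re-read the data when they re-arm.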


Copyright 2019 Iconiq Inc. All rights reserved. All trademarks are property of their respective owners.