Hadoop Sqoop Tutorial: Example Data Export

Suppose we have a business application that uses a Netezza database for data storage. Assume we have imported the data from the Netezza tables and processed it in Hadoop to benefit from distributed processing. Once the data is processed in Hadoop, we need to load it back into Netezza so that downstream jobs can use it.
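
As a rough illustration of the export step, the sketch below assembles a `sqoop export` command line in Python. The host name, database, user, table and HDFS directory are hypothetical placeholders, not values from this tutorial, and a real run would also need credentials and the Netezza JDBC driver available to Sqoop.

```python
import shlex


def build_sqoop_export(jdbc_url, username, table, export_dir, num_mappers=4):
    """Assemble the argument list for a `sqoop export` run."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,              # JDBC URL of the target database
        "--username", username,
        "--table", table,                   # existing target table in the RDBMS
        "--export-dir", export_dir,         # HDFS directory holding the processed data
        "--num-mappers", str(num_mappers),  # number of parallel export tasks
    ]


# All values below are hypothetical examples.
cmd = build_sqoop_export(
    jdbc_url="jdbc:netezza://nz-host:5480/sales",
    username="etl_user",
    table="DAILY_SUMMARY",
    export_dir="/user/etl/daily_summary",
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, which avoids quoting problems if the list is later handed to `subprocess.run`.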

Hadoop Sqoop Tutorial: Example of Data Aggregation

Suppose we have an online application that uses a MySQL database to store user information and activity. As the number of visitors to the site increases, the data grows proportionally. Processing very large data sets in an RDBMS environment becomes a bottleneck, and beyond a certain size it is no longer feasible. That is where distributed systems help: we first need to bring the data into the distributed system and then process it there. The data fetching process should also be fast.
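
One common way to make the fetch fast is to read the source table in parallel, with each task pulling a separate slice of a numeric key. The sketch below only illustrates that idea. It is not Sqoop's actual splitting code, and the `users` table and `id` column are assumed names.

```python
def split_ranges(min_id, max_id, num_splits):
    """Divide the inclusive range [min_id, max_id] into contiguous chunks."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_splits)
    ranges = []
    start = min_id
    for i in range(num_splits):
        # The first `extra` chunks take one additional row each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more splits requested than rows available
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Each range could be fetched by its own parallel task.
for lo, hi in split_ranges(1, 1000, 4):
    print(f"SELECT * FROM users WHERE id BETWEEN {lo} AND {hi}")
```

In Sqoop itself, the `--split-by` and `--num-mappers` options of `sqoop import` choose the split column and the degree of parallelism.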

Apache Zookeeper Tutorial: Example of Watch Notification

Having an effective configuration management system is important, and so is keeping track of the changes made to a znode. One way to track changes is to receive a notification for every change made to the znode. A client can set a watch on a znode, and any change to that znode then triggers the watch and notifies the client. ZooKeeper's definition of a watch says that “a watch event is one-time trigger, sent to the client that set the watch, which occurs when the data for which the watch was set changes”.
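
The one-time nature of a watch can be shown with a small in-memory simulation. This is not the ZooKeeper client API: the `ZNode` class and the `/config` path are hypothetical, and only the `NodeDataChanged` event name is borrowed from ZooKeeper.

```python
class ZNode:
    """Toy stand-in for a znode that supports one-time data watches."""

    def __init__(self, path, data):
        self.path = path
        self.data = data
        self._watchers = []

    def get_data(self, watch=None):
        # Reading the data is the moment a client may register a watch.
        if watch is not None:
            self._watchers.append(watch)
        return self.data

    def set_data(self, data):
        self.data = data
        # Watches fire once: clear the list before notifying.
        watchers, self._watchers = self._watchers, []
        for notify in watchers:
            notify(("NodeDataChanged", self.path))


events = []
node = ZNode("/config", "v1")
node.get_data(watch=events.append)  # set a watch while reading
node.set_data("v2")                 # triggers the watch
node.set_data("v3")                 # no notification: the watch was already used
print(events)
```

The event carries only the kind of change and the path, so a client that wants to keep following the znode reads the data again and sets a new watch in the same call.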

Apache Zookeeper Tutorial: Centralized Configuration Management

Establishing effective configuration management is an important step in building a distributed system. It is a complex process that helps in planning, identifying, tracking and verifying changes in the software. Configuration integrity must be maintained throughout the life cycle of the system, and a good configuration management system makes this possible.

Hadoop Zookeeper Tutorial for Beginners

Apache Zookeeper is a distributed coordination service for distributed applications. It is designed to let users focus on the functionality of the distributed application rather than worry about the architecture. Its centralized infrastructure and services provide synchronization across a Hadoop cluster.

Hadoop Sqoop Tutorial

The name Sqoop is a combination of SQL and Hadoop. Sqoop is a command-line data transfer utility designed for efficiently importing and exporting data between an RDBMS and HDFS. Data can be imported into HDFS from an RDBMS such as Oracle or MySQL.

Hadoop PIG Tutorial

Apache Pig is designed to handle any kind of data. It is a high-level, extensible platform whose language, Pig Latin, reduces the complexity of coding MapReduce applications. Pig was developed at Yahoo to help people using Hadoop focus on analysing large unstructured data sets by minimizing the time spent writing Mapper and Reducer functions.

Hadoop Oozie Tutorial

Oozie is a server-based job coordination system and workflow engine that runs in a Java servlet container. It is designed for executing workflow jobs with actions that trigger Pig jobs or MapReduce jobs. Oozie helps you string together a workflow of coordinated jobs such as a Pig job, a MapReduce job and a Hive query.
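
Stringing jobs together amounts to running actions in dependency order. The sketch below shows that idea with Python's standard `graphlib` module. It is not Oozie's workflow definition format, and the action names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical actions; each maps to the set of actions it must wait for.
dependencies = {
    "pig_clean_logs": set(),
    "mapreduce_aggregate": {"pig_clean_logs"},
    "hive_report_query": {"mapreduce_aggregate"},
}


def execution_order(deps):
    """Return the actions in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(" -> ".join(order))
```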

Hadoop NoSQL Database Tutorial

A database that is modelled by any means other than traditional tabular relations is generally referred to as a NoSQL database. A NoSQL database organizes large distributed data sets into tuples such as key-value pairs and objects.

Hadoop Hive Tutorial

Apache Hive is a Hadoop runtime component developed at Facebook. This data warehouse infrastructure is built on top of the Hadoop stack to help users with querying, analysis and summarization. Its query language, HiveQL, supports a subset of SQL-92 plus Hive-specific extensions.

Hadoop HDFS Tutorial

Hadoop HDFS is a Java-based distributed file system for storing large unstructured data sets. It is designed to provide high-performance access to data across large Hadoop clusters of commodity servers. It is referred to as the “Secret Sauce” of the Apache Hadoop components, because data can be stored in blocks on the file system until the organization wants to leverage it for big data analytics.

Hadoop HBase Tutorial

Hadoop HBase is a real-time, open-source, column-oriented, distributed database written in Java. HBase is modelled after Google’s BigTable and is a key-value store organized into column families. It is built on top of Apache Hadoop and Zookeeper.
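
To make the column-family idea concrete, the sketch below models a table as nested dictionaries: row key, then column family, then column qualifier, then value. It is a toy in-memory model, not the HBase client API. The class and the sample names are hypothetical, and the cell timestamps that HBase keeps for versioning are left out.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy model: row key -> column family -> qualifier -> value."""

    def __init__(self, families):
        # Column families are fixed up front; qualifiers are free-form.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key, family, qualifier):
        # Plain .get() lookups so that a read never creates empty rows.
        return self.rows.get(row_key, {}).get(family, {}).get(qualifier)


table = ColumnFamilyTable(families=["profile", "activity"])
table.put("user#42", "profile", "name", "Asha")
table.put("user#42", "activity", "last_login", "2015-06-01")
print(table.get("user#42", "profile", "name"))
```

Rows in such a store can be sparse: a row only holds the qualifiers that were actually written to it.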
