Learn Java for Hadoop Tutorial: Arrays



So, you want to learn Hadoop? That’s great, but you may be unsure about which Java concepts you need for Hadoop and where to learn them. This series of self-guided “Learn Java for Hadoop” tutorials will help you learn the Java essentials for Hadoop, starting from the basics of arrays in Java and covering all the core Java concepts required to become a productive Hadoop developer. Each tutorial in the series explains the concepts with Java code examples so that you can build up your Java knowledge for Hadoop as you go along. The tutorials assume that you have some basic programming experience in C or C++.


What will you learn from this Java for Hadoop Tutorial?

This Java essentials for Hadoop tutorial covers the concept of arrays in Java and examines how arrays are used in a Hadoop MapReduce program.

Pre-requisites to follow the “Java Arrays for Hadoop” Tutorial

  • Java 1.5 or a later version must be installed.
  • Hadoop must be installed.
  • Eclipse must be installed.

Before diving into the tutorial, let us first consider whether Java is required for Hadoop at all. You can go through the blog post below to read up on it.

Is Java necessary for learning Hadoop?


Java Concepts for Hadoop – Arrays

What are Arrays?

Arrays are building blocks and among the simplest and most widely used data structures in any object-oriented programming language. People who are new to the world of object-oriented programming might not know what a data structure is. A data structure is a way to organize data in a computer so that it can be stored, accessed, modified and deleted efficiently. Data structures are classified into two types: linear and non-linear.

Linear Data Structure – Organized in much the same way as computer memory, i.e. data elements are attached one after the other in a linear fashion, and retrieval and manipulation follow the same order.

Examples of Linear Data Structure

Arrays, Queues, Lists

Non-Linear Data Structure – In a non-linear data structure, one data element is connected to another through a relationship, and all the elements cannot be traversed in a single run.

Examples of Non-Linear Data Structure

Graphs, Trees, etc.

Arrays are linear data structures that store a fixed number of elements of the same data type in a sequential manner. The length of an array is defined when the array is created, and once created it cannot be changed for the entire life cycle of the program. Every item in an array is referred to as an element and can be retrieved using its index, which begins at 0. For example, to access the 5th element of an array in Java, index 4 would be used, since numerical indexing begins at 0.
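The short snippet below (an illustrative sketch, not part of the original example) shows how index 4 retrieves the 5th element:

int[] marks = {55, 45, 76, 67, 89}; // an array of 5 integer elements
System.out.println(marks[0]);       // index 0 -> 1st element, prints 55
System.out.println(marks[4]);       // index 4 -> 5th element, prints 89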



Creating and Initializing an Array

There are multiple ways to declare and initialize an array-

1) Declaring a Variable to Refer to an Array

  int[] student;          // declare an array variable that can refer to an array of integers

  student = new int[10];  // create an array of 10 integers and assign its reference to student

2) Using the new operator in Java to create an array

int[] student = new int[10]; // declares an array variable, student, creates an array of 10 elements of integer type and assigns its reference to student

int student[] = new int[10]; // the brackets may also be placed after the variable name

 3) Declaring and initializing an array at the same time

 int[] student={0,0,0,0,0,0,0,0,0,0}; // This declares an array variable, student, creates an array of 10 elements of integer type and initializes all the elements of the array to 0.

If an array is created in Java but its elements are not explicitly initialized, the elements take on their default values – 0 for every element of an int array.
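As a quick check (a minimal sketch, assuming no further assignments are made), you can print an element of a freshly created array and see the default value:

int[] student = new int[10];    // creates an array of 10 integers
System.out.println(student[2]); // prints 0 - nothing was assigned, so the default value is used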

Inserting Elements into an Array

Let’s demonstrate inserting elements into a Java array with a simple example. Suppose you are a teacher evaluating the academic assignments of 10 students. Having evaluated the assignments, you need to record the marks in an array; here’s how you can do it -

int student[] = new int[10];
student[0]=55;
student[1]=45;
student[3]=76;
student[4]=67;
student[5]=89;
student[6]=98;
student[7]=65;
student[8]=34;
student[9]=56;

Accessing Elements of an Array

Elements of an array can be accessed using the array name and the index as shown below -

array_name[index_of_the_element]

In the above example we inserted the marks scored by each student for the academic assignment; let’s now retrieve those marks for each student using the index -

       System.out.println("I am student 1,i got  :: "+student[0]);

       System.out.println("I am student 2,i got  :: "+student[1]);

       System.out.println("I am student 3,i got  :: "+student[2]);

       System.out.println("I am student 4,i got  :: "+student[3]);

       System.out.println("I am student 5,i got  :: "+student[4]);

       System.out.println("I am student 6,i got  :: "+student[5]);

       System.out.println("I am student 7,i got  :: "+student[6]);

       System.out.println("I am student 8,i got  :: "+student[7]);

       System.out.println("I am student 9,i got  :: "+student[8]);

       System.out.println("I am student 10,i got  :: "+student[9]);



Updating the Elements of an Array

Once a value has been inserted for an element of an array, it can easily be changed using the syntax below -

array_name[index_of_the_element] = new_value

Let’s suppose that student 8 is disappointed with his academic assignment marks and requests a revaluation of his assignment. On revaluation, the professor notices that there were a few discrepancies in the first valuation and that the actual marks scored by the student are 60 and not 65. The marks for student 8 can be updated as follows -

//Teacher updates the marks of the academic assignment
student[7]=60;

//Student checks the updated marks of the academic assignment
System.out.println("I am student 8,i got  :: "+student[7]);

Output after updating the marks in the array is as follows -

I am student 8,i got  :: 60




Deleting an Array

An array can be deleted by assigning null to its reference variable. When the JVM runs the garbage collector and finds that there is no longer any reference to the array, the space occupied by the array is reclaimed. If you then try to access the elements of the array through a reference that has been set to null, you encounter a NullPointerException.

student=null;
System.out.println("I am student 1,i got  :: "+student[0]);

On trying to access an element through an array reference that has been set to null, the output is an exception as shown below -

Exception in thread "main" java.lang.NullPointerException
        at mapreducewitharray.Arrays.App.main(App.java:71)



Understanding the Usage of Java Arrays Concept in a Hadoop MapReduce Program

Having understood the various operations on arrays in Java, it is worth seeing how this knowledge of arrays will be helpful in learning Hadoop. Before we dive into the details of a Hadoop MapReduce program, we suggest that you first understand the Hadoop ecosystem as a whole and what Hadoop MapReduce actually is, through these free resources for learning Hadoop –

Hadoop Ecosystem and Its Components

What is Hadoop MapReduce?

MapReduce is the core component of the Hadoop distributed processing framework and is written in Java –

Map Phase – the data transformation and pre-processing step. Data is read in as key-value pairs and, after processing, is sent on to the reduce phase.

Reduce Phase – the phase in which data is aggregated and the business logic is applied; the result is then sent to the next big data tool in the data pipeline for further processing.

The standard Hadoop MapReduce model has mappers, reducers, combiners, partitioners and sorting, all of which manipulate the structure of the data to fit the business requirements. It follows that the map and reduce phases need to make use of data structures such as arrays to perform the various transformation operations.
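To make the key-value flow concrete before looking at the Hadoop API, here is a toy, plain-Java illustration of the map, shuffle and reduce steps using ordinary collections (a hypothetical sketch only – the class name MapReduceIdea is made up for illustration; the actual Hadoop MapReduce program for our use case appears further below):

import java.util.*;

public class MapReduceIdea {
    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Thursday,28", "Sunday,8", "Thursday,12");

        // "Map" + shuffle: turn each line into a (day, temperature) key-value pair
        // and group the values by key.
        Map<String, List<Integer>> grouped = new HashMap<>();
        for (String line : lines) {
            String[] dayTemp = line.split(",");
            grouped.computeIfAbsent(dayTemp[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(dayTemp[1]));
        }

        // "Reduce": aggregate all values for each key - here, take the minimum temperature.
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            System.out.println(entry.getKey() + "\t" + Collections.min(entry.getValue()));
        }
    }
}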


Hadoop Java Example Code

Let’s look at a simple Hadoop MapReduce program that demonstrates the usage of java arrays concept for data manipulation in Hadoop -

We have a dataset with two columns –

  • Day of the Week
  • Average Temperature recorded each day

The dataset used in this Java Hadoop MapReduce example is a comma-separated .csv file that looks like this -

Thursday,28
Sunday,8
Friday,42
Friday,6
Saturday,8
Thursday,12
Monday,12
Sunday,40
Saturday,50
Sunday,10
Saturday,1

This Java Hadoop MapReduce example aims to find the top 5 coldest temperatures for each weekday.

To solve this use case we have to find, for each weekday, the 5 coldest temperature values, i.e. the five minimum temperatures recorded for that day.
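The core of the reduce-side logic can be sketched in plain Java first (a minimal sketch with a hypothetical helper class, TopFiveColdest, not part of the Hadoop program): sort the temperatures recorded for one weekday in ascending order and keep the first five values.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TopFiveColdest {
    // Hypothetical helper: returns the five smallest temperatures for one weekday.
    static int[] fiveColdest(List<Integer> temperatures) {
        List<Integer> sorted = new ArrayList<>(temperatures);
        Collections.sort(sorted);           // ascending order: coldest values come first
        int n = Math.min(5, sorted.size()); // guard against days with fewer than 5 readings
        int[] coldest = new int[n];
        for (int i = 0; i < n; i++) {
            coldest[i] = sorted.get(i);
        }
        return coldest;
    }
}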

Below is the Hadoop MapReduce code to solve the problem statement -

package mapreducewitharray.Arrays;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MinTemperature {

  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      // Split the csv line into a String array: index 0 is the day, index 1 is the temperature.
      String[] day_temp = value.toString().split(",");
      // Emit the day as the key and the parsed temperature as the value.
      context.write(new Text(day_temp[0]), new IntWritable(Integer.parseInt(day_temp[1])));
    }
  }

  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, Text> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      // Collect all temperatures recorded for this weekday.
      ArrayList<Integer> sorted = new ArrayList<Integer>();
      for (IntWritable val : values) {
        sorted.add(val.get());
      }
      // Sort in ascending order so that the coldest temperatures come first.
      Collections.sort(sorted);

      // Copy the five smallest values into an int array.
      int count = Math.min(5, sorted.size()); // guard against days with fewer than 5 readings
      int[] top5_cold_days = new int[5];
      for (int i = 0; i < count; i++) {
        top5_cold_days[i] = sorted.get(i);
      }

      // Build a tab-separated string of the top 5 coldest temperatures and emit it.
      String temp = "";
      for (int i = 0; i < count; i++) {
        temp = temp + top5_cold_days[i] + "\t";
      }
      context.write(key, new Text(temp));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "top 5 coldest temperatures");
    job.setJarByClass(MinTemperature.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path("/home/dezyre/Downloads/temperature.csv"));
    FileOutputFormat.setOutputPath(job, new Path("/home/dezyre/Downloads/temperature.out"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

 

In the above Java Hadoop MapReduce example, the arrays concept from Java is used in both the map and the reduce phase –

1) Java Arrays usage in the Map Phase

The map phase declares an array, String[] day_temp, that stores the columns of the csv file: index 0 holds the day name and index 1 holds the corresponding temperature for that day. One important point to note in this example is that the values read from the file are held in a String array, not an integer array, so the temperature has to be parsed to an int before it can be emitted.
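For a single input record, the map() method effectively does the following (an illustrative fragment, assuming the sample record "Thursday,28"):

String[] day_temp = "Thursday,28".split(",");    // {"Thursday", "28"} - both elements are Strings
int temperature = Integer.parseInt(day_temp[1]); // the temperature String must be parsed to an int
context.write(new Text(day_temp[0]), new IntWritable(temperature)); // emit (day, temperature)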

2) Java Arrays Usage in the Reduce Phase

In the reduce phase, the array int[] top5_cold_days is used to store the result of the processing. This array holds the top 5 cold temperatures for each weekday from the given dataset.

How to run the Java Hadoop MapReduce Example?

Create a jar with the help of Eclipse or from the command line and execute it as shown below:

[Screenshot: Java Hadoop MapReduce example execution]

The program creates its output in the folder /home/dezyre/Downloads/temperature.out.

On executing the Java Hadoop MapReduce example to find the top 5 coldest temperatures for each weekday, the temperature.out folder is created with the contents shown below -

[Screenshot: contents of the temperature.out folder]

 

The output is stored in the file “part-r-00000” and is as follows -

Five coldest temperatures for each day of the week:

Friday      2   4   4   6   8
Monday      2   2   2   4   4
Saturday    4   6   6   8   8
Sunday      2   2   6   8   8
Thursday    4   6   6   6   6
Tuesday     4   4   10  10  10
Wednesday   2   10  10  10  12

Do you want to play with the dataset and source code of the example demonstrated in the first series of "Java MapReduce Tutorial"? Send an email to manisha@dezyre.com to get the "Hadoop Java Tutorial PDF"  delivered to your inbox along with the .csv file and the complete project source code.

Follow us on Twitter, Facebook and LinkedIn to receive regular updates on the next series of "Java for Hadoop" tutorials.
