Runtime exception for Hadoop example code

[cloudera@localhost data]$ cd /home/cloudera/Desktop/dezyre
[cloudera@localhost dezyre]$ ls
output Sample1 Sample1.jar
[cloudera@localhost dezyre]$ hadoop jar Sample1.jar WordCount nasdaq/input/rain.txt nasdaq/out1
15/02/04 19:54:21 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/02/04 19:54:22 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost.localdomain:8020/user/cloudera/.staging/job_201502040729_0024
15/02/04 19:54:22 ERROR security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost.localdomain:8020/user/cloudera/nasdaq/input/rain.txt already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost.localdomain:8020/user/cloudera/nasdaq/input/rain.txt already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(
at org.apache.hadoop.mapred.JobClient$
at org.apache.hadoop.mapred.JobClient$
at Method)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(
at org.apache.hadoop.mapred.JobClient.submitJob(
at org.apache.hadoop.mapred.JobClient.runJob(
at com.assignments.NasdaqAssignment1.main(
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at org.apache.hadoop.util.RunJar.main(
[cloudera@localhost dezyre]$

I am not sure about the root password, but this looks like a permission issue. Is this the stock Cloudera setup with no customization, or is it something else?

2 Answer(s)


You are getting the above exception because the output path your job is using (hdfs://localhost.localdomain:8020/user/cloudera/nasdaq/input/rain.txt) already exists in the HDFS file system.

Just remember: when running a MapReduce job, do not specify an output directory that already exists in HDFS. The following instructions should help you resolve this exception.
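If you do want to reuse the same output path, you can delete the stale directory from HDFS before re-running the job. A sketch using the standard HDFS shell (the `{output_directory_path}` placeholder follows the convention used below; on older Hadoop releases `-rm -r` is spelled `-rmr`). Be careful not to point this at your input data:

```shell
# Check whether the output path already exists in HDFS
hadoop fs -ls {output_directory_path}

# If it does, remove it (recursively), then re-run the job.
# WARNING: double-check the path first -- this deletes data permanently.
hadoop fs -rm -r {output_directory_path}
```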

To run a MapReduce job, use a command similar to the one below:

$ hadoop jar {name_of_the_jar_file.jar} {fully_qualified_main_class} {hdfs_input_path_to_process} {output_directory_path}

Example: hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/faceboo-word-count.txt /home/facebook/crawler-output

Pay attention to the {output_directory_path}, i.e. /home/facebook/crawler-output. If that directory already exists in HDFS, Hadoop will throw the exception "org.apache.hadoop.mapred.FileAlreadyExistsException".

Solution: always specify a fresh output directory name at runtime. Hadoop will create the directory for you automatically, so you need not create it yourself. As in the example above, the same command can be run as follows:

hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/faceboo-word-count.txt /home/facebook/crawler-output-1

The output directory {crawler-output-1} will then be created at runtime by Hadoop.
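One common way to guarantee a fresh directory on every run (a sketch, not part of the original answer) is to append a timestamp to the output path before submitting the job:

```shell
# Build a unique output path; Hadoop will create this directory at job submission
OUT="nasdaq/out-$(date +%Y%m%d-%H%M%S)"
echo "Writing job output to: $OUT"

# Submit the job with the generated path (same jar and class as in the question above)
# hadoop jar Sample1.jar WordCount nasdaq/input/rain.txt "$OUT"
```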

For more details, you can refer to:

<a href="">Top 10 MapReduce program source code</a>

<a href="">Top 10 read/write FS programs using the Java API</a>

<a href="">Top 30 Hadoop shell commands</a>