Assignment 2


4 Answer(s)


Hello All,
I think I may have found the answer, but I am still at work. I will try it out later today and share the results.
http://stackoverflow.com/questions/4913212/org-apache-hadoop-mapred-filealreadyexistsexception
Basically, the new class for Job Configuration uses different args[] elements for the input and output paths. Please read the answer at the link above; the relevant part is:
"I faced the same problem. It took me a while to figure out what's going on. The main problem was that you could not attach a debugger to find out what values were being passed.

You are using args[0] as the input and args[1] as the output folder in your code.

Now, if you are using the new framework, where you consume the command-line arguments inside the run method of the Tool class, args[0] is the name of the program being executed, which is WordCount in this case.

args[1] is the name of the input folder you are specifying, which the program maps to the output folder, and hence you are seeing the exception.

So the solution is:

Use args[1] and args[2]."
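
Since the quoted answer says attaching a debugger was not an option, one quick way to see what actually reaches main is to print every argument with its index before the job is configured. Here is a minimal sketch in plain Java; ArgDump and describe are names I made up for illustration, they are not part of Hadoop or the assignment code.

```java
// Hypothetical helper: ArgDump is not part of Hadoop or the assignment code.
class ArgDump {
    // Builds one line per argument in the form "args[i] = value".
    static String describe(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < args.length; i++) {
            sb.append("args[").append(i).append("] = ").append(args[i]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print to stderr so the listing appears next to the Hadoop log output.
        System.err.print(describe(args));
    }
}
```

Calling describe(args) as the first statement of the job's own main should show right away whether the paths sit at index 0 and 1 or at index 1 and 2.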

Thank you

Viquar Syed

Hi Viquar,
Thanks for your update.
Could you please tell me which part of Assignment 2 is giving the error?
If you could share a screenshot of the command you used and the error you get, that would be a great help in understanding the issue.
Thanks.

Hi, please see the command and the error message below.
------------------------------------------------------------------------------
[cloudera@localhost lib]$ hadoop jar com.assignments.NasdaqAssignment1.jar com.assignments2.NasdaqAssignment1 dezyre2/input/FL_insurance_sample.csv dezyre2/output2/
16/02/10 20:39:45 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
16/02/10 20:39:45 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost.localdomain:8020/user/cloudera/.staging/job_201602101917_0011
16/02/10 20:39:45 ERROR security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost.localdomain:8020/user/cloudera/dezyre2/input/FL_insurance_sample.csv already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost.localdomain:8020/user/cloudera/dezyre2/input/FL_insurance_sample.csv already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:117)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:986)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:919)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1368)
at com.assignments.NasdaqAssignment1.main(NasdaqAssignment1.java:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Hello,

After changing the two lines below to what is shown here, the assignment ran like a charm.
// I changed args[0] to args[1], and args[1] to args[2].
FileInputFormat.setInputPaths(conf, new Path(args[1]));
FileOutputFormat.setOutputPath(conf, new Path(args[2]));
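
For anyone who lands here later: hard-coding args[1] and args[2] works for the command line shown above, but it would break again if the job were launched without the class name in front of the paths. One possible explanation for the shift, not confirmed in this thread, is that the jar's manifest names a Main-Class, in which case hadoop jar passes everything after the jar name, class name included, to main. The sketch below sidesteps the question by treating the last two arguments as input and output. JobPaths and fromArgs are names made up for illustration, and the sketch assumes the two paths always come last on the command line.

```java
// Hypothetical helper: JobPaths is not part of Hadoop or the assignment code.
class JobPaths {
    final String input;
    final String output;

    JobPaths(String input, String output) {
        this.input = input;
        this.output = output;
    }

    // Assumption: the input and output paths are always the last two arguments,
    // whether or not a class name (or anything else) precedes them.
    static JobPaths fromArgs(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("Usage: <input path> <output path>");
        }
        return new JobPaths(args[args.length - 2], args[args.length - 1]);
    }

    public static void main(String[] args) {
        JobPaths paths = fromArgs(args);
        System.out.println("input  = " + paths.input);
        System.out.println("output = " + paths.output);
    }
}
```

With something like this in place, the two lines above could use new Path(paths.input) and new Path(paths.output) instead of fixed indexes.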