Percy, what error are you seeing? I encountered some errors, which I fixed by making sure that the data files were copied to HDFS and that the jar file was executed on the edge node. This command should be executed on the edge node:
hadoop jar /home/cloudera/class3/class3.jar com.eng.mr.WordCountDriver /user/cloudera/class3/wc-input /user/cloudera/class3/wc-output
The first thing is that creating the JAR file on macOS was different from the Windows process described. The systemPath that I could find involved a JDK version that shows as java version "11" on my Mac. With this, the only jar file I could find on my Mac that I believe is equivalent to the "tools.jar" file in the "Running MapReduce" description (step 5) is "jrt-fs.jar", and so the systemPath in the POM file is the following:
The version in the POM file is the following:
I then assigned the compiler version 1.6 and exported the "class3.jar" file as instructed, followed by upload to the Cloudera VM on my Mac (cloudera-quickstart-vm-4.7.0-0-vmware).
When I run the program with the hadoop command described in the instructions, I get the following error messages:
[cloudera@localhost ~]$ hadoop jar /home/cloudera/class3/class3.jar com.eng.mr.WordCountDriver /user/cloudera/class3/wc-input /user/cloudera/class3/wc-outputpwd
18/10/15 13:32:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
18/10/15 13:32:25 INFO input.FileInputFormat: Total input paths to process : 3
18/10/15 13:32:25 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost.localdomain:8020/user/cloudera/.staging/job_201810070937_0005
18/10/15 13:32:25 ERROR security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:java.io.FileNotFoundException: Path is not a file: /user/cloudera/class3/wc-input/wordcount
The error messages continue and then the run stops.
I am on Windows, so I can only speculate. My suspicion is that the JAR compiled with JDK version 11 on your Mac is not compatible with the Cloudera VM installed (CDH is now on version 6, whereas the CDH installed is version 4.4.1, which may have an older version of Java). Try downloading JDK 8 on your Mac, and then compile and upload with that version.
I am having problems downloading Java JDK 8 on the Mac and locating the systemPath for the jar file.
Question: Can I use Eclipse on the Cloudera VM itself to do this? I could upload the unzipped module-1-hadoop-2 src directory to a folder in the VM and then start Eclipse on the VM. I could import the existing Maven projects on the VM and save them in the folder that has the hadoop-2 src mapreduce files. However, I would then have to set the systemPath based on which version of Java is available on the VM.
I see that the cloudera-quickstart-vm-4.7.0-0-vmware has java version:
[cloudera@localhost ~]$ java -version
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
However, how do I find the directory for tools.jar on the VM in order to set the systemPath correctly?
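One possible way to locate it, as a sketch only: resolve the real location of javac and look for lib/tools.jar under the same JDK home. This assumes a JDK 8 or earlier layout (tools.jar does not exist in JDK 9 and later), that javac is on the PATH, and that GNU readlink is available. The function name tools_jar_for_javac is hypothetical.

```shell
# Hypothetical helper: given the path to a javac binary, print the tools.jar
# that sits under the same JDK home (<jdk>/bin/javac -> <jdk>/lib/tools.jar).
tools_jar_for_javac() {
    # Follow symlinks such as /usr/bin/javac to the real binary.
    javac_real=$(readlink -f "$1") || return 1
    [ -n "$javac_real" ] || return 1
    # Strip the trailing /bin/javac to get the JDK home directory.
    jdk_home=$(dirname "$(dirname "$javac_real")")
    if [ -f "$jdk_home/lib/tools.jar" ]; then
        printf '%s\n' "$jdk_home/lib/tools.jar"
    else
        # No tools.jar here: likely a JRE-only install or JDK 9+.
        return 1
    fi
}

# Possible usage on the VM:
#   tools_jar_for_javac "$(command -v javac)"
# A slower fallback is a filesystem search:
#   find / -name tools.jar 2>/dev/null
```

The printed path is what would go into the systemPath element of the POM file.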
I was finally able to install the JDK 1.8 version and locate the tools.jar file. I exported this as class3.jar and uploaded to the VM. I ran the job as described and I get the same error messages.
Did you load the data into HDFS by using the hadoop fs -put command? The data files (data.txt*) have to be loaded into the HDFS file system. See this post: https://stackoverflow.com/questions/15191832/first-hadoop-project-error-input-path-does-not-exist
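For the upload step, a minimal sketch of how the files could be copied with hadoop fs -put. The helper name put_input_files and the HADOOP_BIN override are hypothetical, added only so the command can be swapped for a dry run. The sketch uploads plain files only and skips any subdirectories in the local folder.

```shell
# Hypothetical helper: upload only the regular files in a local folder to HDFS.
# HADOOP_BIN is an assumed override; it defaults to the normal hadoop command.
put_input_files() {
    local_dir=$1
    hdfs_dir=$2
    for f in "$local_dir"/*; do
        # Skip subdirectories (and an unmatched glob); upload plain files only.
        [ -f "$f" ] || continue
        "${HADOOP_BIN:-hadoop}" fs -put "$f" "$hdfs_dir/" || return 1
    done
}

# Possible usage on the edge node (local folder name is an assumption):
#   put_input_files /home/cloudera/class3/data /user/cloudera/class3/wc-input
# Dry run that only prints the arguments instead of calling hadoop:
#   HADOOP_BIN=echo put_input_files /home/cloudera/class3/data /user/cloudera/class3/wc-input
```

The target HDFS folder has to exist first (for example, created with hadoop fs -mkdir), and hadoop fs -ls on that folder shows what was actually uploaded.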
I had the data files in the wc-input folder. However, I had also erroneously included the original wordcount directory in this folder. That was causing the problems with the hadoop run, which matches the "Path is not a file: /user/cloudera/class3/wc-input/wordcount" error above. I deleted the wordcount directory and the program worked.
I ran the program successfully using two versions of Java. The Mac comes with the Apple version of Java, which is JDK 11. The POM file needs to be set up for this version as follows:
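A quick way to spot this kind of problem, as a sketch: filter the output of hadoop fs -ls for entries whose permission string starts with "d". The function name list_hdfs_dirs is hypothetical, and the filter assumes the usual listing format with the path in the last column and no spaces in path names.

```shell
# Hypothetical helper: read `hadoop fs -ls` output on stdin and print the
# paths of entries that are directories (permission string starts with "d").
list_hdfs_dirs() {
    # $1 is the permission column, $NF the path; the "Found N items"
    # header line does not start with "d", so it is ignored.
    awk '$1 ~ /^d/ { print $NF }'
}

# Possible usage on the edge node:
#   hadoop fs -ls /user/cloudera/class3/wc-input | list_hdfs_dirs
# Any path printed is a directory sitting inside the input folder.
```

If a directory shows up, it can be removed from the input folder before rerunning the job; depending on the Hadoop version the command is hadoop fs -rm -r or the older hadoop fs -rmr.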
Line 23: <groupId>jdk.classes</groupId>
Line 24: <artifactId>jdk.classes</artifactId>
Line 25: <version>11</version>
Line 27: <systemPath>/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home/lib/jrt-fs.jar</systemPath>
The Oracle version of Java (JDK 1.8.0_191) requires changing the POM file lines as follows:
Line 23: <groupId>jdk.tools</groupId>
Line 24: <artifactId>jdk.tools</artifactId>
Line 25: <version>1.8.0_191</version>
Line 27: <systemPath>/Library/Java/JavaVirtualMachines/jdk1.8.0_191.jdk/Contents/Home/lib/tools.jar</systemPath>
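The choice between the two jar files can be sketched as a small shell function. It assumes that version strings starting with "1." (such as 1.8.0_191) belong to a JDK that still ships lib/tools.jar, and that any other version (such as 11) only has lib/jrt-fs.jar, as in the two setups above. The function name jdk_system_jar is hypothetical.

```shell
# Hypothetical helper: print the jar to use as systemPath for a given
# Java version string and JDK home directory.
jdk_system_jar() {
    version=$1
    jdk_home=$2
    case "$version" in
        # Legacy "1.x" versions (JDK 8 and earlier) ship tools.jar.
        1.*) printf '%s\n' "$jdk_home/lib/tools.jar" ;;
        # JDK 9 and later have no tools.jar; jrt-fs.jar is used as above.
        *)   printf '%s\n' "$jdk_home/lib/jrt-fs.jar" ;;
    esac
}

# Possible usage:
#   jdk_system_jar 11 /Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home
#   jdk_system_jar 1.8.0_191 /Library/Java/JavaVirtualMachines/jdk1.8.0_191.jdk/Contents/Home
```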
I hope this helps any Mac user. Thanks to the DeZyre support team for your helpful suggestions.
Thanks for sharing the error. The command you are using is fine, but you need to correct the input file path: you are giving a folder instead of files.
Please use the following command:
hadoop jar /home/cloudera/class3/class3.jar com.eng.mr.WordCountDriver /user/cloudera/class3/wc-input/* /user/cloudera/class3/wc-output
Hope this helps.