Hadoop WordCount Issue running on EC2


5 Answer(s)


Hi Kartik,

There are two important ports for Hadoop:
50070: the HDFS (NameNode) web UI is served on this port.
50030: the MapReduce (JobTracker) web UI is served on this port. [In your terms, the resource manager.]
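
If you want to double-check from the shell that the daemons behind these ports are up, here is a minimal sketch (assuming you have SSH access to the master node; jps ships with the JDK):

jps
# Expect to see NameNode and JobTracker listed among the Hadoop 1.x daemons

netstat -tln | grep -E ':(50070|50030)'
# Confirms the two web UIs are actually listening on their ports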

If you look at the URL that you used to access the NameNode logs, you will notice 50070 as the port number.
http://ec2-54-186-249-172.us-west-2.compute.amazonaws.com:50070

So if you want to access the map-reduce logs, you should use the following URL:
http://ec2-54-186-249-172.us-west-2.compute.amazonaws.com:50030
This is the JobTracker UI for browsing map-reduce jobs.

1. When you open this page, you should be able to see the job ID of your WordCount program.
Something like: JobID: job_201404042011_0001
2. Once you find it, click on the job ID, which will open a new page with the Map and Reduce tasks.
3. If you want to see the map task details, click on Map, which will show you all the map task attempts made by the job.
Example: TaskId: task_201404042011_0001_m_000000
When you click on an attempt, it will show you a link to the Task Logs [top-right corner] which, when clicked, will show you all the logs related to that attempt.
Here you can see what the job is doing internally, along with the related details.
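
As a side note, the same task-attempt logs also live on disk on the worker node that ran the attempt. A rough sketch of where to look (the exact directory is an assumption; it depends on your install, commonly $HADOOP_HOME/logs or /var/log/hadoop, and on some versions the attempt directories are nested under a job_<id> directory):

ls $HADOOP_HOME/logs/userlogs/
# One directory per task attempt, e.g. attempt_201404042011_0001_m_000000_0
cat $HADOOP_HOME/logs/userlogs/attempt_201404042011_0001_m_000000_0/syslog
# Each attempt directory holds stdout, stderr and syslog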


Vote-up, if it helps.
Happy Learning @ Dezyre !!


Thanks, guys.

I guess I'll rephrase my question a little bit :)

So I have submitted the WordCount job to EC2 and went through the URLs you provided, but I don't see my jobs, because they are currently in the pending state and (AFAIK) wouldn't show up in the JobTracker UI. So I ran hadoop job -status job_201404042011_0012
and got the following tracking URL for my job: http://ec2-54-186-249-172.us-west-2.compute.amazonaws.com:50030/jobdetails.jsp?jobid=job_201404042011_0012
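
For reference, job states can also be checked from the CLI (the output format may vary slightly by version):

hadoop job -list
# Lists jobs that are running or still in the prep/pending state
hadoop job -list all
# Lists jobs in every state, including completed and failed ones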

If you look at the URL above, you can see everything is in the pending state with no errors, so I am unsure about what is going on.

I can run the same program on my local Cloudera environment without an issue, but most likely I will be using EC2 for my project, so I wanted to make sure I can submit jobs here as well.
Any help would be appreciated.

Kartik

So I looked into the log files and realized the TaskTracker had some issues,

so I ran the hadoop tasktracker command, and I guess this fixed the issue: it started executing all the jobs that had been submitted.
http://ec2-54-186-249-172.us-west-2.compute.amazonaws.com:50030/jobtracker.jsp

All of my jobs were successful and I am able to see the output.

There were some jobs that failed, so those users will need to look into the logs.

I'm not sure if I need to run the command (hadoop tasktracker) each time I submit a job?

Anyhow, glad it's fixed.

Kartik


Hi Kartik,

Let me first explain the job UI:
The URL that I provided to you is the URL of the JobTracker UI running on the EC2 Hadoop cluster.
All the jobs that are submitted to this JobTracker are visible in this UI.

If you look at the job URL that you got after running the job -status command, you will notice that its prefix is the same as what we call the JobTracker URL: http://ec2-54-186-249-172.us-west-2.compute.amazonaws.com:50030/jobtracker.jsp
The suffix is the job ID of your job, which you get as soon as you start the job: ?jobid=job_201404042011_0012
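
So, given any job ID, you can assemble the tracking URL yourself; a trivial sketch:

JT=http://ec2-54-186-249-172.us-west-2.compute.amazonaws.com:50030
JOBID=job_201404042011_0012
echo "$JT/jobdetails.jsp?jobid=$JOBID"
# Prints the same jobdetails.jsp URL that job -status reported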


Next, the understanding that a job doesn't show up on the JobTracker UI while it is in the pending state is not correct.
As soon as we submit a job to the cluster and can see the job ID for our task on the console, it starts appearing on the JobTracker UI.
In fact, the best way to track the status of a job is the JobTracker UI, which eases the process of looking at logs and identifying errors.
You can try viewing the logs on your own local VM cluster by opening the browser and simply typing: http://localhost:50030/jobtracker.jsp
Here you will see all the jobs you are running on your VM, listed by their job IDs.

Finally, you don't need to run the command (hadoop tasktracker) every time you submit a job.
It seems that when you submit your job on the EC2 cluster, the TaskTracker on which your job lands is somehow not responding properly, and hence you are not able to see the most recent state of the tasks.
Would it be possible for you to share the command that you ran (hadoop tasktracker) and the resultant output?
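
In the meantime, instead of running hadoop tasktracker in the foreground, the usual way is to restart it as a background daemon and confirm it has re-registered with the JobTracker. A sketch, assuming the standard Hadoop 1.x scripts are available on the affected worker node:

hadoop-daemon.sh stop tasktracker
hadoop-daemon.sh start tasktracker
# Restarts the TaskTracker as a background daemon; its logs go to $HADOOP_HOME/logs

hadoop job -list-active-trackers
# Run from any node to confirm the tracker shows up again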



Vote-up, if it helps.
Happy Learning @ Dezyre !!

Hi Kartik,

Did you get a chance to run any other job on the EC2 instance, and did you face the same issue?
If yes, please post.