Why do we need to upload the job to HDFS and then also submit the job to the JobTracker?


3 Answer(s)


Hi Sree,
Once the JobTracker assigns tasks to the various TaskTrackers, every TT needs access to the codebase (the job JAR and configuration) in order to run the Map/Reduce steps. That is why Steps 5 and 6 are both required and not redundant: Step 5 stages the code in HDFS, where every node can reach it, and Step 6 tells the JobTracker to schedule the job.
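To make the two steps concrete, here is a minimal driver sketch in the org.apache.hadoop.mapreduce API (the class name SubmitDemo is my own illustration, not from the course material; no mapper or reducer is set, so the identity defaults run):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SubmitDemo {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "submit-demo");

            // Tell the client which JAR contains the map/reduce code.
            job.setJarByClass(SubmitDemo.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            // waitForCompletion() first stages the job JAR, the serialized
            // configuration, and the split metadata into an HDFS staging
            // directory (Step 5), and only then submits the job to the
            // JobTracker (Step 6). Each TaskTracker later pulls the JAR
            // from HDFS before launching its tasks.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Staging through HDFS means the JobTracker never has to ship the code itself; every TaskTracker pulls the JAR from a location all nodes can reach.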

So the TaskTrackers go to the job's location in HDFS to fetch the algorithm code?

I'm not sure about this block diagram; I remember Shoban flipped through it quickly in class.
One other question. The Apache API doc says "The Hadoop Map-Reduce framework spawns one map task for each InputSplit generated by the InputFormat for the job", but the block diagram shows the file splits being created by the client. Are the framework and the client the same thing?
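As far as I can tell the two descriptions are consistent: in classic MRv1 the split computation is framework code that runs inside the client JVM during submission, so "the client creates the splits" and "the framework spawns one map task per InputSplit" describe the same step. Here is a sketch (hypothetical input path in args[0]) that makes the same getSplits() call the submission client makes:

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class SplitDemo {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "split-demo");
            FileInputFormat.addInputPath(job, new Path(args[0]));

            // During submission the client calls InputFormat.getSplits()
            // and writes the resulting split metadata to HDFS; the
            // JobTracker then schedules one map task per split, ideally
            // on a node that holds that block.
            List<InputSplit> splits = new TextInputFormat().getSplits(job);
            for (InputSplit split : splits) {
                System.out.println(split);   // e.g. path:offset+length
            }
            System.out.println(splits.size() + " splits -> "
                    + splits.size() + " map tasks");
        }
    }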