Are Map and Reduce separate threads?


4 Answer(s)


I believe it's a process.

See, everything that runs inside a JVM runs as a thread. Even your non-multithreaded programs run in the main thread.
Newer versions of Hadoop have come up with something called "uber" mode, in which you can run multiple maps in the same JVM as threads.


But if we look at map and reduce from the angle of a single JVM, we can think of each of them as a process.
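
To make the thread-versus-process point concrete, here is a minimal sketch in plain Java (no Hadoop involved). It assumes Java 9 or later for `ProcessHandle`, and the class and method names are made up for illustration. It shows that a program always has a current thread, and that a second thread still reports the same process ID, because both live in one JVM process.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ThreadVsProcess {
    /** Starts a new thread and returns the PID observed from inside that thread. */
    public static long pidSeenFromNewThread(String threadName) throws InterruptedException {
        AtomicLong seenPid = new AtomicLong(-1);
        Thread worker = new Thread(
            () -> seenPid.set(ProcessHandle.current().pid()), threadName);
        worker.start();
        worker.join(); // wait until the worker has recorded the PID
        return seenPid.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Even a program that never creates a thread is running on one.
        System.out.println("current thread = " + Thread.currentThread().getName());
        System.out.println("JVM process id = " + ProcessHandle.current().pid());
        // A second thread is a new thread, but it is still the same JVM process.
        System.out.println("pid seen from worker = " + pidSeenFromNewThread("worker"));
    }
}
```

Both PIDs printed by one run should match, which is the sense in which threads live inside a single JVM process.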

Of course there is a main thread. Are you saying it is a trick question?

The text says:

"On a cluster (and this includes pseudodistributed mode), map and reduce tasks run in separate JVMs,"
White, Tom (2012-05-10). Hadoop: The Definitive Guide (Kindle Locations 4313-4314). O'Reilly Media. Kindle Edition.

"TaskRunner launches a new Java Virtual Machine (JVM, step 9) to run each task in (step 10), so that any bugs in the user-defined map and reduce functions don’t affect the tasktracker (by causing it to crash or hang, for example)."
White, Tom (2012-05-10). Hadoop: The Definitive Guide (Kindle Locations 4998-5000). O'Reilly Media. Kindle Edition.
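
The isolation argument in that quote can be sketched in plain Java. This is not Hadoop's TaskRunner code, only an illustration of the idea under my own assumptions: `ChildJvmDemo`, `CrashingTask`, and `runInChildJvm` are hypothetical names, and the child is started with the same `java` binary and class path as the parent. The parent launches a separate JVM for the "task", the task dies abnormally, and the parent keeps running and just sees an exit code.

```java
import java.io.IOException;
import java.nio.file.Paths;

public class ChildJvmDemo {
    /** Stand-in for a buggy user task: it kills its own JVM with exit code 42. */
    public static class CrashingTask {
        public static void main(String[] args) {
            System.exit(42);
        }
    }

    /** Launches mainClass in a separate JVM process and returns that process's exit code. */
    public static int runInChildJvm(String mainClass) throws IOException, InterruptedException {
        // Assumption: reuse the parent's java binary and class path for the child.
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        String classPath = System.getProperty("java.class.path");
        Process child = new ProcessBuilder(javaBin, "-cp", classPath, mainClass)
            .inheritIO() // show the child's output on the parent's console
            .start();
        return child.waitFor(); // block until the child JVM has exited
    }

    public static void main(String[] args) throws Exception {
        int exitCode = runInChildJvm(CrashingTask.class.getName());
        // The child JVM is gone, but this JVM was never affected by it.
        System.out.println("child exited with " + exitCode + ", parent still running");
    }
}
```

Had `CrashingTask` been run on a thread inside the parent JVM instead, the `System.exit` call would have taken the parent down with it, which is the risk the separate JVM avoids.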

I still don't see why Thread is a correct answer, given the emphasis that the text, and at least one of the lectures, put on the idea that separate JVMs are spawned to support the different Hadoop daemons and user-created MapReduce classes.

Yes, it looks like a trick question.
But the internals say that the TaskTracker launches the JVMs, and inside each JVM the map or reduce task runs as a thread, with each task in a different JVM.

A JVM is a process, and the mapper/reducer is the algorithm/logic that runs inside that JVM.