YARN and MapReduce


Can you please help me understand, in simple language, what YARN and MapReduce are? Maybe with an example?



1 Answer(s)


Hi Sneh,

Put simply, YARN is a framework for managing resources, and MapReduce is a framework for processing data.

For example, when a dataset arrives, YARN decides how to assign and release resources, whereas MapReduce is used to process that data with the resources provided.

In detail:

YARN (Yet Another Resource Negotiator) is the framework responsible for providing the computational resources (e.g., CPUs and memory) needed to run applications. Its two most important elements are:

  1. Resource Manager (one per cluster) is the master. It knows where the slaves are located (Rack Awareness) and how many resources they have. It runs several services, the most important of which is the Resource Scheduler, which decides how to assign the resources.
  2. Node Manager (many per cluster) is the slave of the infrastructure. When it starts, it announces itself to the Resource Manager, and it then sends the Resource Manager a periodic heartbeat. Each Node Manager offers some resources to the cluster; its resource capacity is its amount of memory and its number of vcores. At run time, the Resource Scheduler decides how to use this capacity: a Container is a fraction of the Node Manager's capacity, and the client uses it to run a program.
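To make the roles above more concrete, here is a minimal toy sketch in Python. It is not the real YARN API: the class names, the `register`/`allocate`/`release` methods, and the first-fit placement policy are all assumptions made for illustration. It only shows the idea that Node Managers offer memory and vcores, and the Resource Manager carves Containers out of that capacity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NodeManager:
    """Toy stand-in for a YARN Node Manager (not the real Hadoop class)."""
    name: str
    memory_mb: int   # free memory this node offers to the cluster
    vcores: int      # free virtual cores this node offers


@dataclass
class Container:
    """A fraction of one node's capacity, granted to an application."""
    node: str
    memory_mb: int
    vcores: int


@dataclass
class ResourceManager:
    """Toy stand-in for the Resource Manager and its scheduler."""
    nodes: dict = field(default_factory=dict)

    def register(self, nm: NodeManager) -> None:
        # A Node Manager announces itself when it starts.
        self.nodes[nm.name] = nm

    def allocate(self, memory_mb: int, vcores: int) -> Optional[Container]:
        # Hypothetical first-fit policy: pick the first node with enough room.
        for nm in self.nodes.values():
            if nm.memory_mb >= memory_mb and nm.vcores >= vcores:
                nm.memory_mb -= memory_mb
                nm.vcores -= vcores
                return Container(nm.name, memory_mb, vcores)
        return None  # no node has enough free capacity

    def release(self, c: Container) -> None:
        # Return the container's share to its node when the task finishes.
        nm = self.nodes[c.node]
        nm.memory_mb += c.memory_mb
        nm.vcores += c.vcores


rm = ResourceManager()
rm.register(NodeManager("node-1", memory_mb=4096, vcores=2))
rm.register(NodeManager("node-2", memory_mb=8192, vcores=4))

first = rm.allocate(memory_mb=3072, vcores=2)    # fits on node-1
second = rm.allocate(memory_mb=3072, vcores=2)   # node-1 is now full, so node-2
```

In this sketch the second request lands on `node-2` because `node-1` has no vcores left. The real scheduler is far more sophisticated (queues, fairness, locality), so treat this only as a mental model.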

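Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data. A job is split into a map phase, which turns input records into key/value pairs, and a reduce phase, which combines the values collected for each key.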
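Since you asked for an example, the classic one is word count. The sketch below imitates the MapReduce programming model in plain Python on a single machine; it does not use Hadoop at all, and the function names `map_phase`, `shuffle`, and `reduce_phase` are made up for illustration. In a real cluster the map and reduce steps would run in parallel inside the containers that YARN hands out.

```python
from collections import defaultdict


def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer: combine all values for one key into a single result.
    return key, sum(values)


lines = ["big data is big", "data is everywhere"]

# Run the mapper over every line, then reduce each group of values.
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Here `counts` ends up mapping each word to how many times it appears across both lines, for instance `big` to 2 and `everywhere` to 1.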

Reference: http://www.slideshare.net/cloudera/introduction-to-yarn-and-mapreduce-2


Hope this helps.

