
MapReduce - What is Spilled Records count?



While running MapReduce jobs on the NASDAQ data, I noticed a "Spilled Records" statistic alongside the Map and Reduce statistics. What does "Spilled Records" mean?
I also noticed that the spilled records count differs when the same MapReduce job is run with and without a Combiner step (the count without a combiner is always greater than the count with one). Why is that?

2 Answer(s)



"Spilled Records" is the total number of records written to local disk during a job, and it includes both map-side and reduce-side spills. The lower the count, the better for memory and I/O performance. A high count means the map output exceeded the memory reserved for the map output buffer, so records had to be written to disk more than once. You can control this limit by setting parameters in mapred-site.xml. For better performance, keep the spilled records count small by tuning the number of tasks and/or the number of cluster nodes. The more input splits you have, the less output each map task has to buffer, and the fewer spills you get.
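
To make the buffer limit concrete, here is a rough sketch of how many spill files a single map task might write. It assumes the Hadoop 1.x properties io.sort.mb (default 100) and io.sort.spill.percent (default 0.80), and it deliberately ignores the part of the buffer reserved for record metadata, so treat it as a back-of-the-envelope model and not as what Hadoop computes internally. The function name is made up for this example.

```python
import math


def estimate_map_spills(map_output_mb, io_sort_mb=100, spill_percent=0.80):
    """Rough estimate of the spill files written by one map task.

    Simplified model (an assumption, not Hadoop's exact accounting):
    a spill starts each time the buffer fills to io_sort_mb * spill_percent,
    and whatever is left is flushed to disk when the task finishes.
    """
    if map_output_mb <= 0:
        return 0
    usable_mb = io_sort_mb * spill_percent
    return math.ceil(map_output_mb / usable_mb)


print(estimate_map_spills(60))                   # 1: everything fits, one final flush
print(estimate_map_spills(400))                  # 5: buffer fills five times
print(estimate_map_spills(400, io_sort_mb=512))  # 1: a larger buffer avoids extra spills
```

With a single spill file each record is written once. With several spill files the files have to be merged again, which is where the extra disk I/O comes from.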


Hi Sasikumar,
Spilled records are the transient (intermediate) data written to disk during the map and reduce operations.
Note that it's not just the map operations that generate spilled records. On the reduce side, when the in-memory shuffle buffer fills past its usage threshold (mapred.job.shuffle.merge.percent) or the number of map outputs held in memory reaches the threshold (mapred.inmem.merge.threshold), the buffered outputs are merged and spilled to disk.
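
The reduce-side trigger can be sketched as a simple check. This is only an illustration: the function and its arguments are hypothetical, and the default values (0.66 and 1000) are the Hadoop 1.x defaults for these two properties as far as I know, so verify them against your own mapred-default.xml.

```python
def should_merge_to_disk(used_bytes, buffer_bytes, segments,
                         merge_percent=0.66, inmem_threshold=1000):
    """Return True when the in-memory map outputs should be merged and spilled.

    used_bytes / buffer_bytes: current usage and size of the shuffle buffer.
    segments: number of map outputs currently held in memory.
    merge_percent models mapred.job.shuffle.merge.percent.
    inmem_threshold models mapred.inmem.merge.threshold (<= 0 disables it).
    """
    over_memory = used_bytes >= buffer_bytes * merge_percent
    over_count = inmem_threshold > 0 and segments >= inmem_threshold
    return over_memory or over_count


print(should_merge_to_disk(70, 100, 10))    # True: buffer is past 66% full
print(should_merge_to_disk(50, 100, 10))    # False: neither limit reached
print(should_merge_to_disk(10, 100, 1000))  # True: too many map outputs in memory
```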

What you need to do is:
1. Write your map and reduce functions to use as little memory as possible. They should not
use an unbounded amount of memory. For example, avoid accumulating values in a map.
2. Write a combiner function. The combiner merges records with the same key before they are
written to disk, which is why a job with a combiner spills fewer records. You can also set the
minimum number of spill files needed for the combiner to run again when the spill files are
merged: min.num.spills.for.combine (default 3)
3. Tune the buffer settings. Buffering is used to minimize disk writes:
– io.sort.mb Size of the map-side buffer used to store and sort map output before spilling
to disk. (Map-side buffer)
– fs.inmemory.size.mb Size of the reduce-side buffer used to store and merge the outputs of
multiple maps before spilling to disk. (Reduce-side buffer)
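
To see why the combiner lowers the spilled records count, here is a toy simulation of a map-side buffer. It is a sketch under simplifying assumptions: the buffer capacity is given in records (real Hadoop sizes it in MB via io.sort.mb), the combiner is a plain sum per key, and the stock symbols and counts are made up for the example.

```python
from collections import Counter


def spilled_records(pairs, buffer_records, use_combiner):
    """Count the records a toy map-side buffer writes to disk.

    pairs: iterable of (key, value) map outputs.
    buffer_records: hypothetical buffer capacity, in records.
    use_combiner: if True, sum values per key before each spill.
    """
    spilled = 0
    buffer = []

    def spill():
        nonlocal spilled
        if not buffer:
            return
        if use_combiner:
            combined = Counter()
            for key, value in buffer:
                combined[key] += value
            spilled += len(combined)   # one record per distinct key
        else:
            spilled += len(buffer)     # every buffered record hits the disk
        buffer.clear()

    for pair in pairs:
        buffer.append(pair)
        if len(buffer) >= buffer_records:
            spill()
    spill()  # final flush at the end of the task
    return spilled


symbols = ["AAPL", "MSFT", "GOOG", "INTC"]
pairs = [(symbols[i % 4], 1) for i in range(1000)]

print(spilled_records(pairs, 100, use_combiner=False))  # 1000
print(spilled_records(pairs, 100, use_combiner=True))   # 40
```

Without the combiner all 1000 records are written. With it, each of the 10 spills writes only one record per distinct key, which matches the difference seen between the two runs of the job.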

Thanks
