Support Questions

sharmadukool136 · 12-03-2018

What is the process of spilling in Hadoop’s map reduce program?

patelharshali13 · 12-03-2018

When a mapper starts producing intermediate output, it does not write the data directly to the local disk. Instead, it writes the data to memory, where it is sorted (using Quick Sort) for performance reasons.
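To make the buffer-then-sort idea concrete, here is a minimal sketch in plain Java, not Hadoop's actual code. The class and method names (InMemorySortSketch, Rec, sortForSpill) are made up for illustration. It assumes records are ordered by partition and then by key before being written out, and it uses the JDK's List.sort rather than Quick Sort, which is enough to show the resulting order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class InMemorySortSketch {
    /** Hypothetical stand-in for one buffered map output record. */
    static final class Rec {
        final int partition;
        final String key;
        final String value;

        Rec(int partition, String key, String value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Returns "partition:key" labels in the order the records would be
     * written out: by partition first, then by key within a partition.
     * Uses the JDK sort (not Quick Sort); only the ordering matters here.
     */
    static List<String> sortForSpill(List<Rec> buffered) {
        List<Rec> copy = new ArrayList<>(buffered);
        copy.sort(Comparator.comparingInt((Rec r) -> r.partition)
                            .thenComparing(r -> r.key));
        List<String> labels = new ArrayList<>();
        for (Rec r : copy) {
            labels.add(r.partition + ":" + r.key);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Rec> buffered = new ArrayList<>();
        buffered.add(new Rec(1, "b", "1"));
        buffered.add(new Rec(0, "z", "1"));
        buffered.add(new Rec(1, "a", "1"));
        buffered.add(new Rec(0, "c", "1"));
        System.out.println(sortForSpill(buffered));
    }
}
```

Because the records are already in memory when they are sorted, the sort costs no disk I/O, which is the performance benefit mentioned above.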

Each map task has a circular memory buffer to which it writes its output. By default, this buffer is 100 MB; the size can be changed with the mapreduce.task.io.sort.mb property.

When the contents of the buffer reach a certain threshold size (mapreduce.map.sort.spill.percent, which has the default value 0.80, or 80%), a background thread starts to spill the contents to disk. Map outputs continue to be written to the buffer while the spill takes place, but if the buffer fills up during this time, the map blocks until the spill is complete.
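The threshold arithmetic can be sketched in a few lines of plain Java. This is a simplified model rather than Hadoop's implementation: the class and method names (SpillSketch, spillThresholdBytes, countSpills) are hypothetical, each spill is assumed to empty the buffer instantly, and the accounting metadata that the real buffer also stores is ignored.

```java
final class SpillSketch {
    /** Bytes of buffered output at which a spill would start. */
    static long spillThresholdBytes(int sortMb, double spillPercent) {
        long bufferBytes = (long) sortMb * 1024 * 1024;
        return (long) (bufferBytes * spillPercent);
    }

    /**
     * Counts threshold-triggered spills for a stream of record sizes.
     * Simplification: a spill empties the buffer immediately, so the
     * "map blocks while the buffer is full" case never occurs here.
     */
    static int countSpills(long[] recordSizes, long thresholdBytes) {
        long used = 0;
        int spills = 0;
        for (long size : recordSizes) {
            used += size;
            if (used >= thresholdBytes) {
                spills++;
                used = 0;
            }
        }
        return spills;
    }

    public static void main(String[] args) {
        // Defaults quoted above: 100 MB buffer, 0.80 spill fraction.
        long threshold = spillThresholdBytes(100, 0.80);
        System.out.println("Spill threshold in bytes: " + threshold);

        // 200 records of 1 MB each (an arbitrary illustrative workload).
        long[] sizes = new long[200];
        java.util.Arrays.fill(sizes, 1024L * 1024L);
        System.out.println("Threshold-triggered spills: " + countSpills(sizes, threshold));
    }
}
```

With the defaults, the threshold works out to 80 MB of the 100 MB buffer. In this model the 200 MB workload crosses it twice, and the 40 MB left in the buffer at the end would still have to be flushed to disk when the map task finishes.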

Cloudera Community
