In MapReduce, how to sort intermediate output based on values?

Rising Star

How can the intermediate output be sorted based on values in MapReduce?

2 REPLIES

Master Collaborator
@Dukool SHarma

By default, MapReduce sorts the intermediate data (between the mapper and reducer phases) by key. If we want the data sorted by value, we need secondary sorting: the value is made part of a composite key, so the framework's key sort also orders the values.
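The idea behind secondary sorting can be sketched outside Hadoop. The snippet below is a plain-Python simulation of the shuffle, not Hadoop code: the function name `secondary_sort` and the sample records are hypothetical and chosen only for illustration. It builds a composite key from the natural key and the value, sorts on that composite key, and then groups on the natural key alone.

```python
from itertools import groupby
from operator import itemgetter


def secondary_sort(map_output):
    """Simulate a shuffle with a composite key (hypothetical helper).

    map_output: iterable of (natural_key, value) pairs in any order.
    Returns a dict mapping each natural key to its values in sorted order.
    """
    # Step 1: composite key = (natural_key, value), so the value joins the sort.
    composite = [((key, value), value) for key, value in map_output]

    # Step 2: "sort comparator": order by natural key, then by value.
    composite.sort(key=itemgetter(0))

    # Step 3: "grouping comparator": group on the natural key only, so each
    # simulated reduce call receives its values already sorted.
    grouped = {}
    for natural_key, records in groupby(composite, key=lambda rec: rec[0][0]):
        grouped[natural_key] = [value for _, value in records]
    return grouped


# Hypothetical mapper output in arbitrary order.
result = secondary_sort([("b", 7), ("a", 3), ("b", 2), ("a", 9), ("a", 1)])
print(result)
```

In an actual Hadoop job, the same three roles are usually filled by a custom composite key class, a partitioner that looks only at the natural key, and comparators registered with `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass`.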

For more information, you can refer to the links below:

https://www.oreilly.com/library/view/data-algorithms/9781491906170/ch01.html

https://www.quora.com/What-is-secondary-sort-in-Hadoop-and-how-does-it-work/answer/Sudarshan-Sreeniv...

Please accept the answer you found most useful.

Rising Star

Sorting is carried out on the map side. When all the map outputs have been copied, the reduce task moves into the sort phase (more accurately the merge phase, since the sorting itself was done on the map side), which merges the map outputs while maintaining their sort order. This is done in rounds. For example, if there were 60 map outputs and the merge factor was 15 (set through the mapreduce.task.io.sort.factor property, whose default is 10, just like in the map’s merge), there would be four rounds. Each round would merge 15 files into 1, so at the end there would be 4 intermediate files to be processed. Throughout, the data stays in the form of key-value pairs ordered by key.
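The arithmetic in this example can be checked with a small simulation. This is only a sketch in plain Python, not Hadoop's merge code: `merge_round` and the sample runs are hypothetical, and `heapq.merge` stands in for the framework's merge of already-sorted files.

```python
import heapq
import math


def merge_round(sorted_runs, merge_factor):
    """Merge sorted runs in groups of `merge_factor` (hypothetical helper).

    Each group is merged into one run that is still sorted, which mirrors
    one round of the merge described above.
    """
    merged = []
    for start in range(0, len(sorted_runs), merge_factor):
        group = sorted_runs[start:start + merge_factor]
        # heapq.merge combines already-sorted inputs without re-sorting them.
        merged.append(list(heapq.merge(*group)))
    return merged


# 60 hypothetical map outputs, each a short run that is already sorted.
runs = [[m, m + 60, m + 120] for m in range(60)]

merge_factor = 15  # the value used in the example, not the Hadoop default
rounds = math.ceil(len(runs) / merge_factor)
intermediate = merge_round(runs, merge_factor)

print(rounds, len(intermediate))
```

With 60 runs and a merge factor of 15, the sketch yields four rounds and four intermediate runs, each still in sorted order.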