Support Questions

sharmadukool136 · ‎12-26-2018

How to sort intermediate output based on values in MapReduce?

jagadeesan · ‎12-27-2018

@Dukool SHarma

By default, MapReduce sorts the intermediate data (between the mapper and reducer phases) by key only. If we want the data sorted by value, we need secondary sorting: the value is moved into a composite key, so the framework's key sort orders it as well.
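As a rough illustration of the idea, here is a minimal plain-Java sketch that simulates secondary sort without any Hadoop classes. The names CompositeKey, SORT_ORDER and sortAndGroup are hypothetical and only for this example. In a real job the same roles would be played by a custom WritableComparable key, plus a partitioner and comparators registered on the Job (setPartitionerClass, setSortComparatorClass, setGroupingComparatorClass).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of secondary sort; no Hadoop classes are used.
final class SecondarySortSketch {

    // Hypothetical composite key: the natural key plus the value to sort on.
    static final class CompositeKey {
        final String naturalKey;
        final int value;

        CompositeKey(String naturalKey, int value) {
            this.naturalKey = naturalKey;
            this.value = value;
        }
    }

    // Plays the role of the sort comparator: natural key first, then value.
    static final Comparator<CompositeKey> SORT_ORDER =
            Comparator.comparing((CompositeKey k) -> k.naturalKey)
                      .thenComparingInt(k -> k.value);

    // Sorts by the composite key, then groups by natural key only,
    // which is what a grouping comparator would do on the reduce side.
    static Map<String, List<Integer>> sortAndGroup(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_ORDER);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (CompositeKey k : sorted) {
            groups.computeIfAbsent(k.naturalKey, unused -> new ArrayList<>())
                  .add(k.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = new ArrayList<>();
        mapOutput.add(new CompositeKey("b", 7));
        mapOutput.add(new CompositeKey("a", 3));
        mapOutput.add(new CompositeKey("b", 2));
        mapOutput.add(new CompositeKey("a", 1));
        System.out.println(sortAndGroup(mapOutput)); // prints {a=[1, 3], b=[2, 7]}
    }
}
```

Each reducer group then sees its values already in ascending order, without buffering and sorting them inside the reducer.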

For more information, you can refer to the links below:

https://www.oreilly.com/library/view/data-algorithms/9781491906170/ch01.html

https://www.quora.com/What-is-secondary-sort-in-Hadoop-and-how-does-it-work/answer/Sudarshan-Sreeniv...

Please accept the answer you found most useful.

patelharshali13 · ‎12-27-2018

Each map task sorts its output by key before writing it to disk, so sorting starts on the map side. Once all the map outputs have been copied, the reduce task moves into the sort phase (more precisely, a merge phase), which merges the map outputs while maintaining their sort order. This is done in rounds. For example, if there were 60 map outputs and the merge factor was 15 (set by the mapreduce.task.io.sort.factor property, just like in the map's merge; the default is 10), there would be four rounds. Each round would merge 15 files into 1, so at the end there would be 4 intermediate files to be processed. The merge works on key-value pairs and orders them by key, not by value.
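The round count in the example above is just a ceiling division of the number of map outputs by the merge factor. A small sketch of that arithmetic follows. The class and method names are hypothetical, and it models only the simplified description in this post, not the exact merge schedule Hadoop uses internally.

```java
// Simplified model: each round merges up to `mergeFactor` files into one.
final class MergeRoundsSketch {

    // Returns the number of rounds (and resulting intermediate files)
    // for the given number of map outputs and merge factor.
    static int mergeRounds(int mapOutputs, int mergeFactor) {
        if (mapOutputs < 0 || mergeFactor < 2) {
            throw new IllegalArgumentException("need mapOutputs >= 0 and mergeFactor >= 2");
        }
        // Ceiling division without floating point.
        return (mapOutputs + mergeFactor - 1) / mergeFactor;
    }

    public static void main(String[] args) {
        System.out.println(mergeRounds(60, 15)); // prints 4
    }
}
```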

Cloudera Community
