Created 12-26-2018 11:20 AM
How can I sort the intermediate output by value in MapReduce?
Created 12-27-2018 06:32 AM
By default, MapReduce sorts the intermediate data (between the mapper and reducer phases) by key. If you want the data sorted by value as well, you need secondary sorting, which moves the value into a composite key so that the framework's key sort also orders the values.
For more information, see the link below:
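To make the idea concrete, here is a minimal sketch in plain Python that only simulates what secondary sorting achieves. It does not use the Hadoop API: the function name `secondary_sort` and the sample records are made up for illustration. In a real job, the same effect would come from a composite key class plus a custom partitioner and grouping comparator.

```python
from itertools import groupby
from operator import itemgetter


def secondary_sort(pairs):
    """Simulate a shuffle that orders values within each key.

    Hypothetical helper for illustration only, not part of Hadoop.
    """
    # Step 1: treat (natural_key, value) as a composite key, so the ordinary
    # sort-by-key also puts the values in order within each natural key.
    composite_sorted = sorted(pairs, key=lambda kv: (kv[0], kv[1]))
    # Step 2: group on the natural key only, the way a grouping comparator
    # would, so each reduce call sees one key with its values already sorted.
    return [
        (key, [value for _, value in group])
        for key, group in groupby(composite_sorted, key=itemgetter(0))
    ]


records = [("b", 3), ("a", 9), ("b", 1), ("a", 2), ("a", 5)]
for key, values in secondary_sort(records):
    print(key, values)
# a [2, 5, 9]
# b [1, 3]
```

The point of the sketch is that no extra sort step runs inside the reducer: the values arrive in order because they were part of the key during the shuffle.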
https://www.oreilly.com/library/view/data-algorithms/9781491906170/ch01.html
Please accept the answer you found most useful.
Created 12-27-2018 12:16 PM
Sorting is carried out on the map side. When all the map outputs have been copied, the reduce task moves into the sort phase (more accurately, the merge phase), which merges the map outputs while maintaining their sort order. This is done in rounds. For example, if there were 60 map outputs and the merge factor was 15 (set by the mapreduce.task.io.sort.factor property, whose default is 10, just like in the map's merge), there would be four rounds. Each round would merge 15 files into 1, so at the end there would be 4 intermediate files to be processed. Throughout the merge, the records are key-value pairs, and they are ordered by key.
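The arithmetic in the example above can be checked with a small simulation. This is only a sketch in plain Python, not Hadoop's actual merge code: `merge_in_rounds` and the generated map outputs are hypothetical, and `heapq.merge` stands in for the on-disk merge of already sorted files.

```python
import heapq
import math


def merge_in_rounds(sorted_runs, merge_factor):
    """Merge sorted runs in batches of `merge_factor`; each batch is one round.

    Hypothetical helper for illustration only, not part of Hadoop.
    """
    return [
        # heapq.merge keeps the key order while combining sorted inputs.
        list(heapq.merge(*sorted_runs[start:start + merge_factor]))
        for start in range(0, len(sorted_runs), merge_factor)
    ]


# 60 simulated map outputs, each already sorted by key (as after the map-side sort).
map_outputs = [
    [(key, map_id) for key in range(map_id % 3, 12, 3)]
    for map_id in range(60)
]

intermediate = merge_in_rounds(map_outputs, merge_factor=15)
print(math.ceil(60 / 15))  # 4 rounds
print(len(intermediate))   # 4 intermediate files
```

Each of the four resulting lists is still sorted by key, which is what lets the reduce side read them in key order.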