In MapReduce, how to sort intermediate output based on values?

New Contributor

How do I sort intermediate output by value in MapReduce?

2 REPLIES

Re: In MapReduce, how to sort intermediate output based on values?

Expert Contributor
@Dukool SHarma

By default, MapReduce sorts the intermediate data (between the mapper and reducer phases) by key. If you want the data sorted by value, you need secondary sorting: move the value into a composite key so the framework sorts on it, then partition and group on the original key only.

For more information, see the links below:
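To make the secondary-sort idea concrete, here is a minimal plain-Java simulation. It uses no Hadoop classes; `SecondarySortSketch`, `CompositeKey`, and `shuffle` are hypothetical names made up for illustration. In a real job, the same roles would be filled by a custom `WritableComparable` key plus the classes registered through `Job.setPartitionerClass`, `Job.setSortComparatorClass`, and `Job.setGroupingComparatorClass`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of secondary sort; no Hadoop classes are involved.
final class SecondarySortSketch {

    // Hypothetical composite key: the natural key plus the value to sort on.
    static final class CompositeKey {
        final String naturalKey;
        final int value;

        CompositeKey(String naturalKey, int value) {
            this.naturalKey = naturalKey;
            this.value = value;
        }
    }

    // Plays the role of the sort comparator: natural key first, then value.
    static final Comparator<CompositeKey> SORT_ORDER =
            Comparator.comparing((CompositeKey k) -> k.naturalKey)
                      .thenComparingInt(k -> k.value);

    // Sorts on the full composite key, then groups on the natural key only,
    // which is what a grouping comparator achieves on the reduce side.
    static Map<String, List<Integer>> shuffle(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_ORDER);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (CompositeKey k : sorted) {
            groups.computeIfAbsent(k.naturalKey, unused -> new ArrayList<>()).add(k.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = List.of(
                new CompositeKey("b", 7),
                new CompositeKey("a", 5),
                new CompositeKey("b", 2),
                new CompositeKey("a", 1));
        System.out.println(shuffle(mapOutput)); // prints {a=[1, 5], b=[2, 7]}
    }
}
```

Each reducer group then sees its values already in ascending order, without buffering and sorting them in reducer memory.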

https://www.oreilly.com/library/view/data-algorithms/9781491906170/ch01.html

https://www.quora.com/What-is-secondary-sort-in-Hadoop-and-how-does-it-work/answer/Sudarshan-Sreeniv...

Please accept the answer you found most useful.

Re: In MapReduce, how to sort intermediate output based on values?

New Contributor

Sorting by key is carried out on the map side. Once all the map outputs have been copied, the reduce task moves into the sort phase (more accurately, the merge phase), which merges the map outputs while maintaining their sort order. This is done in rounds. For example, if there were 60 map outputs and the merge factor was 15 (set by the mapreduce.task.io.sort.factor property, which defaults to 10, just like in the map's merge), there would be four rounds. Each round would merge 15 files into 1, so at the end there would be 4 intermediate files to be processed. The merge works on key-value records and orders them by key, not by value.
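The arithmetic and the order-preserving merge described above can be sketched as follows. This is a simplified illustration, not Hadoop's actual merge scheduling; `MergeRoundsSketch`, `mergeRounds`, and `mergeSortedRuns` are hypothetical names, and plain integers stand in for keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class MergeRoundsSketch {

    // Number of groups of at most `factor` files: ceil(files / factor).
    static int mergeRounds(int files, int factor) {
        return (files + factor - 1) / factor;
    }

    // Merges already-sorted runs into one sorted list using a min-heap of cursors.
    static List<Integer> mergeSortedRuns(List<List<Integer>> runs) {
        // Each heap entry is {runIndex, positionInRun}, ordered by the element it points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> Integer.compare(runs.get(a[0]).get(a[1]), runs.get(b[0]).get(b[1])));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] cur = heap.poll();
            merged.add(runs.get(cur[0]).get(cur[1]));
            // Advance the cursor within the same run, if anything is left in it.
            if (cur[1] + 1 < runs.get(cur[0]).size()) {
                heap.add(new int[] {cur[0], cur[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(mergeRounds(60, 15)); // prints 4
        System.out.println(mergeSortedRuns(
                List.of(List.of(1, 4, 9), List.of(2, 3), List.of(5)))); // prints [1, 2, 3, 4, 5, 9]
    }
}
```

Because every input run is already sorted, the merge never needs to re-sort; it only compares the current head of each run.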
