What algorithm is used internally by the unprepared sort and shuffle phase?

Reply from a Cloudera Employee:
I don't know what you mean by "unprepared", but the map-side sort phase uses QuickSort [1] to sort the in-memory map output buffer, while the reduce-side merge phase uses merge sort [2].
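
To make the map-side step concrete, here is a minimal, hypothetical sketch (class and method names are illustrative, not Hadoop's API). It assumes, as a simplification, that each buffered record is described by a partition number and a string key held in parallel arrays. It quicksorts an array of record indices by (partition, key), so the record data itself never moves. This is a textbook quicksort, not a copy of Hadoop's QuickSort class, which includes extra refinements.

```java
/**
 * Hypothetical sketch: order buffered map-output records by (partition, key)
 * by quicksorting an index array instead of moving the records themselves.
 */
public class MapSideSortSketch {

    /** Returns record indices ordered by partition first, then by key. */
    public static int[] sortedOrder(int[] partitions, String[] keys) {
        int[] idx = new int[keys.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        quickSort(idx, 0, idx.length - 1, partitions, keys);
        return idx;
    }

    // Compare two records: partition number first, then key.
    private static int compare(int a, int b, int[] partitions, String[] keys) {
        int byPartition = Integer.compare(partitions[a], partitions[b]);
        return byPartition != 0 ? byPartition : keys[a].compareTo(keys[b]);
    }

    // Recursive quicksort with Hoare-style partitioning around the middle element.
    private static void quickSort(int[] idx, int lo, int hi, int[] partitions, String[] keys) {
        if (lo >= hi) {
            return;
        }
        int pivot = idx[lo + (hi - lo) / 2]; // index of the pivot record
        int i = lo;
        int j = hi;
        while (i <= j) {
            while (compare(idx[i], pivot, partitions, keys) < 0) i++;
            while (compare(idx[j], pivot, partitions, keys) > 0) j--;
            if (i <= j) {
                int tmp = idx[i]; // swap indices only, never the record data
                idx[i] = idx[j];
                idx[j] = tmp;
                i++;
                j--;
            }
        }
        quickSort(idx, lo, j, partitions, keys);
        quickSort(idx, i, hi, partitions, keys);
    }

    public static void main(String[] args) {
        int[] partitions = {1, 0, 1, 0};
        String[] keys = {"pear", "fig", "apple", "date"};
        for (int r : sortedOrder(partitions, keys)) {
            System.out.println(partitions[r] + "\t" + keys[r]);
        }
    }
}
```

Sorting by partition first means each reducer's data ends up contiguous in the spill, and within a partition the keys are already in order.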

Apache Hadoop's general-purpose implementations of the QuickSort and MergeSort algorithms can be found at https://github.com/cloudera/hadoop-common/blob/cdh4.5.0-release/hadoop-common-project/hadoop-common/... and https://github.com/cloudera/hadoop-common/blob/cdh4.5.0-release/hadoop-common-project/hadoop-common/...

[1] Sources showing that a map task uses QuickSort to sort its output buffers: https://github.com/cloudera/hadoop-common/blob/cdh4.5.0-release/hadoop-mapreduce1-project/src/mapred...
[2] Sources showing that a reduce task uses merge sort to merge and sort the chunks of each partition's data fetched from all maps: https://github.com/cloudera/hadoop-common/blob/cdh4.5.0-release/hadoop-mapreduce1-project/src/mapred...
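
The reduce-side step in [2] combines many already-sorted chunks (one per map) into a single sorted stream. A minimal, hypothetical sketch of that general technique, a k-way merge driven by a min-heap, is shown below. The class name and the use of in-memory string lists are illustrative assumptions. Hadoop's own merger works on serialized segments in memory and on disk and limits how many segments it merges per pass (io.sort.factor in MRv1), none of which is modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: k-way merge of sorted segments using a min-heap. */
public class ReduceSideMergeSketch {

    // Cursor over one sorted segment; the heap orders cursors by their current key.
    private static final class Cursor {
        final List<String> segment;
        int pos;

        Cursor(List<String> segment) {
            this.segment = segment;
        }

        String head() {
            return segment.get(pos);
        }
    }

    /** Merges already-sorted segments into one sorted list. */
    public static List<String> merge(List<List<String>> segments) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> a.head().compareTo(b.head()));
        for (List<String> segment : segments) {
            if (!segment.isEmpty()) {
                heap.add(new Cursor(segment));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll(); // segment whose current key is smallest
            out.add(smallest.head());
            smallest.pos++;                // advance only while it is out of the heap
            if (smallest.pos < smallest.segment.size()) {
                heap.add(smallest);        // re-insert keyed on its next element
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> fromMaps = List.of(
                List.of("apple", "kiwi"),
                List.of("banana", "cherry", "plum"),
                List.of("fig"));
        System.out.println(merge(fromMaps));
    }
}
```

Each output step costs O(log k) for k segments, which is why a heap-based merge suits combining many map outputs.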