
What algorithm is used internally by the unprepared sort and shuffle phase?


Re: What algorithm is used internally by the unprepared sort and shuffle phase?

I don't know what you mean by "unprepared", but the map-side sort phase uses Quick Sort [1], while the reduce-side merge phase uses Merge Sort [2].

The standalone implementations of Apache Hadoop's Quick Sort and Merge Sort algorithms can be found at https://github.com/cloudera/hadoop-common/blob/cdh4.5.0-release/hadoop-common-project/hadoop-common/... and https://github.com/cloudera/hadoop-common/blob/cdh4.5.0-release/hadoop-common-project/hadoop-common/...
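To make the two phases concrete, here is a minimal Python sketch of the idea. It is not Hadoop's Java code. The function names (`quick_sort`, `map_side_sort`, `merge_runs`) and the CRC-based partitioner are hypothetical, chosen only for illustration. The sketch assumes each map task sorts its buffered records by (partition, key) with an in-place quicksort, and that a reducer then merges the already-sorted runs it fetched from the maps.

```python
import heapq
import zlib


def quick_sort(records, lo=0, hi=None):
    """In-place quicksort of (partition, key, value) tuples by (partition, key)."""
    if hi is None:
        hi = len(records) - 1
    while lo < hi:
        pivot = records[(lo + hi) // 2][:2]
        i, j = lo, hi
        # Hoare-style partitioning around the middle element.
        while i <= j:
            while records[i][:2] < pivot:
                i += 1
            while records[j][:2] > pivot:
                j -= 1
            if i <= j:
                records[i], records[j] = records[j], records[i]
                i += 1
                j -= 1
        # Recurse into the smaller side, loop on the larger one.
        if j - lo < hi - i:
            quick_sort(records, lo, j)
            lo = i
        else:
            quick_sort(records, i, hi)
            hi = j


def map_side_sort(pairs, num_partitions):
    """Map side: tag each record with a partition, quicksort, split into runs."""
    # Hypothetical partitioner: CRC32 of the key modulo the reducer count.
    records = [(zlib.crc32(k.encode()) % num_partitions, k, v) for k, v in pairs]
    quick_sort(records)
    runs = {}
    for partition, key, value in records:
        runs.setdefault(partition, []).append((key, value))
    return runs


def merge_runs(runs):
    """Reduce side: k-way merge of runs that are each already sorted by key."""
    return list(heapq.merge(*runs, key=lambda kv: kv[0]))


if __name__ == "__main__":
    map1 = map_side_sort([("pear", 1), ("apple", 1), ("fig", 1)], 1)
    map2 = map_side_sort([("fig", 1), ("apple", 1)], 1)
    # A single reducer (partition 0) merges the sorted runs from both maps.
    for key, value in merge_runs([map1[0], map2[0]]):
        print(key, value)
```

The merge step never re-sorts anything: because every run arrives sorted, the reducer only has to pick the smallest head element among the runs repeatedly, which is why a merge sort fits the reduce side while a quicksort fits the unsorted in-memory buffer on the map side.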

[1] - Sources indicating a Map Task's use of QuickSort for sorting the output buffers: https://github.com/cloudera/hadoop-common/blob/cdh4.5.0-release/hadoop-mapreduce1-project/src/mapred...
[2] - Sources indicating a Reduce Task's use of Merge Sort for merging and sorting the chunks of individual partition data obtained from all maps: https://github.com/cloudera/hadoop-common/blob/cdh4.5.0-release/hadoop-mapreduce1-project/src/mapred...