New Contributor
Posts: 3
Registered: ‎11-08-2013

What algorithm is used internally by the unprepared sort and shuffle phase?

Posts: 1,836
Kudos: 415
Solutions: 295
Registered: ‎07-31-2013

Re: What algorithm is used internally by the unprepared sort and shuffle phase?

I don't know what you mean by "unprepared", but the map-side sort phase uses Quick Sort [1], while the reduce-side merge phase uses Merge Sort [2].

The standalone implementations of Apache Hadoop's Quick Sort and Merge Sort algorithms can be found at https://github.com/cloudera/hadoop-common/blob/cdh4.5.0-release/hadoop-common-project/hadoop-common/... and https://github.com/cloudera/hadoop-common/blob/cdh4.5.0-release/hadoop-common-project/hadoop-common/... respectively.

[1] - Sources indicating a Map Task's use of QuickSort for sorting the output buffers: https://github.com/cloudera/hadoop-common/blob/cdh4.5.0-release/hadoop-mapreduce1-project/src/mapred...
[2] - Sources indicating a Reduce Task's use of Merge Sort for merging and sorting the chunks of individual partition data obtained from all maps: https://github.com/cloudera/hadoop-common/blob/cdh4.5.0-release/hadoop-mapreduce1-project/src/mapred...
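
To make the split between the two phases concrete, here is a minimal sketch in plain Java. It is not Hadoop's code: the class name SortShuffleSketch, the method names, and the use of int keys are all assumptions made for illustration, and the heap-based merge is just one common way to merge sorted runs. Each "map" quick-sorts its own in-memory buffer, and the "reduce" side then merges the already-sorted runs it fetched from every map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these names exist in Hadoop.
class SortShuffleSketch {

    // Map side: in-place quicksort of one in-memory output buffer.
    // Uses the middle element as pivot and a two-pointer partition.
    static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo;
        int j = hi;
        while (i <= j) {
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                int tmp = a[i];
                a[i] = a[j];
                a[j] = tmp;
                i++;
                j--;
            }
        }
        quickSort(a, lo, j);
        quickSort(a, i, hi);
    }

    // Reduce side: k-way merge of runs that are each already sorted.
    // Heap entries are {value, runIndex, offsetInRun}.
    static List<Integer> mergeSortedRuns(List<int[]> runs) {
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (runs.get(r).length > 0) {
                heap.add(new int[] {runs.get(r)[0], r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            int[] run = runs.get(top[1]);
            int next = top[2] + 1;
            if (next < run.length) {
                heap.add(new int[] {run[next], top[1], next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] map1 = {5, 1, 4}; // unsorted output buffer of a first map
        int[] map2 = {3, 9, 2}; // unsorted output buffer of a second map
        quickSort(map1, 0, map1.length - 1);
        quickSort(map2, 0, map2.length - 1);
        // prints [1, 2, 3, 4, 5, 9]
        System.out.println(mergeSortedRuns(Arrays.asList(map1, map2)));
    }
}
```

The point of the split is that the reduce side never has to sort from scratch: every run it receives is already ordered, so a merge is enough to produce a fully sorted stream of keys.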