Support Questions

clouderazone · 02-04-2016

What is the difference between the Partitioner, Combiner, and Shuffle and Sort phases in MapReduce? What is the order of execution of these phases? My understanding of the process flow is as follows:

1) Each map task's output is partitioned and sorted in memory, and the combiner function runs on it. This output is written to local disk and is called intermediate data.

2) The intermediate data from all the DataNodes goes through a phase called shuffle and sort, which is handled by the Hadoop framework.

3) The sorted output is given as input to the reducers.
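
The three steps above can be sketched as a small in-memory simulation. This is only an illustration of the flow for a word count, not Hadoop code: the names partition, combine, map_task, shuffle_and_sort, reduce_task and run_job are made up for this example, the number of reducers is assumed to be 2, and the partitioner is a toy stand-in for a hash partitioner.

```python
import heapq
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

NUM_REDUCERS = 2  # assumed number of reduce tasks for this sketch


def partition(key, num_reducers=NUM_REDUCERS):
    # Toy stand-in for a hash partitioner: sums character codes so the
    # result is the same on every run.
    return sum(ord(c) for c in key) % num_reducers


def combine(sorted_pairs):
    # Combiner: pre-aggregates values for each run of identical keys.
    # Relies on the input already being sorted by key.
    return [(k, sum(v for _, v in grp))
            for k, grp in groupby(sorted_pairs, key=itemgetter(0))]


def map_task(words):
    # Step 1: emit (word, 1), partition by key, sort each partition,
    # then run the combiner on the sorted partition.
    buckets = defaultdict(list)
    for w in words:
        buckets[partition(w)].append((w, 1))
    return {p: combine(sorted(pairs)) for p, pairs in buckets.items()}


def shuffle_and_sort(map_outputs, reducer_id):
    # Step 2: fetch this reducer's partition from every map output and
    # merge the already-sorted runs into one sorted stream.
    runs = [out.get(reducer_id, []) for out in map_outputs]
    return list(heapq.merge(*runs))


def reduce_task(merged):
    # Step 3: the reducer sees each key once, with all of its values.
    return {k: sum(v for _, v in grp)
            for k, grp in groupby(merged, key=itemgetter(0))}


def run_job(splits):
    map_outputs = [map_task(s) for s in splits]
    result = {}
    for r in range(NUM_REDUCERS):
        result.update(reduce_task(shuffle_and_sort(map_outputs, r)))
    return result
```

Calling run_job([["a", "b", "a"], ["b", "c", "b"]]) counts a twice, b three times and c once. Because each map task's output is already sorted per partition, the reduce side in this sketch only has to merge sorted runs rather than sort everything from scratch.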

Please verify if the process flow is correct and provide your valuable inputs.

bleonhardi · 02-04-2016

https://developer.yahoo.com/hadoop/tutorial/module4.html

Map -> Combiner -> Partitioner -> Sort -> Shuffle -> Sort -> Reduce

https://farm3.static.flickr.com/2374/3529959828_0b689d1d5c_o.png

https://farm3.static.flickr.com/2275/3529146683_c8247ff6db_o.png

bleonhardi · 02-07-2016

May I ask why you care? Is there a specific performance problem, or is it just curiosity?

satyap · 03-28-2017

@Benjamin Leonhardi Why is sorting listed before shuffling? I think sorting always happens after shuffling, since there is already a combiner to combine (sort) the output on a single node. I think that once all the intermediate data has been collected by the shuffle, sorting is used to produce one single input file, which is then used by the reducer.