
What is the difference between the Partitioner, Combiner, and Shuffle and Sort phases in MapReduce, and what is their order of execution?

Rising Star

What is the difference between the Partitioner, Combiner, and Shuffle and Sort phases in MapReduce, and in what order do these phases execute? My understanding of the process flow is as follows:

1) Each map task's output is partitioned and sorted in memory, and the Combiner function runs on it. This output is written to local disk and is called intermediate data.

2) The intermediate data from all the DataNodes goes through a phase called shuffle and sort, which is handled by the Hadoop framework.

3) The sorted output is given as input to the reducers.

Please verify whether this process flow is correct and share your input.
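The flow in steps 1 to 3 can be sketched as a small in-memory word count. This is only an illustration in plain Python, not Hadoop code: `partition`, `sum_by_key`, `run_map_task`, `shuffle`, `run_reduce_task`, and `NUM_REDUCERS` are hypothetical names, the partitioner is a simple stand-in for a hash partitioner, and the same summing function is assumed to serve as both combiner and reducer.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

NUM_REDUCERS = 2  # assumption: two reduce tasks, chosen for illustration


def partition(key, num_reducers=NUM_REDUCERS):
    # Stand-in for a hash partitioner: decides which reducer owns a key.
    return sum(key.encode("utf-8")) % num_reducers


def sum_by_key(sorted_pairs):
    # Sums the values of equal keys; the input must already be sorted by key.
    return [(key, sum(value for _, value in group))
            for key, group in groupby(sorted_pairs, key=itemgetter(0))]


def run_map_task(words):
    # Step 1: emit (word, 1), partition, sort each partition, then combine.
    buckets = defaultdict(list)
    for word in words:
        buckets[partition(word)].append((word, 1))
    return {p: sum_by_key(sorted(pairs)) for p, pairs in buckets.items()}


def shuffle(map_outputs):
    # Step 2: each reducer collects its own partition from every map task.
    fetched = defaultdict(list)
    for output in map_outputs:
        for p, run in output.items():
            fetched[p].append(run)
    return fetched


def run_reduce_task(runs):
    # Step 3: bring the fetched runs into one key-ordered stream and reduce.
    ordered = sorted(pair for run in runs for pair in run)
    return sum_by_key(ordered)


def word_count(splits):
    fetched = shuffle([run_map_task(split) for split in splits])
    result = {}
    for p in sorted(fetched):
        result.update(run_reduce_task(fetched[p]))
    return result


print(word_count([["a", "b", "a"], ["b", "c"]]))
```

In this sketch the combiner only shrinks each map task's output before it is transferred; the final totals are the same with or without it.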


Master Guru

May I ask why you care? Is there a specific performance problem, or is it just curiosity?

Contributor

@Benjamin Leonhardi Why is sorting written before shuffling? I think sorting always happens after shuffling, since a combiner already combines (sorts) the output on a single node. I think that once all the intermediate data has been collected by the shuffle, sorting is used to produce a single input file, which is then used by the reducer.
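One way to look at the ordering question: in Hadoop, each map task sorts its own output by key within each partition before the reducers fetch it, and a combiner only aggregates values rather than sorting them. After the shuffle, the reduce side then needs only to merge runs that are already sorted. A minimal sketch of that merge step in plain Python, where `merge_sorted_runs` and the sample runs are hypothetical and used only for illustration:

```python
import heapq


def merge_sorted_runs(runs):
    # Each run is one map task's output for this reducer, already sorted by key.
    # heapq.merge walks the runs in step instead of re-sorting everything.
    return list(heapq.merge(*runs))


# Two pre-sorted runs fetched from two different map tasks.
runs = [[("a", 2), ("c", 1)], [("b", 1), ("c", 3)]]
merged = merge_sorted_runs(runs)
print(merged)
```

The merged stream is in key order, so the reducer sees all values for one key next to each other without a full sort on the reduce side.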