Map reduce flow clarification

Expert Contributor

What is the order of execution for a MapReduce job? Is the following order correct? Please correct me if I am wrong.

Mapper
partition each mapper output
sorting within each partition based on key
grouping
shuffle and merge: each reducer takes one partition from all map tasks and merges them together
combiner
reducer
Super Collaborator

@vamsi valiveti

This is the right order:

(Attached diagram: 10950-screen-shot-2016-12-30-at-11152-pm.png)

Expert Contributor

a) Could you please provide the source for this diagram? It is really useful.

b) What about these two steps? Where do they fit in the diagram?

  1. grouping
  2. shuffle and merge: each reducer takes one partition from all map tasks and merges them together

Super Collaborator

@vamsi valiveti

a> This slide is from a Hortonworks training course. The course and its slides are available to paying customers only.

b>

i> There is no separate step called grouping.

ii> Shuffle happens when data moves from the map side to the reduce side (please see the diagram), and merge happens during the sort phase on the reducer side.
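
For anyone reading this without access to the diagram, here is a rough sketch of the usual order in plain Python. It is a toy word count, not Hadoop code: the function names (partition, combine, map_task, reduce_task, run_job) are made up for illustration, and the byte-sum partitioner is only a deterministic stand-in for a real hash partitioner. The point it tries to show is that the combiner runs on the map side, after the sort and before the shuffle, and that values for the same key come together on the reducer as a result of the merge.

```python
import heapq
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def partition(key, num_reducers):
    # Hypothetical stand-in for a hash partitioner: picks the reducer for a key.
    return sum(key.encode("utf-8")) % num_reducers


def combine(sorted_pairs):
    # Sum the values of each key; the input must already be sorted by key.
    return [(key, sum(value for _, value in group))
            for key, group in groupby(sorted_pairs, key=itemgetter(0))]


def map_task(lines, num_reducers):
    partitions = defaultdict(list)
    for line in lines:
        for word in line.split():
            # 1. map: emit (word, 1); 2. partition: choose the target reducer
            partitions[partition(word, num_reducers)].append((word, 1))
    # 3. sort within each partition by key; 4. combine (optional, map side)
    return {p: combine(sorted(pairs, key=itemgetter(0)))
            for p, pairs in partitions.items()}


def reduce_task(reducer_id, map_outputs):
    # 5. shuffle: fetch this reducer's partition from every map task
    fetched = [output.get(reducer_id, []) for output in map_outputs]
    # 6. merge: merge the already-sorted runs into one stream sorted by key
    merged = heapq.merge(*fetched, key=itemgetter(0))
    # 7. reduce: one result per key, over all values merged for that key
    return combine(merged)


def run_job(splits, num_reducers=2):
    # One map task per input split, then one reduce task per partition.
    map_outputs = [map_task(split, num_reducers) for split in splits]
    return {r: reduce_task(r, map_outputs) for r in range(num_reducers)}
```

With two input splits, `run_job([["a b a"], ["b c"]])` returns `{0: [("b", 2)], 1: [("a", 2), ("c", 1)]}`. The first map task already combines its two `a` records into `("a", 2)` before anything is shuffled, which is why the combiner cannot come after shuffle and merge as in the original list.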