Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

How Reducers know where the mapper results are stored

Rising Star

1. How do reducers know where the mapper results are stored?

1 ACCEPTED SOLUTION

Super Guru

At a very high level: once a map task completes, it notifies the Application Master (AM) through its heartbeat, and the AM keeps track of the mapping between map outputs and the hosts that hold them. Each reducer polls the AM for map output locations until it has all of them.
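To make the flow concrete, here is a minimal toy sketch in Python of the bookkeeping described above. It is not Hadoop code: `ToyApplicationMaster`, `ToyReducer`, and all method names are hypothetical, and the heartbeat is reduced to a plain method call. It only illustrates the idea of the AM recording completed map outputs and a reducer polling for new locations until it knows all of them.

```python
class ToyApplicationMaster:
    """Hypothetical stand-in for the AM's bookkeeping; not a Hadoop class."""

    def __init__(self, num_maps):
        self.num_maps = num_maps
        self.completed = []  # ordered (map_id, host) completion events

    def map_task_completed(self, map_id, host):
        # In a real cluster this report arrives with the task's heartbeat.
        self.completed.append((map_id, host))

    def get_completion_events(self, from_index):
        # Return only the events the caller has not seen yet.
        return self.completed[from_index:]


class ToyReducer:
    """Hypothetical reducer that polls the AM for map output locations."""

    def __init__(self, am):
        self.am = am
        self.locations = {}  # map_id -> host holding that map's output
        self.next_event = 0  # index of the first event not yet seen

    def poll(self):
        """Fetch new events; return True once every map output is located."""
        events = self.am.get_completion_events(self.next_event)
        self.next_event += len(events)
        for map_id, host in events:
            self.locations[map_id] = host
        return len(self.locations) == self.am.num_maps
```

A reducer would call `poll()` repeatedly (in practice with a sleep between calls) and start fetching from each host as soon as its location is known, rather than waiting for the full list.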


2 REPLIES 2

Master Guru

I often refer to this blog when I forget how data moves from map to reduce:

The map outputs are copied to the reduce task JVM’s memory if they are small enough (the buffer’s size is controlled by mapred.job.shuffle.input.buffer.percent, which specifies the proportion of the heap to use for this purpose); otherwise, they are copied to disk. When the in-memory buffer reaches a threshold size (controlled by mapred.job.shuffle.merge.percent) or reaches a threshold number of map outputs (mapred.inmem.merge.threshold), it is merged and spilled to disk.
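The decision logic in that passage can be sketched as a small Python model. This is an illustration under simplifying assumptions, not the Hadoop implementation: `ShuffleBuffer` and its fields are hypothetical names, "small enough" is modeled simply as "fits in the remaining buffer", and the default numbers are illustrative, so check your cluster's configuration for the real values. Note that the property names quoted above are the older `mapred.*` names; on Hadoop 2 and later the corresponding properties are `mapreduce.reduce.shuffle.input.buffer.percent`, `mapreduce.reduce.shuffle.merge.percent`, and `mapreduce.reduce.merge.inmem.threshold`.

```python
from dataclasses import dataclass, field


@dataclass
class ShuffleBuffer:
    """Toy model of a reduce task's in-memory shuffle buffer (not Hadoop code)."""
    heap_bytes: int
    input_buffer_percent: float = 0.70  # stands in for ...shuffle.input.buffer.percent
    merge_percent: float = 0.66         # stands in for ...shuffle.merge.percent
    inmem_merge_threshold: int = 1000   # stands in for ...inmem.merge.threshold
    in_memory: list = field(default_factory=list)  # sizes of buffered map outputs
    events: list = field(default_factory=list)     # log of what happened, in order

    @property
    def capacity(self):
        # Portion of the heap set aside for buffering map outputs.
        return int(self.heap_bytes * self.input_buffer_percent)

    def copy_map_output(self, size):
        used = sum(self.in_memory)
        # Assumption: an output is "small enough" if it fits in the free space.
        if used + size > self.capacity:
            self.events.append(("to_disk", size))
            return
        self.in_memory.append(size)
        self.events.append(("to_memory", size))
        used += size
        # Merge and spill when either the size or the count threshold is hit.
        too_full = used >= self.capacity * self.merge_percent
        too_many = len(self.in_memory) >= self.inmem_merge_threshold
        if too_full or too_many:
            self.events.append(("merge_and_spill", used))
            self.in_memory.clear()
```

Feeding a sequence of map output sizes through `copy_map_output` and inspecting `events` shows which outputs stayed in memory, which went straight to disk, and when a merge was triggered.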