
How Reducers know where the mapper results are stored

Rising Star

How do reducers know where the mapper results are stored?

1 ACCEPTED SOLUTION

Super Guru

At a very high level: once a map task completes, it notifies the ApplicationMaster (AM) through a heartbeat. The AM keeps track of the mapping between map outputs and the hosts that hold them. Each reducer polls the AM for map output locations until it has received all of them.
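To make that flow concrete, here is a minimal Python sketch, assuming a single reducer and an in-process stand-in for the ApplicationMaster. All names (`ApplicationMaster`, `MapCompletionEvent`, `reducer_poll`, `wait`) are hypothetical and this is not Hadoop's actual (Java) API; paging through completion events by index is also an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MapCompletionEvent:
    map_id: int
    host: str  # node that holds (and serves) this map task's output


@dataclass
class ApplicationMaster:
    """Hypothetical stand-in for the AM: records where each map output lives."""
    events: List[MapCompletionEvent] = field(default_factory=list)

    def on_map_heartbeat(self, map_id: int, host: str) -> None:
        # A finished map task reports in; the AM remembers map -> host.
        self.events.append(MapCompletionEvent(map_id, host))

    def get_completion_events(self, from_index: int, max_events: int) -> List[MapCompletionEvent]:
        # Return only the events the caller has not seen yet.
        return self.events[from_index:from_index + max_events]


def reducer_poll(am: ApplicationMaster, num_maps: int,
                 wait: Callable[[], None], max_events: int = 10,
                 max_polls: int = 1000) -> Dict[int, str]:
    """Poll the AM until the output location of every map is known."""
    locations: Dict[int, str] = {}
    next_index = 0
    for _ in range(max_polls):
        new_events = am.get_completion_events(next_index, max_events)
        next_index += len(new_events)
        for ev in new_events:
            # A real reducer would start fetching this output from ev.host here.
            locations[ev.map_id] = ev.host
        if len(locations) == num_maps:
            return locations
        if not new_events:
            wait()  # nothing new yet; back off before polling again
    raise RuntimeError("gave up before all map outputs were located")


if __name__ == "__main__":
    am = ApplicationMaster()
    pending = [(0, "node-a"), (1, "node-b"), (2, "node-a")]

    def finish_next_map() -> None:
        # Simulates time passing: the next map task completes and heartbeats.
        if pending:
            am.on_map_heartbeat(*pending.pop(0))

    print(reducer_poll(am, num_maps=3, wait=finish_next_map))
    # {0: 'node-a', 1: 'node-b', 2: 'node-a'}
```

The reducer only ever asks for events it has not seen, so repeated polls stay cheap; once every map has a known host, it stops polling and (in a real job) fetches each output from that host.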


Master Guru

I use this blog often when I forget how data moves between map and reduce:

The map outputs are copied to the reduce task JVM’s memory if they are small enough (the buffer’s size is controlled by mapred.job.shuffle.input.buffer.percent, which specifies the proportion of the heap to use for this purpose); otherwise, they are copied to disk. When the in-memory buffer reaches a threshold size (controlled by mapred.job.shuffle.merge.percent) or reaches a threshold number of map outputs (mapred.inmem.merge.threshold), it is merged and spilled to disk.
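The rules in that passage can be sketched as a small simulation. This is a minimal Python sketch, not Hadoop's shuffle code: sizes are abstract byte counts, the three fields only mimic the roles of the properties named above, and the `single_output_fraction` cutoff for "small enough" is an assumption (the passage does not say how that limit is set). The numbers in the example are illustrative, not defaults, and the sketch leaves out anything the passage does not describe, such as what later happens to the spilled files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShuffleBufferSketch:
    heap_bytes: float
    input_buffer_percent: float   # role of mapred.job.shuffle.input.buffer.percent
    merge_percent: float          # role of mapred.job.shuffle.merge.percent
    inmem_merge_threshold: int    # role of mapred.inmem.merge.threshold
    single_output_fraction: float = 0.25  # ASSUMPTION: what "small enough" means
    in_memory: List[int] = field(default_factory=list)
    on_disk: List[List[int]] = field(default_factory=list)  # one inner list per disk file

    @property
    def buffer_bytes(self) -> float:
        # Shuffle buffer = a proportion of the reduce task's heap.
        return self.heap_bytes * self.input_buffer_percent

    def accept(self, output_bytes: int) -> str:
        """Place one fetched map output and report what happened."""
        if output_bytes > self.buffer_bytes * self.single_output_fraction:
            # Too big for the in-memory buffer: copy it straight to disk.
            self.on_disk.append([output_bytes])
            return "disk"
        self.in_memory.append(output_bytes)
        if sum(self.in_memory) >= self.buffer_bytes * self.merge_percent:
            return self._merge_and_spill("spill (size)")
        if len(self.in_memory) >= self.inmem_merge_threshold:
            return self._merge_and_spill("spill (count)")
        return "memory"

    def _merge_and_spill(self, reason: str) -> str:
        # Record all in-memory outputs as one merged disk file, then free the buffer.
        self.on_disk.append(list(self.in_memory))
        self.in_memory.clear()
        return reason


if __name__ == "__main__":
    buf = ShuffleBufferSketch(heap_bytes=1000, input_buffer_percent=0.70,
                              merge_percent=0.66, inmem_merge_threshold=5)
    for size in [100, 300, 150, 150, 150, 50, 50, 50, 50, 50]:
        print(size, buf.accept(size))
    print(buf.on_disk)
```

With these numbers the buffer is about 700 bytes, so the 300-byte output goes straight to disk, the fifth output triggers a size-based spill (550 bytes against a threshold of about 462), and the five 50-byte outputs trigger a count-based spill.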