Support Questions

gsrao_cse · ‎07-14-2016

1. How do reducers know where the mapper results are stored?

rajkumar_singh · ‎07-14-2016

At a very high level: once a map task completes, it notifies the ApplicationMaster (AM) through its heartbeat, and the AM keeps track of the mapping between map outputs and hosts. Each reducer polls the AM for map output locations until it has them all.
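To make the flow concrete, here is a minimal toy model of it in Python. It is not Hadoop code: `ToyAppMaster`, `ToyReducer`, and all method names are hypothetical, and the real exchange happens over Hadoop's internal RPC. It only illustrates the bookkeeping: map tasks report where their output lives, and the reducer keeps polling for events it has not seen yet.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAppMaster:
    """Hypothetical stand-in for the MapReduce ApplicationMaster."""
    total_maps: int
    # Completion events in arrival order: (map_task_id, host holding the output)
    events: list = field(default_factory=list)

    def report_map_done(self, map_id: int, host: str) -> None:
        # A finished map task reports where its output lives.
        self.events.append((map_id, host))

    def completion_events(self, from_index: int) -> list:
        # Reducers ask only for the events they have not seen yet.
        return self.events[from_index:]


class ToyReducer:
    def __init__(self, am: ToyAppMaster):
        self.am = am
        self.seen = 0         # number of events already consumed
        self.locations = {}   # map_id -> host

    def poll(self) -> bool:
        """Poll once; return True when every map output location is known."""
        new_events = self.am.completion_events(self.seen)
        self.seen += len(new_events)
        for map_id, host in new_events:
            self.locations[map_id] = host
        return len(self.locations) == self.am.total_maps


am = ToyAppMaster(total_maps=3)
reducer = ToyReducer(am)

am.report_map_done(0, "node-a")
first_poll_complete = reducer.poll()    # only 1 of 3 maps known so far

am.report_map_done(1, "node-b")
am.report_map_done(2, "node-a")
second_poll_complete = reducer.poll()   # all 3 maps known now

print(first_poll_complete, second_poll_complete, reducer.locations)
```

The first poll returns `False` because only one of the three maps has reported; the second returns `True`. Once a reducer knows the host for a map output, it fetches its partition from that host, a step this sketch leaves out.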

sunile_manjee · ‎07-14-2016

I often refer to this blog when I forget how data moves from map to reduce:

The map outputs are copied to the reduce task JVM’s memory if they are small enough (the buffer’s size is controlled by mapred.job.shuffle.input.buffer.percent, which specifies the proportion of the heap to use for this purpose); otherwise, they are copied to disk. When the in-memory buffer reaches a threshold size (controlled by mapred.job.shuffle.merge.percent) or reaches a threshold number of map outputs (mapred.inmem.merge.threshold), it is merged and spilled to disk.
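The two thresholds in that passage can be sketched as simple arithmetic. The Python below is a toy calculation, not Hadoop's implementation: the function names and the 1000 MiB heap are assumptions for illustration, and the percentages are the commonly cited defaults (0.70, 0.66, 1000), so check your own cluster's values. In Hadoop 2 and later the same settings are named mapreduce.reduce.shuffle.input.buffer.percent, mapreduce.reduce.shuffle.merge.percent, and mapreduce.reduce.merge.inmem.threshold.

```python
def shuffle_buffer_bytes(heap_bytes: int, input_buffer_percent: float) -> int:
    """Bytes of reducer heap set aside for holding map outputs in memory."""
    return int(heap_bytes * input_buffer_percent)


def should_merge(used_bytes: int, buffered_outputs: int, buffer_bytes: int,
                 merge_percent: float, inmem_merge_threshold: int) -> bool:
    """True when either threshold from the quoted passage is reached."""
    size_hit = used_bytes >= buffer_bytes * merge_percent
    count_hit = buffered_outputs >= inmem_merge_threshold
    return size_hit or count_hit


MIB = 1024 * 1024

# Illustrative values only (assumed, not read from any real cluster).
HEAP = 1000 * MIB               # reducer JVM heap
INPUT_BUFFER_PERCENT = 0.70     # mapred.job.shuffle.input.buffer.percent
MERGE_PERCENT = 0.66            # mapred.job.shuffle.merge.percent
INMEM_MERGE_THRESHOLD = 1000    # mapred.inmem.merge.threshold

buffer_bytes = shuffle_buffer_bytes(HEAP, INPUT_BUFFER_PERCENT)  # about 700 MiB

# 100 MiB buffered across 10 map outputs: neither threshold reached.
below_both = should_merge(100 * MIB, 10, buffer_bytes,
                          MERGE_PERCENT, INMEM_MERGE_THRESHOLD)

# 500 MiB buffered: exceeds 66% of the ~700 MiB buffer, so merge and spill.
size_triggered = should_merge(500 * MIB, 10, buffer_bytes,
                              MERGE_PERCENT, INMEM_MERGE_THRESHOLD)

# Only 100 MiB buffered, but 1000 map outputs: the count threshold triggers.
count_triggered = should_merge(100 * MIB, 1000, buffer_bytes,
                               MERGE_PERCENT, INMEM_MERGE_THRESHOLD)

print(below_both, size_triggered, count_triggered)
```

With these numbers the size threshold sits at roughly 462 MiB, so many small map outputs tend to hit the count limit first, while a few large ones hit the size limit first.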