Created 07-14-2016 02:36 PM
1. How do reducers know where the mapper results are stored?
Created 07-14-2016 03:00 PM
At a very high level: once a map task completes, it notifies the ApplicationMaster (AM) through a heartbeat. The AM keeps track of the mapping between map outputs and the hosts that hold them. Each reducer polls the AM for map output locations until it has received all of them.
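To make that flow concrete, here is a toy Python model of the bookkeeping described above. It is a sketch under simplifying assumptions, not Hadoop's actual code: the class and method names (ApplicationMaster, on_map_completed, get_completion_events, Reducer.poll) are hypothetical, the heartbeat is reduced to a direct method call, and a real reducer would start fetching each output as soon as its location is known rather than just recording it.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationMaster:
    """Hypothetical, simplified AM: records which host holds each map output."""
    num_maps: int
    # Completion events in arrival order, as (map_id, host) pairs.
    events: list = field(default_factory=list)

    def on_map_completed(self, map_id: int, host: str) -> None:
        # Stands in for the completion report a map task sends via heartbeat.
        self.events.append((map_id, host))

    def get_completion_events(self, from_index: int) -> list:
        # Reducers ask only for events newer than the last one they have seen.
        return self.events[from_index:]


class Reducer:
    """Hypothetical reducer that polls the AM for map output locations."""

    def __init__(self, am: ApplicationMaster):
        self.am = am
        self.next_event = 0   # index of the first event not yet seen
        self.locations = {}   # map_id -> host holding that map's output

    def poll(self) -> bool:
        """One polling round; returns True once every map output location is known."""
        new_events = self.am.get_completion_events(self.next_event)
        for map_id, host in new_events:
            # A real reducer would schedule a fetch from `host` here.
            self.locations[map_id] = host
        self.next_event += len(new_events)
        return len(self.locations) == self.am.num_maps
```

Because the reducer remembers the index of the last event it saw, each poll returns only new completions; once it has a location for every map, it can stop polling.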
Created 07-14-2016 04:03 PM
I often refer to this blog when I forget how data moves between map --> reduce:
The map outputs are copied to the reduce task JVM’s memory if they are small enough (the buffer’s size is controlled by mapred.job.shuffle.input.buffer.percent, which specifies the proportion of the heap to use for this purpose); otherwise, they are copied to disk. When the in-memory buffer reaches a threshold size (controlled by mapred.job.shuffle.merge.percent) or reaches a threshold number of map outputs (mapred.inmem.merge.threshold), it is merged and spilled to disk.
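The decision the quoted passage describes can be sketched roughly as follows. This is a simplified Python model, not Hadoop's implementation: the class and field names are hypothetical, sizes are plain integers, and the "small enough" test is assumed to be a per-output limit expressed as a fraction of the buffer (single_segment_limit), which the quote does not spell out. The default values are illustrative only; check your Hadoop version's configuration for the real ones.

```python
from dataclasses import dataclass, field


@dataclass
class ShuffleBuffer:
    """Toy model of a reducer's in-memory shuffle buffer (hypothetical, simplified)."""
    heap_bytes: int                      # reducer JVM heap size
    input_buffer_percent: float = 0.70   # cf. mapred.job.shuffle.input.buffer.percent
    merge_percent: float = 0.66          # cf. mapred.job.shuffle.merge.percent
    inmem_merge_threshold: int = 1000    # cf. mapred.inmem.merge.threshold
    single_segment_limit: float = 0.25   # assumption: max share of the buffer one map output may use
    in_memory: list = field(default_factory=list)  # sizes of map outputs held in memory
    on_disk: list = field(default_factory=list)    # sizes of files written to disk

    @property
    def capacity(self) -> float:
        # Portion of the heap reserved for holding copied map outputs.
        return self.heap_bytes * self.input_buffer_percent

    def receive(self, size: int) -> None:
        """Handle one copied map output of `size` bytes."""
        if size > self.capacity * self.single_segment_limit:
            # Too large for the buffer: copy it straight to disk.
            self.on_disk.append(size)
            return
        self.in_memory.append(size)
        used = sum(self.in_memory)
        # Spill when either the size threshold or the count threshold is reached.
        if (used >= self.capacity * self.merge_percent
                or len(self.in_memory) >= self.inmem_merge_threshold):
            self.spill()

    def spill(self) -> None:
        # Merge all in-memory outputs into a single file on disk
        # (the real merge also sorts by key; that is omitted here).
        self.on_disk.append(sum(self.in_memory))
        self.in_memory.clear()
```

For example, with heap_bytes=1000 the buffer holds 700 bytes, so a 300-byte output goes straight to disk, while outputs of 100 to 150 bytes accumulate in memory until their total reaches the merge threshold (0.66 of 700, about 462 bytes) and are then merged into one spill file.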