Support Questions

TheBhaskarDas · 09-11-2017

If there are m mappers and n reducers, what is the number of distinct copy functions?

Please explain the reasoning behind the answer.

Which one is the answer? The options are given below:

m^n
m*n
m+n
n
m

mqureshi · 09-11-2017

So you want to know how many copies occur once the mappers have completed and their data is being transferred to the reducers, right?

After the mappers complete, their output is sent to the reducers based on keys. Data for a given key lands on one particular reducer, and only that reducer, no matter which mapper it comes from. One reducer may handle more than one key, but a given key always goes to exactly one reducer.

So imagine mappers writing output on node 1, node 2, and node 3, and assume that data for a key "a" is present in the mapper output on all three nodes. Imagine a reducer running on each of the three nodes (three reducers in total), and suppose the data for key "a" is going to node 3. Then the data from node 1 and node 2 is copied to node 3 as reducer input. In fact, the data from node 3 is also copied into a folder where the reducer can pick it up (a local copy, unlike the network copies from node 1 and node 2). So three copies occur for that one reducer: one from each of the 3 mappers.
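To make the "one key, one reducer" rule concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `partition` helper, the CRC32 hash, and the sample records are hypothetical stand-ins, not Hadoop's actual code (Hadoop's default partitioner hashes the key modulo the number of reduce tasks in a similar spirit).

```python
import zlib

def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer index for a key.

    Hypothetical stand-in for a hash partitioner: the same key always
    maps to the same reducer, regardless of which mapper emitted it.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers

# Made-up map outputs on three nodes; key "a" appears on all of them.
mapper_outputs = {
    "node1": [("a", 1), ("b", 2)],
    "node2": [("a", 3), ("c", 4)],
    "node3": [("a", 5), ("b", 6)],
}
num_reducers = 3

def reducers_for_key(key, outputs, n):
    """Collect every reducer index that receives records for `key`."""
    return {
        partition(k, n)
        for records in outputs.values()
        for k, _ in records
        if k == key
    }

# Key "a" comes from three different mappers but has a single destination.
targets_for_a = reducers_for_key("a", mapper_outputs, num_reducers)
print(len(targets_for_a))  # 1
```

Which reducer index "a" gets depends on the hash; the point is only that all three mappers agree on it.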

If you apply the same logic to every reducer, each of the n reducers copies its share of the output from each of the m mappers, so you arrive at "m*n" copies. Please see the picture at the following link (MapReduce data flow); it should show visually what I have described above. Hope this helps.

https://developer.yahoo.com/hadoop/tutorial/module4.html#dataflow
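As a rough check on the counting, the sketch below enumerates every (mapper, reducer) pair. It assumes each mapper has some output destined for each reducer; `shuffle_copies` is a hypothetical helper written for this illustration, not a Hadoop API.

```python
from itertools import product

def shuffle_copies(num_mappers: int, num_reducers: int):
    """List every (mapper, reducer) copy in the shuffle.

    Assumption: every mapper holds a partition for every reducer,
    so each reducer fetches once from each mapper.
    """
    return list(product(range(num_mappers), range(num_reducers)))

# The example from the post: 3 mappers feeding a single reducer.
print(len(shuffle_copies(3, 1)))  # 3

# All three reducers running: 3 mappers * 3 reducers.
print(len(shuffle_copies(3, 3)))  # 9
```

Each pair is a distinct copy because it has a different source or a different destination, which is why the count is m*n rather than m+n.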

TheBhaskarDas · 09-11-2017

Thanks, I understood your explanation! But could you please tell me what exactly "distinct copy function" means in this context?

Cloudera Community
