Created 03-29-2017 05:05 AM
I need detailed information on which memory the map and reduce tasks run in. Does the reduce phase bring all of the map tasks' output onto one node, perform the reduce there, and then give the final output?
Created 03-29-2017 05:28 AM
@heta desai The following docs will give more detail on this:
2. https://hortonworks.com/apache/mapreduce/#section_1
3. https://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
You can specify the minimum unit of RAM to allocate for a container (yarn.scheduler.minimum-allocation-mb). The tasks run inside containers launched by YARN. mapreduce.map.memory.mb and mapreduce.reduce.memory.mb tell YARN how much memory to give the container that runs a map or reduce task. If the task grows beyond this limit, YARN will kill the container.
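To make the container sizing concrete, here is a rough Python sketch of how a memory request is usually rounded up to a multiple of the scheduler minimum and how the limit is checked. This is only an illustration, not YARN's actual code: the function names are made up for this post, the defaults of 1024 MB and 8192 MB are the stock values of yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb, and the exact rounding can differ by scheduler and version.

```python
import math


def normalize_container_mb(requested_mb, min_alloc_mb=1024, max_alloc_mb=8192):
    """Round a memory request up to a multiple of the scheduler minimum.

    Hypothetical helper: mimics the usual YARN behaviour in simplified form.
    """
    if requested_mb > max_alloc_mb:
        # A real cluster rejects requests above the scheduler maximum.
        raise ValueError("request exceeds the scheduler maximum")
    units = max(1, math.ceil(requested_mb / min_alloc_mb))
    return units * min_alloc_mb


def should_kill(used_mb, container_mb):
    """Return True when the task has outgrown its container (simplified check)."""
    return used_mb > container_mb


# mapreduce.map.memory.mb = 1536 with a 1024 MB minimum becomes a 2048 MB container.
container = normalize_container_mb(1536)
print(container, should_kill(2100, container))
```

So with a 1024 MB minimum, asking for 1536 MB in mapreduce.map.memory.mb would typically reserve a 2048 MB container on the node.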
Created 03-29-2017 07:30 AM
The map task takes its memory from a container on a NodeManager, preferably on the same node as the DataNode where its input split is stored (data locality). The map output is first written to an in-memory buffer.
When this buffer is almost full, the spill phase starts (in parallel) to move data out of it, and the spilled map output is stored on the local filesystem. The final reducer output, by contrast, is normally written to HDFS.
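The buffer-and-spill behaviour can be sketched in a few lines of Python. This is a simplified simulation, not Hadoop code: the function names are made up, the buffer is counted in records instead of megabytes, the spill happens inline rather than in a background thread, and partitioning by reducer is left out. In a real job the buffer size and threshold come from mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent, which were not mentioned above.

```python
import heapq


def run_map_side_buffer(records, buffer_capacity=10, spill_percent=0.8):
    """Buffer map output and spill a sorted run each time the threshold is hit."""
    threshold = max(1, int(buffer_capacity * spill_percent))
    buffer, spills = [], []
    for key, value in records:
        buffer.append((key, value))
        if len(buffer) >= threshold:
            spills.append(sorted(buffer))  # sorted run written to "local disk"
            buffer = []
    if buffer:
        spills.append(sorted(buffer))  # final flush when the map task ends
    return spills


def merge_spills(spills):
    """Merge the sorted spill runs into one sorted map output."""
    return list(heapq.merge(*spills))


records = [("b", 1), ("a", 1), ("d", 1), ("c", 1), ("a", 1)]
spills = run_map_side_buffer(records, buffer_capacity=5)
print(len(spills), merge_spills(spills))
```

With a capacity of 5 records and an 80% threshold, the first four records form one spill and the last record is flushed when the task finishes; the two runs are then merged into a single sorted output on the map node.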
Created 03-30-2017 07:47 AM
If the data is distributed over 3 nodes, the final result set will be a merge of the data from these 3 nodes. So my confusion is: where will this merge operation be performed?