In which memory are Map and Reduce tasks performed?

Expert Contributor

I need detailed information on which memory map and reduce tasks run in. Does the reduce phase bring all of the map tasks' output onto one node, then perform the reduce and produce the final output?

1 ACCEPTED SOLUTION

Master Mentor

@heta desai The following docs give more detail on this:

1. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_command-line-installation/content/determ...

2. https://hortonworks.com/apache/mapreduce/#section_1

3. https://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/

You can specify the minimum unit of RAM to allocate for a container (yarn.scheduler.minimum-allocation-mb). Tasks run inside containers launched by YARN. YARN uses mapreduce.{map|reduce}.memory.mb (that is, mapreduce.map.memory.mb and mapreduce.reduce.memory.mb) to set the memory size of the container that runs the map or reduce task. If the task's memory usage grows beyond this limit, YARN kills the container.
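To make the relationship between these settings concrete, here is a minimal Python sketch. It is not Hadoop code: the property names are real, but the helper functions and the example values are hypothetical, and it assumes the scheduler rounds each request up to a multiple of the minimum allocation, which depends on the scheduler and version in use.

```python
# Illustrative model only; this does not call any Hadoop or YARN API.
# Assumption: requests are rounded up to a multiple of the minimum
# allocation and capped at the maximum allocation.

def normalize_container_mb(requested_mb: int, min_alloc_mb: int, max_alloc_mb: int) -> int:
    """Round a memory request up to a multiple of min_alloc_mb, capped at max_alloc_mb."""
    if requested_mb <= 0 or min_alloc_mb <= 0:
        raise ValueError("memory values must be positive")
    multiples = -(-requested_mb // min_alloc_mb)  # ceiling division
    return min(multiples * min_alloc_mb, max_alloc_mb)


def exceeds_container(used_mb: int, container_mb: int) -> bool:
    """True when the task uses more memory than its container allows (YARN would kill it)."""
    return used_mb > container_mb


# Example values, chosen for illustration only.
settings = {
    "yarn.scheduler.minimum-allocation-mb": 1024,
    "yarn.scheduler.maximum-allocation-mb": 8192,
    "mapreduce.map.memory.mb": 1536,
    "mapreduce.reduce.memory.mb": 3072,
}

map_container_mb = normalize_container_mb(
    settings["mapreduce.map.memory.mb"],
    settings["yarn.scheduler.minimum-allocation-mb"],
    settings["yarn.scheduler.maximum-allocation-mb"],
)
reduce_container_mb = normalize_container_mb(
    settings["mapreduce.reduce.memory.mb"],
    settings["yarn.scheduler.minimum-allocation-mb"],
    settings["yarn.scheduler.maximum-allocation-mb"],
)

print(map_container_mb, reduce_container_mb)
```

With these example values the map request of 1536 MB is rounded up to two minimum units and the reduce request of 3072 MB fits exactly three, so a map task that grew to 2100 MB would be over its container limit.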



3 REPLIES

Expert Contributor

It takes memory from the node where the map's input split is stored, that is, the DataNode and the NodeManager on the same machine (because of data locality). The map output is stored in an in-memory buffer.

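When this buffer is almost full, the spill phase starts (in parallel) to move data out of it, and the spilled map output is stored on the local filesystem of that node. The final reducer output, by contrast, is normally written to the job's output directory on HDFS rather than to the local filesystem.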
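The buffering and spilling described above can be sketched roughly as follows. This is a simplified Python model, not Hadoop's implementation: the function names are hypothetical, the buffer is counted in records rather than bytes, and spilling is done sequentially here instead of in a background thread. In a real job the buffer size and threshold come from mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent.

```python
import heapq

# Simplified model of the map-side buffer; names and sizes are illustrative.

def map_side_spill(records, buffer_capacity, spill_threshold):
    """Buffer (key, value) records and emit a sorted run each time the threshold is reached."""
    limit = max(1, int(buffer_capacity * spill_threshold))
    buffer, spills = [], []
    for record in records:
        buffer.append(record)
        if len(buffer) >= limit:
            spills.append(sorted(buffer))  # stands in for a spill file on local disk
            buffer = []
    if buffer:
        spills.append(sorted(buffer))      # final flush when the map task finishes
    return spills


def merge_spills(spills):
    """Merge the sorted spill runs into one sorted map output."""
    return list(heapq.merge(*spills))


records = [("b", 1), ("a", 1), ("c", 1), ("a", 1), ("b", 1)]
spills = map_side_spill(records, buffer_capacity=4, spill_threshold=0.75)
map_output = merge_spills(spills)

print(spills)
print(map_output)
```

Each inner list plays the role of one spill file; merging them yields a single key-sorted map output, which is what reducers later fetch.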

Expert Contributor

If the data is distributed over 3 nodes, the final result set will be a merge of the data from these 3 nodes. So my confusion is: where is this merge operation performed?