Support Questions

heta_desai · ‎03-29-2017

I need detailed information on which memory the map and reduce tasks run in. Does the reduce phase bring all of the map tasks' output onto one node, then perform the reduce and produce the final output?

jsensharma · ‎03-29-2017

@heta desai The following docs give more detail on this:

1. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_command-line-installation/content/determ...

2. https://hortonworks.com/apache/mapreduce/#section_1

3. https://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/

You can specify the minimum unit of RAM to allocate for a Container. The tasks are run within containers launched by YARN. mapreduce.{map|reduce}.memory.mb is used by YARN to set the memory size of the container being used to run the map or reduce task. If the task grows beyond this limit, YARN will kill the container.
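As a rough illustration of the point above, here is a minimal Python sketch (not YARN code) of how a memory request might map to a container size and when the container would be killed. It assumes the scheduler rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb and rejects requests above yarn.scheduler.maximum-allocation-mb; the function names and the default values of 1024 and 8192 are illustrative assumptions, so check your own yarn-site.xml.

```python
import math


def container_size_mb(requested_mb, min_alloc_mb=1024, max_alloc_mb=8192):
    """Model of container sizing (assumption: round up to a multiple of the minimum)."""
    if requested_mb <= 0:
        raise ValueError("requested_mb must be positive")
    if requested_mb > max_alloc_mb:
        # Assumption: requests above the maximum are rejected, not capped.
        raise ValueError("request exceeds the maximum allocation")
    # Round up to the next multiple of the minimum allocation unit.
    return math.ceil(requested_mb / min_alloc_mb) * min_alloc_mb


def would_be_killed(used_mb, container_mb):
    """A task that grows beyond its container's memory limit gets its container killed."""
    return used_mb > container_mb


# Hypothetical job setting: mapreduce.map.memory.mb = 1536
size = container_size_mb(1536)
print(size, would_be_killed(2100, size))
```

In this model a map task that asks for 1536 MB gets a 2048 MB container, and it is killed only once its usage passes that container size.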

shivkumar82015 · ‎03-29-2017

The map task takes its memory from the node (DataNode and NodeManager on the same machine) where the input split is stored, because of data locality, and the map output is stored in an in-memory buffer.

When this buffer is almost full, the spilling phase starts (in parallel with the map) to move data out of the buffer, and the spilled map output is stored on the local filesystem.
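The buffer-and-spill behaviour described above can be pictured with a small Python model. This is only a sketch: it assumes the buffer size comes from mapreduce.task.io.sort.mb (100 MB here) and the spill threshold from mapreduce.map.sort.spill.percent (0.80 here), and it spills synchronously, whereas the real spill runs in a background thread while the map keeps writing. The function name and the input sizes are made up for illustration.

```python
def simulate_spills(output_sizes_mb, sort_mb=100, spill_percent=0.80):
    """Return (sizes of each spill written to local disk, MB still in the buffer)."""
    threshold = sort_mb * spill_percent  # spill once the buffer is "almost full"
    buffered = 0
    spills = []
    for size in output_sizes_mb:
        buffered += size  # map output lands in the in-memory buffer first
        if buffered >= threshold:
            spills.append(buffered)  # buffer contents go to a local spill file
            buffered = 0
    return spills, buffered


# Five hypothetical chunks of map output, 30 MB each
spills, remaining = simulate_spills([30, 30, 30, 30, 30])
print(spills, remaining)
```

With these numbers the third chunk pushes the buffer past the 80 MB threshold, so one spill file is written and the last two chunks stay in memory until the map task finishes.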

heta_desai · ‎03-30-2017

If the data is distributed over 3 nodes, the final result set will be a merge of the data from those 3 nodes. So my confusion is: where is this merge operation performed?
