I recently tried to modify the heap size for M/R tasks by setting the property "MapReduce Child Java Maximum Heap Size (Client Override)" in the menu "TaskTracker" => "Resource Management". I set the value to 222 MiB.
After every configuration change I redeploy the configuration and restart the services.
If I submit, e.g., the "pi" job from hadoop-mapreduce-examples.jar and dive into the job's XML configuration, I can see that "mapred.child.java.opts" is set to "-Xmx145171557".
This is the part I don't understand:
Why is the heap size value of the Gateway role applied to the configuration of a submitted job, rather than the value set explicitly in the "TaskTracker" section?!
Am I failing to see the forest for the trees?!
Any help is highly appreciated. Regards, Gerd
Info: I tested this behaviour on a 10-node CDH4.5 cluster as well as in a CDH4 quickstart VM, so I assume it is a basic misunderstanding somewhere...
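For reference, the JVM reads an -Xmx value without a unit suffix as bytes, so the observed value can be compared directly with the 222 MiB that was configured. The small Python sketch below converts both to MiB; the helper name xmx_bytes is hypothetical, and it only handles the k/m/g suffixes:

```python
import re

MIB = 1024 * 1024


def xmx_bytes(java_opts: str) -> int:
    """Return the -Xmx heap size in bytes from a JVM options string.

    A number without a suffix is taken as bytes; k, m and g (any case)
    multiply by 1024, 1024**2 and 1024**3 respectively.
    """
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if match is None:
        raise ValueError(f"no -Xmx option in {java_opts!r}")
    number, suffix = int(match.group(1)), match.group(2).lower()
    scale = {"": 1, "k": 1024, "m": MIB, "g": 1024 * MIB}[suffix]
    return number * scale


observed = xmx_bytes("-Xmx145171557")  # value found in the job's XML configuration
configured = xmx_bytes("-Xmx222m")     # the 222 MiB entered in the TaskTracker section
print(f"observed: {observed / MIB:.1f} MiB, configured: {configured / MIB:.1f} MiB")
```

It reports about 138.4 MiB for the observed value against 222.0 MiB for the configured one, so the submitted job's configuration clearly does not carry the TaskTracker value.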
After some further tests, it seems like I've got it ;)
a) the configuration in the "Gateway..." section is written to the mapred-site.xml of the client configuration and is therefore deployed via "Deploy client configuration" to the corresponding directory under /etc/hadoop/conf (via update-alternatives)
b) if I submit the example M/R job, it uses that configuration from /etc/hadoop/conf
c) the settings in the "TaskTracker..." section are written to the mapred-site.xml in the service's run directory when the corresponding service is restarted, and therefore modifications in this section will never be considered for submitted jobs
Is this correct?!
So, in the end, why can I set "MapReduce Child Java Maximum Heap Size (Client Override)" in the "TaskTracker" section of CM => service mapreduce1 configuration at all, if every submitted job uses mapred.child.java.opts from /etc/hadoop/conf/mapred-site.xml?! And that file is generated from the settings in the "Gateway (Default)" section of the MapReduce service's configuration pane.
When will the child heap size setting from the TaskTracker section be applied, and to what? At the moment I really have no clue, since jobs submitted from the shell as well as jobs submitted via Hive receive their settings from the standard configuration directory /etc/hadoop/conf.
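Points a) and b) can be illustrated by reading the deployed client file the way a submitting client sees it. The sketch below parses a hypothetical excerpt of /etc/hadoop/conf/mapred-site.xml with Python's standard xml.etree.ElementTree; the XML content and the read_properties helper are made up for illustration and are not actual Cloudera Manager output:

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a client-side /etc/hadoop/conf/mapred-site.xml,
# as it might look after "Deploy client configuration" (illustrative values).
CLIENT_MAPRED_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx145171557</value>
  </property>
</configuration>
"""


def read_properties(xml_text: str) -> dict:
    """Return {name: value} for every <property> in a Hadoop *-site.xml string."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        props[prop.findtext("name")] = prop.findtext("value")
    return props


client_conf = read_properties(CLIENT_MAPRED_SITE)
print(client_conf["mapred.child.java.opts"])
```

A client that reads only this directory has no way to see values that were written to the TaskTracker's run directory.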
Thanks for answering. I assumed that, since "Client Override" is part of the property name ;)
Did I get it right that there are 3 possibilities:
1) if I submit a job without setting a property explicitly, the "Gateway" settings are used (from /etc/hadoop/conf/xyz)
2) if I specify a property explicitly and the client override setting is empty, my value is used
3) if I specify a property and the TaskTracker... client override property is also set, the override setting is used
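These three cases can be modelled with Hadoop's "final" property semantics: once a loaded configuration resource declares a property final, resources loaded later cannot change it. The Python sketch below is a simplified model, not Hadoop code. It assumes (unverified here) that the TaskTracker "Client Override" is written as a final property into the TaskTracker's own mapred-site.xml, which is loaded before the submitted job.xml; resolve and task_heap are hypothetical helpers:

```python
from typing import Dict, List, Optional, Set, Tuple

# One configuration resource: property name -> (value, is_final).
Resource = Dict[str, Tuple[str, bool]]

KEY = "mapred.child.java.opts"


def resolve(resources: List[Resource]) -> Dict[str, str]:
    """Merge resources in load order, honouring Hadoop-style final flags.

    A later resource overrides an earlier one, unless an earlier resource
    already declared that property final.
    """
    values: Dict[str, str] = {}
    locked: Set[str] = set()
    for resource in resources:
        for name, (value, final) in resource.items():
            if name in locked:
                continue  # declared final earlier: ignore later values
            values[name] = value
            if final:
                locked.add(name)
    return values


def task_heap(tasktracker_override: Optional[str], job_value: str) -> str:
    """Heap option a task would get under this simplified model.

    ASSUMPTION: the TaskTracker "Client Override" is written as a final
    property into the TaskTracker's mapred-site.xml, loaded before job.xml.
    """
    tasktracker_site: Resource = {}
    if tasktracker_override is not None:
        tasktracker_site[KEY] = (tasktracker_override, True)
    job_xml: Resource = {KEY: (job_value, False)}
    return resolve([tasktracker_site, job_xml])[KEY]


# Cases 1 and 2: no override, so the job's value (Gateway default or explicit) wins.
print(task_heap(None, "-Xmx145171557"))
# Case 3: the final override wins over the job's explicit value.
print(task_heap("-Xmx222m", "-Xmx512m"))
```

Under that assumption, cases 1 and 2 look identical from the TaskTracker's side: it only sees whatever value the job's configuration carries, while case 3 is decided by the final flag.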
Thanks in advance, Gerd