06-25-2014 12:56 AM
I recently tried to change the heap size for M/R tasks by setting the property "MapReduce Child Java Maximum Heap Size (Client Override)" under "Tasktracker" => "Resource Management". I set the value to 222 MiB.
After every config change I re-deploy the configuration and restart the services.
If I then submit, e.g., the "pi" job from hadoop-mapreduce-examples.jar and look into the job's XML configuration, I can see that "mapred.child.java.opts" is set to "-Xmx145171557".
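To make the mismatch concrete, here is a small sketch that converts the -Xmx value from the job configuration into MiB and compares it with the 222 MiB that was set. The helper name xmx_bytes is made up for illustration, and it only handles a plain byte count without a k/m/g suffix.

```python
MIB = 1024 * 1024  # bytes per MiB

def xmx_bytes(java_opts: str) -> int:
    """Return the byte count of a plain numeric -Xmx flag (suffixes like 'm' are not handled)."""
    for token in java_opts.split():
        if token.startswith("-Xmx"):
            return int(token[len("-Xmx"):])
    raise ValueError("no -Xmx flag found")

seen = xmx_bytes("-Xmx145171557")  # value found in the job's XML configuration
wanted = 222 * MIB                 # value entered in the TaskTracker section

print(f"job config: {seen} bytes = {seen / MIB:.1f} MiB")
print(f"configured: {wanted} bytes = {wanted / MIB:.1f} MiB")
```

So the job is running with roughly 138 MiB, not the 222 MiB entered in the TaskTracker section.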
This is the part I don't understand: why is the heap size value of the Gateway role applied to the configuration of a submitted job, rather than the value set explicitly under "Tasktracker"?
Am I not seeing the forest for the trees?
Any help highly appreciated, regards...Gerd...
Info: I tested this behaviour on a 10-node CDH4.5 cluster as well as in a CDH4 quickstart VM, so I assume it is a basic misunderstanding on my part somewhere...
06-25-2014 02:28 AM
After some further tests it seems like I got it ;)
a) The configuration in the section "Gateway..." is written to the mapred-site.xml of the client configuration and thereby deployed via "Deploy client configuration" to the corresponding directory under /etc/hadoop/conf (via update-alternatives).
b) If I submit the example M/R job, it uses that configuration from /etc/hadoop/conf.
c) The settings in the section "TaskTracker..." are written to the mapred-site.xml in the run directory when the corresponding service is restarted, and therefore the modifications in this section are never considered for jobs being submitted.
Is this correct?
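One way to check points a) to c) is to read "mapred.child.java.opts" out of each mapred-site.xml and compare the values. A minimal sketch, assuming the usual Hadoop <configuration>/<property> layout; read_property is a made-up helper and SAMPLE is only a stand-in for the real file content:

```python
import xml.etree.ElementTree as ET

def read_property(xml_text: str, name: str):
    """Return (value, is_final) for a Hadoop-style <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            value = prop.findtext("value")
            # A property may carry <final>true</final>; treat a missing element as not final.
            is_final = (prop.findtext("final") or "").strip().lower() == "true"
            return value, is_final
    return None

# Illustrative stand-in for the content of /etc/hadoop/conf/mapred-site.xml.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx145171557</value>
  </property>
</configuration>"""

print(read_property(SAMPLE, "mapred.child.java.opts"))
```

Against a real cluster the same function could be fed the text of /etc/hadoop/conf/mapred-site.xml and of the mapred-site.xml in the TaskTracker's run directory, to see which file carries which value and whether either one is marked final.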
06-25-2014 04:31 AM
So, in the end, why can I set "MapReduce Child Java Maximum Heap Size (Client Override)" at all in the "TaskTracker" section of CM => service mapreduce1 config, if any job that is submitted uses the mapred.child.java.opts from the file /etc/hadoop/conf/mapred-site.xml? And this file is generated from the settings of the section "Gateway (Default)" in the configuration pane of the mapreduce service.
When is the child heap size setting from the TaskTracker section applied, and to what? Currently I really have no clue, since jobs submitted from the shell or via Hive receive their settings from the standard config directory /etc/hadoop/conf.
06-25-2014 09:50 AM
06-25-2014 10:23 AM
Thanks for answering. I assumed as much, since "Client Override" is in the name of the property ;)
Did I get it right that there are 3 possibilities =>
1) If I submit a job without setting a property explicitly, the "Gateway" settings are used (from /etc/hadoop/conf/xyz).
2) If I specify a property explicitly, this will be used (the client override setting is empty).
3) If I specify a property and the Tasktracker...client override property is also set, the override setting will be used.
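Written out as a small sketch, the three cases would look like this. It only models the reading asked about above, not confirmed cluster behaviour. The function name and the "-Xmx512m" job value are made up for illustration, and letting the override win even when the job sets nothing is my assumption, since the list does not cover that case.

```python
def effective_child_opts(gateway_default, job_value=None, tasktracker_override=None):
    """Pick the child JVM opts according to the three cases listed above (a hypothesis)."""
    if tasktracker_override is not None:
        # Case 3: the TaskTracker "Client Override" is set, so it wins.
        # Assumption: it also wins when the job itself sets nothing.
        return tasktracker_override
    if job_value is not None:
        # Case 2: the job sets the property explicitly and no override exists.
        return job_value
    # Case 1: nothing set on the job, so the Gateway client configuration applies.
    return gateway_default

GATEWAY = "-Xmx145171557"   # value seen in the submitted job's configuration
OVERRIDE = "-Xmx232783872"  # 222 MiB expressed in bytes
JOB = "-Xmx512m"            # hypothetical value passed at submission time

print(effective_child_opts(GATEWAY))                 # case 1
print(effective_child_opts(GATEWAY, JOB))            # case 2
print(effective_child_opts(GATEWAY, JOB, OVERRIDE))  # case 3
```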
thanks in advance, Gerd