Support Questions

geko · ‎06-25-2014

Hi,

I recently tried to change the heap size for M/R tasks by setting the property "MapReduce Child Java Maximum Heap Size (Client Override)" under "TaskTracker" => "Resource Management". I set the value to 222 MiB.

After every config change I redeploy the config and restart the services.

If I submit, e.g., the "pi" job from hadoop-mapreduce-examples.jar and look into the job's XML configuration, I can see that "mapred.child.java.opts" is set to "-Xmx145171557".

This is the part I don't understand, because:

a) I'd expect mapred.child.java.opts to be set to my configured value of 222 MiB

b) the size "145171557" exactly matches the configured value of "MapReduce Child Java Maximum Heap Size" for the "Default Gateway" role, but no node serves the Gateway role

Why is the heap size value of the Gateway role applied to the configuration of a submitted job, rather than the value set explicitly in the "TaskTracker" section?

Am I failing to see the forest for the trees?

Any help highly appreciated, regards...Gerd...

===

Info: I tested this behaviour on a 10-node CDH4.5 cluster as well as in a CDH4 quickstart VM, so I assume it is a basic misunderstanding on my part somewhere...
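To put the two numbers from the question side by side, here is a minimal Python sketch that converts the plain-byte "-Xmx" value seen in the job configuration to MiB and computes what 222 MiB would be in bytes. The helper name xmx_bytes_to_mib is made up for illustration, and it only handles "-Xmx" values given in plain bytes (no k/m/g suffix).

```python
MIB = 1024 * 1024  # bytes per MiB


def xmx_bytes_to_mib(opts: str) -> float:
    """Convert a plain-byte option such as '-Xmx145171557' to MiB.

    Hypothetical helper: it does not handle suffixed values like '-Xmx222m'.
    """
    prefix = "-Xmx"
    if not opts.startswith(prefix):
        raise ValueError(f"not an -Xmx option: {opts!r}")
    return int(opts[len(prefix):]) / MIB


# Value observed in the submitted job's configuration
observed_mib = xmx_bytes_to_mib("-Xmx145171557")
print(f"{observed_mib:.1f} MiB")  # prints "138.4 MiB"

# Value that was configured as the client override, expressed in bytes
print(222 * MIB)  # prints 232783872
```

So the job was submitted with roughly 138.4 MiB, not the 222 MiB (232783872 bytes) that was entered as the override.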

geko · ‎06-25-2014

Hi,

after some further tests it seems like I got it 😉

a) the configuration in the "Gateway..." section is written to the mapred-site.xml of the client configuration and is then deployed via "Deploy client configuration" to the corresponding directory under /etc/hadoop/conf (via update-alternatives)

b) if I submit the example M/R job, it uses that config from /etc/hadoop/conf

c) the settings in the "TaskTracker..." section are written to the mapred-site.xml in the run directory when the corresponding service is restarted, and therefore modifications in this section are never considered for submitted jobs

Is this correct?
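One way to check which value a client would pick up is to read the property straight out of the deployed mapred-site.xml. The following is a minimal Python sketch using only the standard library. The function name read_property and the inline sample are illustrative assumptions; on a real node you would read the text from /etc/hadoop/conf/mapred-site.xml instead.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for the contents of /etc/hadoop/conf/mapred-site.xml,
# using the value reported in this thread.
SAMPLE_MAPRED_SITE = """<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx145171557</value>
  </property>
</configuration>
"""


def read_property(xml_text, name):
    """Return the value of the named Hadoop property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


print(read_property(SAMPLE_MAPRED_SITE, "mapred.child.java.opts"))
```

Running the same lookup against the mapred-site.xml in the TaskTracker's run directory would show whether the two files really carry different values.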

geko · ‎06-25-2014

So, in the end, why can I set "MapReduce Child Java Maximum Heap Size (Client Override)" in the "TaskTracker" section of CM => service mapreduce1 config at all, if any submitted job uses the mapred.child.java.opts from /etc/hadoop/conf/mapred-site.xml? That file is generated from the settings of the "Gateway (Default)" section in the configuration pane of the MapReduce service.

When is the child heap size setting from the TaskTracker section applied, and to whom? Currently I really have no clue, since jobs submitted from the shell and jobs submitted via Hive both receive the settings from the standard config directory /etc/hadoop/conf.

best, Gerd

Darren · ‎06-25-2014

Hi Gerd,

The client override is a setting that allows you to force MR to use a value, regardless of what any client may specify (it overrides the client values, hence Client Override). This is useful if you don't trust your end-users to specify these values appropriately and want to enforce a cluster-wide value.

Thanks,
Darren

geko · ‎06-25-2014

Hi Darren,

thanks for answering. I assumed as much, since "Client Override" is in the name of the property 😉

Did I get it right that there are 3 possibilities =>

1) if I submit a job without setting the property explicitly, the "Gateway" settings are used (from /etc/hadoop/conf/xyz)

2) if I specify the property explicitly, that value is used (provided the client override setting is empty)

3) if I specify the property and the Tasktracker...client override property is also set, the override setting is used

?

thanks in advance, Gerd

Darren · ‎06-25-2014

Yes, that's correct.
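The three cases confirmed above can be sketched as a small precedence function. This is only an illustration of the ordering discussed in this thread, not how Cloudera Manager or the TaskTracker implements it; the function and parameter names are made up for the example.

```python
from typing import Optional


def effective_child_java_opts(
    gateway_default: str,
    job_value: Optional[str] = None,
    tasktracker_override: Optional[str] = None,
) -> str:
    """Pick the child JVM options according to the three cases above.

    gateway_default:      value from the client config in /etc/hadoop/conf
    job_value:            value the user passed explicitly at submit time
    tasktracker_override: the "Client Override" value, if one is set
    """
    if tasktracker_override:   # case 3: the override wins over any client value
        return tasktracker_override
    if job_value:              # case 2: explicit client value, no override set
        return job_value
    return gateway_default     # case 1: fall back to the Gateway setting


# Case 1: nothing set explicitly, no override
print(effective_child_java_opts("-Xmx145171557"))
# Case 2: explicit value at submit time, no override
print(effective_child_java_opts("-Xmx145171557", job_value="-Xmx512m"))
# Case 3: explicit value and an override
print(effective_child_java_opts("-Xmx145171557", "-Xmx512m", "-Xmx232783872"))
```

The example values "-Xmx512m" and "-Xmx232783872" (222 MiB in bytes) are placeholders chosen for the demonstration.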

Cloudera Community

weird behaviour when setting -MapReduce Child Java Maximum Heap Size-