Weird behaviour when setting "MapReduce Child Java Maximum Heap Size"

avatar
Guru

Hi,

I recently tried to modify the heap size for M/R tasks by setting the property "MapReduce Child Java Maximum Heap Size (Client Override)" under "Tasktracker" => "Resource Management". I set the value to 222 MiB.

After any config. change I re-deploy the config and restart the services.

If I submit e.g. the "pi" job from hadoop-mapreduce-examples.jar and look at the job's XML configuration, I can see that "mapred.child.java.opts" is set to "-Xmx145171557".

This is the part I don't understand, since

  • I'd expect mapred.child.java.opts to be set to my configured value of 222 MiB
  • the size "145171557" exactly matches the configured value for "MapReduce Child Java Maximum Heap Size" of the "Default Gateway" role, but there is no node serving the Gateway role
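For reference, here is a quick sanity check of the two numbers as a minimal Python sketch. The helper name `xmx_to_mib` is made up for illustration, and it assumes the -Xmx value is a plain byte count without a unit suffix, as in the job configuration above.

```python
MIB = 1024 * 1024  # bytes per MiB


def xmx_to_mib(java_opts: str) -> float:
    """Return the -Xmx value found in a JVM options string, in MiB.

    Assumes the value is a plain byte count (no k/m/g suffix).
    """
    for token in java_opts.split():
        if token.startswith("-Xmx"):
            return int(token[len("-Xmx"):]) / MIB
    raise ValueError("no -Xmx option found")


# Value seen in the submitted job's configuration
print(round(xmx_to_mib("-Xmx145171557"), 1))  # 138.4

# Byte count a 222 MiB setting would correspond to
print(222 * MIB)  # 232783872
```

So the value in the job is roughly 138.4 MiB, clearly not the 222 MiB I configured.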

 

Why is the heap size value of the Gateway role applied to the configuration of a submitted job, rather than the value set explicitly in the "Tasktracker" section?

Am I failing to see the forest for the trees?

Any help is highly appreciated. Regards, Gerd

===

Info: I tested this behaviour on a 10-node CDH4.5 cluster as well as in a CDH4 quickstart VM, so I assume it is a basic misunderstanding on my part somewhere...

5 REPLIES

avatar
Guru

Hi,

After some further tests it seems like I got it 😉

a) The configuration in the "Gateway..." section is written to the mapred-site.xml of the client configuration and thereby deployed via "Deploy client configuration" to the corresponding directory under /etc/hadoop/conf (via update-alternatives).

b) If I submit the example M/R job, it uses that config from /etc/hadoop/conf.

c) The settings in the "TaskTracker..." section are written to the mapred-site.xml in the run directory when the corresponding service is restarted, and therefore modifications in this section are never considered for submitted jobs.

Is this correct?
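One way to check which value a client would pick up in point b) is to read the property straight out of the deployed mapred-site.xml. The following is only a rough Python sketch: the function name `read_property` is made up, and the sample XML is an inline stand-in that follows the standard Hadoop `<configuration>/<property>` layout, not the real file from my cluster.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for a client-side mapred-site.xml (illustrative content only)
SAMPLE = """<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx145171557</value>
  </property>
</configuration>
"""


def read_property(xml_text, name):
    """Return the value of a Hadoop config property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


print(read_property(SAMPLE, "mapred.child.java.opts"))
```

Against a real cluster, the same function could be fed the contents of /etc/hadoop/conf/mapred-site.xml instead of the inline sample.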

avatar
Guru

So, in the end, why can I set "MapReduce Child Java Maximum Heap Size (Client Override)" in the "TaskTracker" section of the CM => mapreduce1 service config at all, if any job that is submitted uses the mapred.child.java.opts from the file /etc/hadoop/conf/mapred-site.xml? That file is generated from the settings in the "Gateway (Default)" section of the mapreduce service's configuration pane.

When is the child heap size setting from the TaskTracker section applied, and to what? Currently I really have no clue, since jobs submitted from the shell or via Hive receive their settings from the standard config directory /etc/hadoop/conf.

Best, Gerd

avatar
Hi Gerd,

The client override is a setting that allows you to force MR to use a value, regardless of what any client may specify (it overrides the client values, hence Client Override). This is useful if you don't trust your end-users to specify these values appropriately and want to enforce a cluster-wide value.

Thanks,
Darren

avatar
Guru

Hi Darren,

Thanks for answering. I assumed as much, since "Client Override" is in the name of the property 😉

Did I get it right that there are 3 possibilities?

1) If I submit a job without setting the property explicitly, the "Gateway" settings are used (from /etc/hadoop/conf/xyz).

2) If I specify the property explicitly, that value is used (the client override setting is empty).

3) If I specify the property and the Tasktracker client override property is also set, the override setting is used.

Thanks in advance, Gerd

avatar
Yes, that's correct.
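
The precedence confirmed in this thread can be written down as a small Python sketch. This only models the three cases discussed here; it is not how Hadoop or Cloudera Manager implements the setting, and the function and parameter names are hypothetical. The case where the override is set but the job specifies nothing is not spelled out in the thread; the sketch assumes the override wins there too, based on the description of the client override as forcing a value regardless of what the client specifies.

```python
def effective_child_opts(gateway_default, job_value=None, client_override=None):
    """Pick the child JVM options according to the precedence from this thread.

    gateway_default: value from the deployed client config (/etc/hadoop/conf)
    job_value:       value the user set explicitly on the job, if any
    client_override: TaskTracker "Client Override" value, if set
    """
    if client_override:   # case 3: the override beats whatever the client says
        return client_override
    if job_value:         # case 2: explicit job setting, no override configured
        return job_value
    return gateway_default  # case 1: nothing set explicitly, Gateway value applies


# Case 1: no explicit setting, no override
print(effective_child_opts("-Xmx145171557"))
# Case 2: explicit job setting, override empty
print(effective_child_opts("-Xmx145171557", job_value="-Xmx512m"))
# Case 3: explicit job setting and override both present
print(effective_child_opts("-Xmx145171557", job_value="-Xmx512m",
                           client_override="-Xmx232783872"))
```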