Weird behaviour when setting "MapReduce Child Java Maximum Heap Size"
Labels: Apache Hadoop, Gateway, MapReduce, Quickstart VM
Created on 06-25-2014 12:56 AM - edited 09-16-2022 02:01 AM
Hi,
I recently tried to modify the heap size for M/R tasks by setting the property "MapReduce Child Java Maximum Heap Size (Client Override)" in the menu "Tasktracker" => "Resource Management". I set the value to 222 MiB.
After any config change I redeploy the config and restart the services.
If I submit, e.g., the "pi" job from hadoop-mapreduce-examples.jar and dive into the job's XML configuration, I can see that "mapred.child.java.opts" is set to "-Xmx145171557".
This is the part I don't understand, since
- I'd expect mapred.child.java.opts to be set to my configured value of 222 MiB
- the size "145171557" matches exactly the configured value for "MapReduce Child Java Maximum Heap Size" of the "Default Gateway" Role, but there is no node serving the Gateway role
Why is the heap size value of the Gateway role applied to the configuration of a submitted job, rather than the value set explicitly in the "Tasktracker" section?
Am I failing to see the forest for the trees?
Any help highly appreciated, regards...Gerd...
===
Info: I tested this behaviour on a 10-node CDH4.5 cluster as well as in a CDH4 quickstart VM, so I assume there is a basic misunderstanding somewhere...
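To make the mismatch concrete, here is a minimal sketch of the arithmetic. It assumes only that a bare number after -Xmx is read by the JVM as bytes; the helper names are made up for illustration.

```python
# Compare the configured heap size (222 MiB) with the value seen in the job XML.
# A bare number after -Xmx is interpreted by the JVM as a byte count.

MIB = 1024 * 1024  # bytes per mebibyte


def mib_to_bytes(mib):
    """Convert mebibytes to bytes."""
    return mib * MIB


def bytes_to_mib(num_bytes):
    """Convert bytes to mebibytes (float)."""
    return num_bytes / MIB


observed_bytes = 145171557           # from "-Xmx145171557" in the job configuration
configured_bytes = mib_to_bytes(222)  # what the TaskTracker setting would correspond to

print("configured:", configured_bytes, "bytes")
print("observed:  ", round(bytes_to_mib(observed_bytes), 2), "MiB")
```

So the job is running with roughly 138 MiB, not the 222 MiB set in the TaskTracker section.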
Created 06-25-2014 10:27 AM
Created 06-25-2014 02:28 AM
Hi,
after some further tests it seems like I got it 😉
a) the configuration in the section "Gateway..." is written to the mapred-site.xml of the client configuration and thereby deployed via "Deploy client configuration" to the corresponding directory under /etc/hadoop/conf (via update-alternatives)
b) if I submit the example M/R job, it uses that config from /etc/hadoop/conf
c) the settings in the section "TaskTracker..." are written to the mapred-site.xml in the run directory when the corresponding service is restarted, and therefore the modifications in this section are never considered for submitted jobs
Is this correct?
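One way to check which value a client would pick up is to read the property straight out of the deployed mapred-site.xml. The following is only a sketch: the embedded XML is an illustrative sample in the standard Hadoop `<configuration>`/`<property>` layout, and `get_property` is a hypothetical helper, not part of any Hadoop or Cloudera Manager tooling. On a real node you would read /etc/hadoop/conf/mapred-site.xml instead of the sample string.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; a real client config would live in
# /etc/hadoop/conf/mapred-site.xml after "Deploy client configuration".
SAMPLE_MAPRED_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx145171557</value>
  </property>
</configuration>
"""


def get_property(xml_text, name):
    """Return the value of a Hadoop config property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


print(get_property(SAMPLE_MAPRED_SITE, "mapred.child.java.opts"))
```

Running the same lookup against the mapred-site.xml in the TaskTracker's run directory would show whether the two files really carry different values, as described in a) and c).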
Created 06-25-2014 04:31 AM
So, in the end, why can I set "MapReduce Child Java Maximum Heap Size (Client Override)" in the "TaskTracker" section of CM => service mapreduce1 config at all, if any submitted job uses the mapred.child.java.opts from the file /etc/hadoop/conf/mapred-site.xml? And this file is generated from the settings in the section "Gateway (Default)" in the configuration pane of the mapreduce service.
When will the child heap size setting from the TaskTracker section be applied, and to whom? Currently I really have no clue, since jobs submitted from the shell or via Hive receive the settings from the standard config directory /etc/hadoop/conf
best, Gerd
Created 06-25-2014 09:50 AM
The client override is a setting that allows you to force MR to use a value, regardless of what any client may specify (it overrides the client values, hence Client Override). This is useful if you don't trust your end-users to specify these values appropriately and want to enforce a cluster-wide value.
Thanks,
Darren
Created 06-25-2014 10:23 AM
Hi Darren,
thanks for answering. I assumed as much, since "Client Override" is in the name of the property 😉
Did I get it right that there are 3 possibilities =>
1) if I submit a job without setting a property explicitly, the "Gateway" settings are used (from /etc/hadoop/conf/xyz)
2) if I specify a property explicitly, this will be used (provided the client override setting is empty)
3) if I specify a property and the Tasktracker...client override property is also set, the override setting will be used
?
thanks in advance, Gerd
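The three cases above can be written down as a small precedence function. This is only a sketch of the understanding stated in this post combined with Darren's description of the client override; it is not confirmed in the thread and is not how Hadoop or Cloudera Manager implements it internally. The function name and parameters are hypothetical.

```python
def effective_child_opts(gateway_value, job_value=None, client_override=None):
    """Pick the mapred.child.java.opts value a task would run with,
    under the precedence assumed in the post above (unconfirmed)."""
    if client_override:
        # Case 3: the TaskTracker "Client Override" wins over anything the client sets.
        return client_override
    if job_value:
        # Case 2: a value given explicitly at job submission is used.
        return job_value
    # Case 1: fall back to the Gateway value from /etc/hadoop/conf.
    return gateway_value


# Case 1: nothing set explicitly, no override
print(effective_child_opts("-Xmx145171557"))
# Case 2: explicit job setting, no override
print(effective_child_opts("-Xmx145171557", job_value="-Xmx300m"))
# Case 3: explicit job setting and an override
print(effective_child_opts("-Xmx145171557", job_value="-Xmx300m", client_override="-Xmx222m"))
```

Note that under this reading the pi job from the first post should have picked up the override, which is not what was observed there, so the open question is still which of these cases actually applies.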
Created 06-25-2014 10:27 AM
