Created 12-15-2015 02:08 PM
After building out an HDP 2.2 cluster (single node) using a blueprint, I'm getting the following error from the ResourceManager.
$ less /var/log/hadoop-yarn/yarn/yarn-yarn-resourcemanager-gsc01-ost-tesla-h-hb01.td.local.log
STARTUP_MSG: Starting ResourceManager
STARTUP_MSG: host = gsc01-ost-tesla-h-hb01.td.local/192.168.106.26
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.6.0.2.2.9.0-3393
...
2015-12-15 01:01:47,671 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service RMActiveServices failed in state INITED; cause: java.lang.IllegalArgumentException: Illegal capacity of -1.0 for node-label=default in queue=root, valid capacity should in range of [0, 100].
java.lang.IllegalArgumentException: Illegal capacity of -1.0 for node-label=default in queue=root, valid capacity should in range of [0, 100].
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.internalGetLabeledQueueCapacity(CapacitySchedulerConfiguration.java:465)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getLabeledQueueCapacity(CapacitySchedulerConfiguration.java:477)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadCapacitiesByLabelsFromConf(CSQueueUtils.java:143)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadUpdateAndCheckCapacities(CSQueueUtils.java:122)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupConfigurableCapacities(AbstractCSQueue.java:99)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:242)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setupQueueConfigs(ParentQueue.java:109)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.<init>(ParentQueue.java:100)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:589)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:465)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:297)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:326)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:576)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1016)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:269)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1237)
2015-12-15 01:01:47,672 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(210)) - Stopping ResourceManager metrics system...
My blueprint file is intentionally sparse: it only lists components and does not set any configurations unless needed.
{
  "host_groups" : [
    {
      "name" : "host_group_1",
      "configurations" : [ ],
      "components" : [
        { "name" : "ZOOKEEPER_SERVER" },
        { "name" : "ZOOKEEPER_CLIENT" },
        ...
  ],
  "Blueprints" : {
    "stack_name" : "HDP",
    "stack_version" : "2.2"
  }
I suspect these messages, slightly earlier in the log, might be related:
2015-12-15 01:01:47,598 INFO conf.Configuration (Configuration.java:getConfResourceAsInputStream(2236)) - found resource capacity-scheduler.xml at file:/etc/hadoop/conf.empty/capacity-scheduler.xml
2015-12-15 01:01:47,663 WARN capacity.CapacitySchedulerConfiguration (CapacitySchedulerConfiguration.java:getAccessibleNodeLabels(433)) - Accessible node labels for root queue will be ignored, it will be automatically set to "*".
2015-12-15 01:01:47,668 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler failed in state INITED; cause: java.lang.IllegalArgumentException: Illegal capacity of -1.0 for node-label=default in queue=root, valid capacity should in range of [0, 100].
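To confirm which copy of a config file the ResourceManager actually loaded, the "found resource ... at ..." messages can be pulled out of the log. This is only a rough sketch: `find_loaded_resources` is a made-up helper name, and the regular expression is based solely on the single log line quoted above, so other Hadoop versions may word the message differently.

```python
import re

# Pattern derived from the one "found resource <name> at <location>" line
# quoted in this thread (an assumption, not an official log format).
RESOURCE_RE = re.compile(r"found resource (\S+) at (\S+)")


def find_loaded_resources(log_text):
    """Return (resource_name, location) pairs found in the log text."""
    return RESOURCE_RE.findall(log_text)


sample = (
    "2015-12-15 01:01:47,598 INFO conf.Configuration "
    "(Configuration.java:getConfResourceAsInputStream(2236)) - found resource "
    "capacity-scheduler.xml at "
    "file:/etc/hadoop/conf.empty/capacity-scheduler.xml"
)

for name, location in find_loaded_resources(sample):
    print(name, "->", location)
```

If the location printed is not the file that was edited, the edit will have no effect on the next restart.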
Looking in the mentioned .xml file:
<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels.default.capacity</name>
  <value>-1</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity</name>
  <value>-1</value>
</property>
Do I just need to set these in my blueprint file?
NOTE: Here's the full .xml file: capacity-schedulerxml.txt
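A quick way to spot such values without reading the whole file by eye is to scan it. The following is a minimal sketch, not part of Ambari or Hadoop: `find_invalid_label_capacities` is a hypothetical helper, the [0, 100] bounds are taken from the exception message above, and only node-label capacity properties are checked because that is the only case this error covers. The sample wraps the two properties in a `<configuration>` root element, which is an assumption about the rest of the file.

```python
import xml.etree.ElementTree as ET


def find_invalid_label_capacities(xml_text, low=0.0, high=100.0):
    """Return (name, value) pairs for node-label capacity properties
    whose numeric value falls outside [low, high]."""
    root = ET.fromstring(xml_text)
    invalid = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        # Only look at "...accessible-node-labels.<label>.(maximum-)capacity".
        if ".accessible-node-labels." not in name or not name.endswith("capacity"):
            continue
        try:
            number = float(value)
        except ValueError:
            continue  # non-numeric values are left for a human to judge
        if not low <= number <= high:
            invalid.append((name, value))
    return invalid


# Sample built from the two properties quoted in the question.
sample_xml = """
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels.default.capacity</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.capacity</name>
    <value>100</value>
  </property>
</configuration>
"""

for name, value in find_invalid_label_capacities(sample_xml):
    print("out of range:", name, "=", value)
```

To check a real file, read it and pass its contents in, for example `find_invalid_label_capacities(open(path).read())`.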
EDIT #1
I took these two properties out of the above .xml file and attempted to restart the ResourceManager, but it's still throwing the same exception:
2015-12-15 10:40:51,231 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1241)) - Error starting ResourceManager
java.lang.IllegalArgumentException: Illegal capacity of -1.0 for node-label=default in queue=root, valid capacity should in range of [0, 100].
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.internalGetLabeledQueueCapacity(CapacitySchedulerConfiguration.java:465)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getLabeledQueueCapacity(CapacitySchedulerConfiguration.java:477)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadCapacitiesByLabelsFromConf(CSQueueUtils.java:143)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadUpdateAndCheckCapacities(CSQueueUtils.java:122)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupConfigurableCapacities(AbstractCSQueue.java:99)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:242)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setupQueueConfigs(ParentQueue.java:109)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.<init>(ParentQueue.java:100)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:589)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:465)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:297)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:326)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:576)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1016)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:269)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1237)
2015-12-15 10:40:51,233 INFO resourcemanager.ResourceManager (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
Created 12-15-2015 02:30 PM
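This sounds like a bug; both values have been removed in newer Ambari versions.
https://issues.apache.org/jira/browse/AMBARI-13232
Could you remove the following two values (yarn.scheduler.capacity.root.accessible-node-labels.default.capacity and yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity) and try to restart the RM?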
Created 12-15-2015 02:18 PM
@Sam Mingolelli - it looks like there is a configuration error in the capacity scheduler. Could you please attach your capacity-scheduler.xml here? (Default location: /etc/hadoop/conf/capacity-scheduler.xml)
A default working configuration is attached: default-capacity-scheduler.txt
Created 12-15-2015 02:30 PM
Thanks, I found this file as well, but I'm not explicitly setting those values in my blueprint. The file has a value of -1, which is outside the valid range [0, 100]; this seems like a bug.
Created 12-15-2015 02:19 PM
I have seen this before, but on an HDP 2.3 system. Could you paste the content of your capacity-scheduler.xml (in Ambari, within the YARN config)?
Created 12-15-2015 02:25 PM
@Jonas Straub - do I just need to provide these two params in my blueprint? It seems like a potential bug to me.
Created 12-15-2015 03:46 PM
I took those out of the capacity-scheduler.xml file and attempted to restart, but it still fails. I've added the messages to the question.
Created 12-15-2015 04:00 PM
Can you check the local file (etc/yarn/conf/capacity-scheduler.xml) and make sure the values are not set in there?
Also make sure there is no RM still running (ps aux | grep resourcemanager) and that there is no RM pid file in /var/run/hadoop-yarn/yarn.
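The pid-file part of that check can be scripted. The sketch below is only an illustration under a few assumptions: `check_pid_files` is a hypothetical helper, the pid files are assumed to end in `.pid` and contain a single integer, and the liveness test (`os.kill(pid, 0)`) assumes a POSIX system. It reports whether each pid file points at a live process or is stale; it does not check that the process is actually a ResourceManager.

```python
import glob
import os


def check_pid_files(pid_dir):
    """Map each *.pid file in pid_dir to 'running', 'stale' or 'unreadable'."""
    results = {}
    for path in sorted(glob.glob(os.path.join(pid_dir, "*.pid"))):
        try:
            with open(path) as handle:
                pid = int(handle.read().strip())
        except (OSError, ValueError):
            results[path] = "unreadable"
            continue
        if pid <= 0:
            # Never signal pid 0 or negative pids (they address process groups).
            results[path] = "unreadable"
            continue
        try:
            os.kill(pid, 0)  # signal 0 sends nothing, it only checks existence
        except ProcessLookupError:
            results[path] = "stale"
        except PermissionError:
            results[path] = "running"  # exists, but owned by another user
        else:
            results[path] = "running"
    return results


if __name__ == "__main__":
    # Directory taken from the reply above.
    for pid_file, state in check_pid_files("/var/run/hadoop-yarn/yarn").items():
        print(pid_file, state)
```

A "stale" entry means the pid file was left behind by a process that no longer exists.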
Created 12-15-2015 05:09 PM
I had to remove the options through the config tab of Ambari, but once I did that I was able to restart, and the ResourceManager is now running.
Created 12-15-2015 05:36 PM
Awesome! I am glad it's running now 🙂