ResourceManager cannot start

avatar
Rising Star

After building out an HDP 2.2 cluster (single node) using a blueprint, I'm getting the following error from the ResourceManager.

$ less /var/log/hadoop-yarn/yarn/yarn-yarn-resourcemanager-gsc01-ost-tesla-h-hb01.td.local.log
STARTUP_MSG: Starting ResourceManager
STARTUP_MSG:   host = gsc01-ost-tesla-h-hb01.td.local/192.168.106.26
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.6.0.2.2.9.0-3393
...
2015-12-15 01:01:47,671 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service RMActiveServices failed in state INITED; cause: java.lang.IllegalArgumentException: Illegal capacity of -1.0 for node-label=default in queue=root, valid capacity should in range of [0, 100].
java.lang.IllegalArgumentException: Illegal capacity of -1.0 for node-label=default in queue=root, valid capacity should in range of [0, 100].
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.internalGetLabeledQueueCapacity(CapacitySchedulerConfiguration.java:465)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getLabeledQueueCapacity(CapacitySchedulerConfiguration.java:477)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadCapacitiesByLabelsFromConf(CSQueueUtils.java:143)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadUpdateAndCheckCapacities(CSQueueUtils.java:122)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupConfigurableCapacities(AbstractCSQueue.java:99)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:242)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setupQueueConfigs(ParentQueue.java:109)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.<init>(ParentQueue.java:100)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:589)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:465)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:297)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:326)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:576)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1016)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:269)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1237)
2015-12-15 01:01:47,672 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(210)) - Stopping ResourceManager metrics system...

My blueprint file is intentionally sparse: I only list components and don't set any configurations unless needed.

{
  "host_groups" : [
    {
      "name" : "host_group_1",
      "configurations" : [ ],
      "components" : [
        { "name" : "ZOOKEEPER_SERVER" },
        { "name" : "ZOOKEEPER_CLIENT" },

...

  ],
  "Blueprints" : {
    "stack_name" : "HDP",
    "stack_version" : "2.2"
  }

I suspect this message a bit up in the logs might be related:

2015-12-15 01:01:47,598 INFO  conf.Configuration (Configuration.java:getConfResourceAsInputStream(2236)) - found resource capacity-scheduler.xml at file:/etc/hadoop/conf.empty/capacity-scheduler.xml
2015-12-15 01:01:47,663 WARN  capacity.CapacitySchedulerConfiguration (CapacitySchedulerConfiguration.java:getAccessibleNodeLabels(433)) - Accessible node labels for root queue will be ignored, it will be automatically set to "*".
2015-12-15 01:01:47,668 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler failed in state INITED; cause: java.lang.IllegalArgumentException: Illegal capacity of -1.0 for node-label=default in queue=root, valid capacity should in range of [0, 100].
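To confirm which configuration file the ResourceManager actually read, the "found resource ... at ..." message can be pulled out of the log. The following is a minimal sketch that assumes the message format shown in the excerpt above; the function name `loaded_resources` is hypothetical and not part of Hadoop.

```python
import re

# Matches the "found resource <name> at <url>" message seen in the log excerpt.
FOUND_RESOURCE = re.compile(r"found resource (\S+) at (\S+)")


def loaded_resources(log_lines):
    """Return {resource name: URL} for every 'found resource' log line."""
    found = {}
    for line in log_lines:
        match = FOUND_RESOURCE.search(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


sample = [
    "2015-12-15 01:01:47,598 INFO  conf.Configuration "
    "(Configuration.java:getConfResourceAsInputStream(2236)) - found resource "
    "capacity-scheduler.xml at file:/etc/hadoop/conf.empty/capacity-scheduler.xml",
]
print(loaded_resources(sample))
```

Running it against the real log file (for example by passing an open file object as `log_lines`) would show every configuration resource the process reported loading and from where.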

Looking in the mentioned .xml file:

    <property>
      <name>yarn.scheduler.capacity.root.accessible-node-labels.default.capacity</name>
      <value>-1</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity</name>
      <value>-1</value>
    </property>

Do I just need to set these in my blueprint file?

NOTE: Here's the full .xml file: capacity-schedulerxml.txt
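A quick way to spot values like the two quoted above is to scan the file for node-label capacity properties that fall outside the range named in the exception. This is a rough sketch, not an official tool: the `[0, 100]` range is taken from the error message, the name filter (`accessible-node-labels` plus a `capacity` suffix) is an assumption based on the two properties shown, and `out_of_range_label_capacities` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def out_of_range_label_capacities(xml_text, low=0.0, high=100.0):
    """Return (name, value) pairs for node-label capacity properties outside [low, high]."""
    bad = []
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        # Only look at per-label capacity settings (assumed naming pattern).
        if ".accessible-node-labels." not in name or not name.endswith("capacity"):
            continue
        try:
            number = float(value)
        except ValueError:
            continue  # non-numeric value, not handled by this sketch
        if not low <= number <= high:
            bad.append((name, value))
    return bad


sample_xml = """
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels.default.capacity</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.capacity</name>
    <value>100</value>
  </property>
</configuration>
"""
for name, value in out_of_range_label_capacities(sample_xml):
    print(name, "=", value)
```

The sample XML wraps the two properties in a `<configuration>` root so it parses on its own; for a real check, read the contents of capacity-scheduler.xml into `xml_text` instead.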

EDIT #1

I took these 2 properties out of the above .xml file and attempted to restart ResourceManager, but it's still throwing the same exception:

2015-12-15 10:40:51,231 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1241)) - Error starting ResourceManager
java.lang.IllegalArgumentException: Illegal capacity of -1.0 for node-label=default in queue=root, valid capacity should in range of [0, 100].
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.internalGetLabeledQueueCapacity(CapacitySchedulerConfiguration.java:465)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getLabeledQueueCapacity(CapacitySchedulerConfiguration.java:477)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadCapacitiesByLabelsFromConf(CSQueueUtils.java:143)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadUpdateAndCheckCapacities(CSQueueUtils.java:122)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupConfigurableCapacities(AbstractCSQueue.java:99)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:242)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setupQueueConfigs(ParentQueue.java:109)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.<init>(ParentQueue.java:100)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:589)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:465)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:297)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:326)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:576)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1016)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:269)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1237)
2015-12-15 10:40:51,233 INFO  resourcemanager.ResourceManager (LogAdapter.java:info(45)) - SHUTDOWN_MSG:

avatar
Master Guru

@Sam Mingolelli - it looks like there is a configuration error in the capacity scheduler. Could you please attach your capacity-scheduler.xml here? (Default location: /etc/hadoop/conf/capacity-scheduler.xml)

A default working configuration is attached: default-capacity-scheduler.txt

avatar
Rising Star

Thanks, I found this file as well, but I'm not explicitly setting the values there in my blueprint. It has a value of -1, which is outside the valid range [0, 100]. Seems like a bug.

avatar

I have seen this before, but on an HDP 2.3 system. Could you paste the content of your capacity-scheduler.xml (in Ambari, within the YARN config)?

avatar
Rising Star

@Jonas Straub - do I just need to provide these 2 params in my blueprint? Seems like a potential bug to me.

avatar

This sounds like a bug, and both values are removed in newer Ambari versions.

https://issues.apache.org/jira/browse/AMBARI-13232

Could you remove the following two values and try to restart the RM:

  • yarn.scheduler.capacity.root.accessible-node-labels.default.capacity
  • yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity

avatar
Rising Star

I took those out of the capacity-scheduler.xml file and attempted to restart, but it still fails. I've added the messages to the question.

avatar

Can you check the local file (/etc/yarn/conf/capacity-scheduler.xml) and make sure the values are not set in there?

Also make sure there is no RM still running (ps aux | grep resourcemanager) and that there is no RM pid file in /var/run/hadoop-yarn/yarn.
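The two checks above can be scripted along these lines. This is only a sketch: the pid directory comes from the post, but the `*.pid` file pattern and the helper names (`resourcemanager_processes`, `pid_files`, `report`) are assumptions for illustration.

```python
import glob
import os
import subprocess


def resourcemanager_processes(ps_output):
    """Return lines of `ps aux` output that mention the ResourceManager."""
    return [
        line
        for line in ps_output.splitlines()
        if "resourcemanager" in line.lower() and "grep" not in line
    ]


def pid_files(pid_dir):
    """Return the *.pid files directly under pid_dir (assumed naming pattern)."""
    return sorted(glob.glob(os.path.join(pid_dir, "*.pid")))


def report(pid_dir="/var/run/hadoop-yarn/yarn"):
    """Print any running RM processes and leftover pid files."""
    ps = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True)
    print("processes:", resourcemanager_processes(ps.stdout))
    print("pid files:", pid_files(pid_dir))
```

Calling `report()` on the ResourceManager host prints both lists; two empty lists suggest nothing stale is left over from an earlier start attempt.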

avatar
Rising Star

I had to remove the options through the config tab of Ambari, but once I did that I was able to restart, and the ResourceManager is now running.

avatar

Awesome! I am glad it's running now 🙂