ResourceManager cannot start

Rising Star

After building out an HDP 2.2 cluster (single node) using a blueprint, I'm getting the following error from the ResourceManager.

$ less /var/log/hadoop-yarn/yarn/yarn-yarn-resourcemanager-gsc01-ost-tesla-h-hb01.td.local.log
STARTUP_MSG: Starting ResourceManager
STARTUP_MSG:   host = gsc01-ost-tesla-h-hb01.td.local/192.168.106.26
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.6.0.2.2.9.0-3393
...
2015-12-15 01:01:47,671 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service RMActiveServices failed in state INITED; cause: java.lang.IllegalArgumentException: Illegal capacity of -1.0 for node-label=default in queue=root, valid capacity should in range of [0, 100].
java.lang.IllegalArgumentException: Illegal capacity of -1.0 for node-label=default in queue=root, valid capacity should in range of [0, 100].
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.internalGetLabeledQueueCapacity(CapacitySchedulerConfiguration.java:465)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getLabeledQueueCapacity(CapacitySchedulerConfiguration.java:477)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadCapacitiesByLabelsFromConf(CSQueueUtils.java:143)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadUpdateAndCheckCapacities(CSQueueUtils.java:122)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupConfigurableCapacities(AbstractCSQueue.java:99)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:242)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setupQueueConfigs(ParentQueue.java:109)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.<init>(ParentQueue.java:100)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:589)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:465)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:297)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:326)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:576)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1016)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:269)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1237)
2015-12-15 01:01:47,672 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(210)) - Stopping ResourceManager metrics system...

My blueprint file is intentionally sparse, so I'm only calling out components without setting any configurations unless needed.

{
  "host_groups" : [
    {
      "name" : "host_group_1",
      "configurations" : [ ],
      "components" : [
        { "name" : "ZOOKEEPER_SERVER" },
        { "name" : "ZOOKEEPER_CLIENT" },

...

  ],
  "Blueprints" : {
    "stack_name" : "HDP",
    "stack_version" : "2.2"
  }

I suspect this message, a bit further up in the logs, might be related:

2015-12-15 01:01:47,598 INFO  conf.Configuration (Configuration.java:getConfResourceAsInputStream(2236)) - found resource capacity-scheduler.xml at file:/etc/hadoop/conf.empty/capacity-scheduler.xml
2015-12-15 01:01:47,663 WARN  capacity.CapacitySchedulerConfiguration (CapacitySchedulerConfiguration.java:getAccessibleNodeLabels(433)) - Accessible node labels for root queue will be ignored, it will be automatically set to "*".
2015-12-15 01:01:47,668 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler failed in state INITED; cause: java.lang.IllegalArgumentException: Illegal capacity of -1.0 for node-label=default in queue=root, valid capacity should in range of [0, 100].

Looking in the mentioned .xml file:

    <property>
      <name>yarn.scheduler.capacity.root.accessible-node-labels.default.capacity</name>
      <value>-1</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity</name>
      <value>-1</value>
    </property>

Do I just need to set these in my blueprint file?
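A rough way to spot this kind of value before starting the ResourceManager is to scan capacity-scheduler.xml for node-label capacity properties outside the [0, 100] range named in the exception. The sketch below is only an approximation of that range check, not the scheduler's actual validation logic; the function name and the inline sample (including the extra root.capacity entry) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Inline sample mirroring the two properties quoted above, plus one
# ordinary in-range property added purely for illustration.
SAMPLE = """<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels.default.capacity</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.capacity</name>
    <value>100</value>
  </property>
</configuration>"""


def find_invalid_label_capacities(xml_text):
    """Return (name, value) pairs for node-label capacity properties
    whose numeric value falls outside [0, 100]."""
    bad = []
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        raw = (prop.findtext("value") or "").strip()
        # Only look at per-label capacity / maximum-capacity settings.
        if ".accessible-node-labels." not in name or not name.endswith("capacity"):
            continue
        try:
            value = float(raw)
        except ValueError:
            continue  # non-numeric values are out of scope for this sketch
        if not 0.0 <= value <= 100.0:
            bad.append((name, value))
    return bad


for name, value in find_invalid_label_capacities(SAMPLE):
    print(f"{name} = {value}")
```

To check a real file, read it and pass its contents to the function, for example the capacity-scheduler.xml path reported in the log.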

NOTE: Here's the full .xml file: capacity-schedulerxml.txt

EDIT #1

I took these 2 properties out of the above .xml file and attempted to restart ResourceManager, but it's still throwing the same exception:

2015-12-15 10:40:51,231 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1241)) - Error starting ResourceManager
java.lang.IllegalArgumentException: Illegal capacity of -1.0 for node-label=default in queue=root, valid capacity should in range of [0, 100].
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.internalGetLabeledQueueCapacity(CapacitySchedulerConfiguration.java:465)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getLabeledQueueCapacity(CapacitySchedulerConfiguration.java:477)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadCapacitiesByLabelsFromConf(CSQueueUtils.java:143)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadUpdateAndCheckCapacities(CSQueueUtils.java:122)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupConfigurableCapacities(AbstractCSQueue.java:99)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:242)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setupQueueConfigs(ParentQueue.java:109)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.<init>(ParentQueue.java:100)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:589)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:465)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:297)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:326)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:576)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1016)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:269)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1237)
2015-12-15 10:40:51,233 INFO  resourcemanager.ResourceManager (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
1 ACCEPTED SOLUTION


This sounds like a bug; both properties have been removed in newer Ambari versions.

https://issues.apache.org/jira/browse/AMBARI-13232

Could you remove the following two values and try to restart the RM:

  • yarn.scheduler.capacity.root.accessible-node-labels.default.capacity
  • yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity
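For inspecting a local copy of the file, the removal of these two properties can be sketched as below. This is a minimal illustration under assumptions: the function name and the inline sample are hypothetical, and on an Ambari-managed cluster the change has to be made through Ambari's config tab (as reported later in this thread), since editing the file on disk did not resolve the problem there.

```python
import xml.etree.ElementTree as ET

# The two property names listed above.
TO_REMOVE = {
    "yarn.scheduler.capacity.root.accessible-node-labels.default.capacity",
    "yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity",
}

# Inline sample for illustration; the root.capacity entry is an assumed
# extra property showing that unrelated settings are left untouched.
SAMPLE = """<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels.default.capacity</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.capacity</name>
    <value>100</value>
  </property>
</configuration>"""


def strip_properties(xml_text, names=TO_REMOVE):
    """Return (new_xml_text, removed_names) with the named <property>
    elements dropped from a Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text)
    removed = []
    # Copy the list first so removal does not disturb iteration.
    for prop in list(root.findall("property")):
        name = (prop.findtext("name") or "").strip()
        if name in names:
            root.remove(prop)
            removed.append(name)
    return ET.tostring(root, encoding="unicode"), removed


cleaned, removed = strip_properties(SAMPLE)
print("removed:", removed)
print(cleaned)
```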


16 REPLIES

Master Guru

@Sam Mingolelli - it looks like there is a configuration error in the capacity scheduler. Could you please attach your capacity-scheduler.xml here? (Default location: /etc/hadoop/conf/capacity-scheduler.xml)

Default working configuration attached. - default-capacity-scheduler.txt

Rising Star

Thanks, I found this file as well, but I'm not explicitly setting those values in my blueprint. The file has a value of -1, which is outside the valid range [0, 100]. Seems like a bug.


I have seen this before, but on an HDP 2.3 system. Could you paste the content of your capacity-scheduler.xml (in Ambari, within the YARN config)?

Rising Star

@Jonas Straub - do I just need to provide these 2 params in my blueprint? Seems like a potential bug to me.

Rising Star

I took those out of the capacity-scheduler.xml file and attempted to restart, but it still fails. I've added the messages to the question.


Can you check the local file (etc/yarn/conf/capacity-scheduler.xml) and make sure the values are not set in there?

Also make sure there is no RM still running (ps aux | grep resourcemanager) and that there is no RM pid file in /var/run/hadoop-yarn/yarn.
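The pid-file part of this check can be sketched as follows. It is a minimal illustration under assumptions: the helper names and the `*resourcemanager*.pid` file-name pattern are guesses rather than something stated in this thread, and the liveness probe relies on POSIX signal semantics.

```python
import glob
import os


def pid_is_alive(pid):
    """Best-effort check whether a process with this pid exists (POSIX)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only probes; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def find_rm_pid_files(pid_dir="/var/run/hadoop-yarn/yarn"):
    """Return (path, pid, alive) for each ResourceManager-looking pid file.

    pid is None when the file cannot be read or parsed.
    """
    results = []
    pattern = os.path.join(pid_dir, "*resourcemanager*.pid")  # assumed pattern
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as fh:
                pid = int(fh.read().strip())
        except (OSError, ValueError):
            results.append((path, None, False))
            continue
        results.append((path, pid, pid_is_alive(pid)))
    return results


if __name__ == "__main__":
    for path, pid, alive in find_rm_pid_files():
        state = "running" if alive else "stale or unreadable"
        print(f"{path}: pid={pid} ({state})")
```

A pid file whose process is no longer alive is a candidate for removal before the next restart attempt; a live one means an RM is still running.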

Rising Star

I had to remove the options through the config tab of Ambari, but once I did that I was able to restart, and the ResourceManager is now running.


Awesome! I am glad it's running now 🙂