Hortonworks : Start Resource Manager Failed

Explorer

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/resourcemanager.py", line 275, in <module>
    Resourcemanager().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 353, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/resourcemanager.py", line 158, in start
    service('resourcemanager', action='start')
  File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/service.py", line 92, in service
    Execute(daemon_cmd, user = usr, not_if = check_process)
  File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
    returns=self.resource.returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/hdp/3.0.0.0-1634/hadoop/libexec && /usr/hdp/3.0.0.0-1634/hadoop-yarn/bin/yarn --config /usr/hdp/3.0.0.0-1634/hadoop/conf --daemon start resourcemanager' returned 1.
1 ACCEPTED SOLUTION

Expert Contributor
@Prashant Gupta

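From the logs you attached, it looks like you have enabled GPU scheduling, but the ResourceManager is still using the DefaultResourceCalculator, which accounts only for memory. That is why it fails on startup with the following error: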

2018-10-22 17:48:02,490 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1495)) - Error starting ResourceManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: RM uses DefaultResourceCalculator which used only memory as resource-type but invalid resource-types specified {yarn.io/gpu=name: yarn.io/gpu, units: , type: COUNTABLE, value: 0, minimum allocation: 0, maximum allocation: 9223372036854775807, memory-mb=name: memory-mb, units: Mi, type: COUNTABLE, value: 0, minimum allocation: 1024, maximum allocation: 191488, vcores=name: vcores, units: , type: COUNTABLE, value: 0, minimum allocation: 1, maximum allocation: 32}. Use DomainantResourceCalculator instead to make effective use of these resource-types

In Ambari, under YARN -> Configs -> Advanced -> Scheduler, set the following property and then restart the ResourceManager:

yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
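
If you want to confirm that the change actually reached the node, a quick check along these lines may help. This is only a minimal sketch: it assumes the property ends up in a standard Hadoop-style capacity-scheduler.xml (the usual location on HDP would be under /etc/hadoop/conf, but that path is an assumption), and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Property and class name taken from the answer above.
PROP = "yarn.scheduler.capacity.resource-calculator"
DOMINANT = "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"


def resource_calculator(xml_text):
    """Return the configured resource calculator class, or None if unset.

    Hypothetical helper: expects the usual Hadoop layout of
    <configuration><property><name/><value/></property></configuration>.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == PROP:
            return (prop.findtext("value") or "").strip()
    return None


def uses_dominant_calculator(xml_text):
    """True only if the DominantResourceCalculator is explicitly configured."""
    return resource_calculator(xml_text) == DOMINANT


# Illustrative sample, not copied from a real cluster.
SAMPLE = """<configuration>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(uses_dominant_calculator(SAMPLE))
```

To run it against a real node, read the file contents and pass them in, for example `uses_dominant_calculator(open(path).read())`, where `path` points at your capacity-scheduler.xml.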

8 REPLIES

Master Mentor

@Prashant Gupta

The traceback shows that Ambari ran the ResourceManager start command and that it failed with exit code 1. The traceback itself does not say why, so the actual cause should be logged in the ResourceManager logs.

I suggest you check and share the RM logs; they should show why the command failed.

Another way to isolate the issue is to start the ResourceManager manually with the same command and see whether it works:

# su - yarn

# ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/hdp/3.0.0.0-1634/hadoop/libexec && /usr/hdp/3.0.0.0-1634/hadoop-yarn/bin/yarn --config /usr/hdp/3.0.0.0-1634/hadoop/conf --daemon start resourcemanager

If you face any issue then please share the logs.
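
RM logs can be long, so it may be quicker to pull out only the FATAL entries before sharing them. Below is a rough sketch of one way to do that. It assumes the default log4j layout seen in this thread (timestamp, then level), `fatal_entries` is a made-up helper name, and the INFO line in the sample is a hypothetical placeholder rather than real log output.

```python
import re

# Matches the start of a timestamped log entry, e.g.
# "2018-10-22 17:48:02,490 FATAL ..." and captures the level.
LOG_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (\w+) ")


def fatal_entries(lines):
    """Collect each FATAL entry together with the lines that follow it
    (exception message, stack trace) up to the next timestamped entry."""
    entries, current = [], None
    for line in lines:
        line = line.rstrip("\n")
        match = LOG_LINE.match(line)
        if match:
            # A new timestamped entry closes whatever we were collecting.
            if current is not None:
                entries.append(current)
            current = [line] if match.group(1) == "FATAL" else None
        elif current is not None:
            # Continuation line (no timestamp) belonging to a FATAL entry.
            current.append(line)
    if current is not None:
        entries.append(current)
    return entries


SAMPLE = [
    # Hypothetical placeholder line, only here to show non-FATAL entries are skipped.
    "2018-10-22 17:48:01,000 INFO placeholder entry (hypothetical)",
    # The two lines below come from this thread; the exception text is shortened.
    "2018-10-22 17:48:02,490 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1495)) - Error starting ResourceManager",
    "org.apache.hadoop.yarn.exceptions.YarnRuntimeException: RM uses DefaultResourceCalculator ...",
]

if __name__ == "__main__":
    for entry in fatal_entries(SAMPLE):
        print("\n".join(entry))
```

For a real log, pass the open file object instead of `SAMPLE`, since iterating over a file yields its lines. The log location depends on your installation.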

Explorer

Dear Jay,

Please find attached the ResourceManager log file produced after running the above command.

We have replaced the server name with localhost in the log, but we are actually using the FQDN.

Thanks & Regards,

Prashant Gupta

resourcemanager-logfile.txt

Explorer

Dear Jay,

Is there any update on the above query?

Thanks & Regards,

Prashant Gupta

Master Mentor

@Tarun Parimi

Thank you for sharing the working solution. I am marking this thread as resolved.

Explorer

Dear Jay,

After adding the above property, it is working fine.

Thanks for your support.

Regards,

Prashant Gupta

Expert Contributor

@Prashant Gupta Good to know that the ResourceManager started successfully. Please mark the answer as accepted if the problem is resolved.

Explorer

Dear Jay,


When I try to enable HA in Ambari, I get the error below.


Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 348, in <module>
    NameNode().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 375, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 90, in start
    upgrade_suspended=params.upgrade_suspended, env=env)
  File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 175, in namenode
    create_log_dir=True
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py", line 276, in service
    Execute(daemon_cmd, not_if=process_id_exists_command, environment=hadoop_env_exports)
  File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/2.6.5.0-292/hadoop/sbin/hadoop-daemon.sh --config /usr/hdp/2.6.5.0-292/hadoop/conf start namenode'' returned 1. starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-HBDCAUTDBN14.cidr.gov.in.out
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.


Thanks & Regards,

Prashant Gupta