Support Questions
Find answers, ask questions, and share your expertise

unable to start node manager on all the nodes.

Explorer

I am trying to start the NodeManager on any of the nodes in my EC2 cluster, and I keep getting this error:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager.py", line 153, in <module>
    Nodemanager().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager.py", line 48, in start
    import params
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/params.py", line 28, in <module>
    from params_linux import *
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/params_linux.py", line 164, in <module>
    if len(rm_hosts) > 1:
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/config_dictionary.py", line 81, in __getattr__
    raise Fail("Configuration parameter '" + self.name + "' was not found in configurations dictionary!") 

resource_management.core.exceptions.Fail: Configuration parameter 'rm_host' was not found in configurations dictionary!
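For anyone reading the traceback: the Fail is raised by Ambari's config wrapper, which throws on a missing key instead of returning None, so any parameter absent from the configurations dictionary (here `rm_host`) aborts the script. A simplified stand-in to illustrate the behaviour (an assumption for illustration, not the real resource_management code):

```python
# Minimal sketch of Ambari's config-lookup behaviour (simplified stand-in,
# NOT the real resource_management classes).

class Fail(Exception):
    """Stand-in for resource_management.core.exceptions.Fail."""
    pass

class ConfigDictionary(object):
    def __init__(self, configs):
        # Store under a private name via object.__setattr__ so that
        # __getattr__ is never triggered for it.
        object.__setattr__(self, "_configs", configs)

    def __getattr__(self, name):
        # __getattr__ only fires when normal attribute lookup fails,
        # i.e. for config keys rather than real attributes.
        try:
            return self._configs[name]
        except KeyError:
            raise Fail("Configuration parameter '" + name +
                       "' was not found in configurations dictionary!")
```

So when params_linux.py evaluates `rm_hosts`, any lookup on a key that was never pushed into the configurations dictionary raises exactly the Fail shown above.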

The stdout is:

2016-08-23 19:08:20,561 - Group['spark'] {}
2016-08-23 19:08:20,562 - Group['hadoop'] {}
2016-08-23 19:08:20,562 - Group['nifi'] {}
2016-08-23 19:08:20,563 - Group['users'] {}
2016-08-23 19:08:20,563 - User['hive'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-08-23 19:08:20,563 - User['storm'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-08-23 19:08:20,564 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-08-23 19:08:20,565 - User['oozie'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users']}
2016-08-23 19:08:20,565 - User['ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-08-23 19:08:20,566 - User['falcon'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users']}
2016-08-23 19:08:20,566 - User['tez'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users']}
2016-08-23 19:08:20,567 - User['nifi'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-08-23 19:08:20,567 - User['accumulo'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-08-23 19:08:20,568 - User['spark'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-08-23 19:08:20,568 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users']}
2016-08-23 19:08:20,569 - User['flume'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-08-23 19:08:20,569 - User['kafka'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-08-23 19:08:20,570 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-08-23 19:08:20,570 - User['sqoop'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-08-23 19:08:20,571 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-08-23 19:08:20,571 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-08-23 19:08:20,572 - User['hbase'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-08-23 19:08:20,572 - User['hcat'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-08-23 19:08:20,573 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-08-23 19:08:20,574 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2016-08-23 19:08:20,578 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2016-08-23 19:08:20,579 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'recursive': True, 'mode': 0775, 'cd_access': 'a'}
2016-08-23 19:08:20,579 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-08-23 19:08:20,580 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2016-08-23 19:08:20,584 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] due to not_if
2016-08-23 19:08:20,593 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2016-08-23 19:08:20,600 - Skipping Execute[('setenforce', '0')] due to only_if
2016-08-23 19:08:20,607 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'hdfs', 'content': Template('topology_mappings.data.j2'), 'only_if': 'test -d /etc/hadoop/conf', 'group': 'hadoop'}
2016-08-23 19:08:20,611 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'), 'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
2016-08-23 19:08:20,758 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.4.2.0-258
2016-08-23 19:08:20,758 - Checking if need to create versioned conf dir /etc/hadoop/2.4.2.0-258/0
2016-08-23 19:08:20,759 - call['conf-select create-conf-dir --package hadoop --stack-version 2.4.2.0-258 --conf-version 0'] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2016-08-23 19:08:20,780 - call returned (1, '/etc/hadoop/2.4.2.0-258/0 exist already', '')
2016-08-23 19:08:20,780 - checked_call['conf-select set-conf-dir --package hadoop --stack-version 2.4.2.0-258 --conf-version 0'] {'logoutput': False, 'sudo': True, 'quiet': False}
2016-08-23 19:08:20,801 - checked_call returned (0, '')
2016-08-23 19:08:20,801 - Ensuring that hadoop has the correct symlink structure
2016-08-23 19:08:20,801 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-08-23 19:08:20,823 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.4.2.0-258
2016-08-23 19:08:20,823 - Checking if need to create versioned conf dir /etc/hadoop/2.4.2.0-258/0
2016-08-23 19:08:20,823 - call['conf-select create-conf-dir --package hadoop --stack-version 2.4.2.0-258 --conf-version 0'] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2016-08-23 19:08:20,845 - call returned (1, '/etc/hadoop/2.4.2.0-258/0 exist already', '')
2016-08-23 19:08:20,845 - checked_call['conf-select set-conf-dir --package hadoop --stack-version 2.4.2.0-258 --conf-version 0'] {'logoutput': False, 'sudo': True, 'quiet': False}
2016-08-23 19:08:20,866 - checked_call returned (0, '')
2016-08-23 19:08:20,866 - Ensuring that hadoop has the correct symlink structure
2016-08-23 19:08:20,866 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf

4 REPLIES

Re: unable to start node manager on all the nodes.

Explorer

@Theyaa Matti Can you go to Ambari, select the YARN service, click on Configs, select the last working configuration, and then try starting the NodeManager to see if that helps?

Re: unable to start node manager on all the nodes.

Rising Star

@Theyaa Matti :

Can you check whether the YARN config property "yarn.resourcemanager.hostname" is set?

If a mandatory field is missing, the YARN Configs tab will show a red mark next to the incorrect field.

Also, as @sganatra suggested, compare your current config with earlier versions to find out which change might have caused this issue.
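One way to check that property outside Ambari is to read it straight from yarn-site.xml on a node. A hedged sketch (the conf path below is the usual HDP client location and is an assumption; adjust for your install):

```python
# Look up a Hadoop-style <property> in a *-site.xml file.
import xml.etree.ElementTree as ET

def get_hadoop_property(xml_text, name):
    """Return the <value> of the <property> called `name`, or None if unset."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

# Example usage (path is an assumption):
# with open("/usr/hdp/current/hadoop-client/conf/yarn-site.xml") as f:
#     print(get_hadoop_property(f.read(), "yarn.resourcemanager.hostname"))
```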

Re: unable to start node manager on all the nodes.

Explorer

@sganatra @sbhat, I did what you suggested and nothing improved; I still get the same error. Could the Ambari database be corrupted?

Re: unable to start node manager on all the nodes.

Explorer

@Theyaa Matti Can you try starting the NodeManager manually? You can do that by running the command "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh start nodemanager" as the yarn user. Then run "ps -ef | grep -i nodemanager" to verify that the NodeManager is running. If it is not, check the NodeManager log again and see if it gives you the same error.
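The ps verification step can be scripted; a small sketch, with the ps output passed in as text so it is easy to test (in practice you would feed it the output of `ps -ef`, e.g. via subprocess):

```python
# Check ps output for a running NodeManager process, ignoring the grep
# line that `ps -ef | grep -i nodemanager` would itself produce.

def nodemanager_running(ps_output):
    """True if any process line mentions 'nodemanager' (case-insensitive),
    excluding grep itself."""
    return any(
        "nodemanager" in line.lower() and "grep" not in line
        for line in ps_output.splitlines()
    )
```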
