
Ambari can't start namenode: ambari-sudo.sh returned 1

Contributor

Hello Community

Ambari is not able to start the NameNode: it fails to execute the command 'ambari-sudo.sh su hdfs -l -s /bin/bash -c ...'. When I try to execute the whole command manually, I'm asked to enter a password. Following is the stderr output:

Does anyone have an idea what could be the reason?

Thank you

Traceback (most recent call last):

  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 317, in <module>
    NameNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 218, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 82, in start
    namenode(action="start", rolling_restart=rolling_restart, env=env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 86, in namenode
    create_log_dir=True
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py", line 276, in service
    environment=hadoop_env_exports
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 258, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode'' returned 1. starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master01.cl02.sr.private.out
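
For reference, this is roughly how I run the command by hand on master01 (assuming the agent's usual ambari-sudo.sh location under /var/lib/ambari-agent; paths may differ on your install):

# run as root on the NameNode host; running the same command as a
# non-root user makes 'su' prompt for the hdfs user's password
cd /var/lib/ambari-agent
./ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode'
echo "exit code: $?"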

6 REPLIES


@mohamed sabri marnaoui

What message do you see in the log file "/var/log/hadoop/hdfs/hadoop-hdfs-namenode-master01.cl02.sr.private.out"?
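
For example, something like this (the .log file name below assumes the standard hadoop-daemon naming; the detailed startup error usually ends up in the .log rather than the .out file):

tail -n 100 /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master01.cl02.sr.private.out
tail -n 100 /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master01.cl02.sr.private.log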


@mohamed sabri marnaoui

Are you running the Ambari agent as a non-root user? If so, make sure that your sudoers file is correct per this documentation:

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_Security_Guide/content/_sudoer_configurat...
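
A quick way to check which user the agent runs as, and what it can sudo without a password (the 'ambari' user name below is just an example; substitute your agent user):

# which user owns the ambari-agent process?
ps -eo user,pid,cmd | grep '[a]mbari_agent'
# list the sudo rules for that user (example user name 'ambari')
sudo -l -U ambari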

Contributor

@emaxwell: it's launched as root.

Contributor

@Joy: here is the stdout:

2016-08-09 13:36:36,469 - Group['hadoop'] {'ignore_failures': False}
2016-08-09 13:36:36,470 - Group['users'] {'ignore_failures': False}
2016-08-09 13:36:36,470 - Group['spark'] {'ignore_failures': False}
2016-08-09 13:36:36,471 - User['hive'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,471 - User['ambari-qa'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['users']}
2016-08-09 13:36:36,472 - User['flume'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,472 - User['hdfs'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,473 - User['spark'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,474 - User['mapred'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,474 - User['hbase'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,475 - User['tez'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['users']}
2016-08-09 13:36:36,476 - User['zookeeper'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,476 - User['kafka'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,477 - User['sqoop'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,477 - User['yarn'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,478 - User['hcat'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,478 - User['ams'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,479 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-08-09 13:36:36,480 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2016-08-09 13:36:36,492 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2016-08-09 13:36:36,493 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'recursive': True, 'mode': 0775, 'cd_access': 'a'}
2016-08-09 13:36:36,494 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-08-09 13:36:36,494 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2016-08-09 13:36:36,506 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] due to not_if
2016-08-09 13:36:36,507 - Group['hdfs'] {'ignore_failures': False}
2016-08-09 13:36:36,507 - User['hdfs'] {'ignore_failures': False, 'groups': ['hadoop', 'hdfs']}
2016-08-09 13:36:36,508 - Directory['/etc/hadoop'] {'mode': 0755}
2016-08-09 13:36:36,521 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2016-08-09 13:36:36,535 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2016-08-09 13:36:36,548 - Skipping Execute[('setenforce', '0')] due to not_if
2016-08-09 13:36:36,548 - Directory['/var/log/hadoop'] {'owner': 'root', 'mode': 0775, 'group': 'hadoop', 'recursive': True, 'cd_access': 'a'}
2016-08-09 13:36:36,550 - Directory['/var/run/hadoop'] {'owner': 'root', 'group': 'root', 'recursive': True, 'cd_access': 'a'}
2016-08-09 13:36:36,550 - Changing owner for /var/run/hadoop from 496 to root
2016-08-09 13:36:36,551 - Changing group for /var/run/hadoop from 1002 to root
2016-08-09 13:36:36,551 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 'recursive': True, 'cd_access': 'a'}
2016-08-09 13:36:36,555 - File['/usr/hdp/current/hadoop-client/conf/commons-logging.properties'] {'content': Template('commons-logging.properties.j2'), 'owner': 'hdfs'}
2016-08-09 13:36:36,557 - File['/usr/hdp/current/hadoop-client/conf/health_check'] {'content': Template('health_check.j2'), 'owner': 'hdfs'}
2016-08-09 13:36:36,557 - File['/usr/hdp/current/hadoop-client/conf/log4j.properties'] {'content': ..., 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2016-08-09 13:36:36,565 - File['/usr/hdp/current/hadoop-client/conf/hadoop-metrics2.properties'] {'content': Template('hadoop-metrics2.properties.j2'), 'owner': 'hdfs'}
2016-08-09 13:36:36,566 - File['/usr/hdp/current/hadoop-client/conf/task-log4j.properties'] {'content': StaticFile('task-log4j.properties'), 'mode': 0755}
2016-08-09 13:36:36,571 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'hdfs', 'content': Template('topology_mappings.data.j2'), 'only_if': 'test -d /etc/hadoop/conf', 'group': 'hadoop'}
2016-08-09 13:36:36,582 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'), 'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
2016-08-09 13:36:36,804 - Directory['/etc/security/limits.d'] {'owner': 'root', 'group': 'root', 'recursive': True}
2016-08-09 13:36:36,809 - File['/etc/security/limits.d/hdfs.conf'] {'content': Template('hdfs.conf.j2'), 'owner': 'root', 'group': 'root', 'mode': 0644}
2016-08-09 13:36:36,810 - XmlConfig['hadoop-policy.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-08-09 13:36:36,820 - Generating config: /usr/hdp/current/hadoop-client/conf/hadoop-policy.xml
2016-08-09 13:36:36,821 - File['/usr/hdp/current/hadoop-client/conf/hadoop-policy.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,829 - Writing File['/usr/hdp/current/hadoop-client/conf/hadoop-policy.xml'] because contents don't match
2016-08-09 13:36:36,829 - XmlConfig['ssl-client.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-08-09 13:36:36,838 - Generating config: /usr/hdp/current/hadoop-client/conf/ssl-client.xml
2016-08-09 13:36:36,838 - File['/usr/hdp/current/hadoop-client/conf/ssl-client.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,843 - Writing File['/usr/hdp/current/hadoop-client/conf/ssl-client.xml'] because contents don't match
2016-08-09 13:36:36,844 - Directory['/usr/hdp/current/hadoop-client/conf/secure'] {'owner': 'root', 'group': 'hadoop', 'recursive': True, 'cd_access': 'a'}
2016-08-09 13:36:36,844 - XmlConfig['ssl-client.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf/secure', 'configuration_attributes': {}, 'configurations': ...}
2016-08-09 13:36:36,853 - Generating config: /usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml
2016-08-09 13:36:36,854 - File['/usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,859 - Writing File['/usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml'] because contents don't match
2016-08-09 13:36:36,859 - XmlConfig['ssl-server.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-08-09 13:36:36,868 - Generating config: /usr/hdp/current/hadoop-client/conf/ssl-server.xml
2016-08-09 13:36:36,868 - File['/usr/hdp/current/hadoop-client/conf/ssl-server.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,874 - Writing File['/usr/hdp/current/hadoop-client/conf/ssl-server.xml'] because contents don't match
2016-08-09 13:36:36,874 - XmlConfig['hdfs-site.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-08-09 13:36:36,883 - Generating config: /usr/hdp/current/hadoop-client/conf/hdfs-site.xml
2016-08-09 13:36:36,883 - File['/usr/hdp/current/hadoop-client/conf/hdfs-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,928 - Writing File['/usr/hdp/current/hadoop-client/conf/hdfs-site.xml'] because contents don't match
2016-08-09 13:36:36,929 - XmlConfig['core-site.xml'] {'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'mode': 0644, 'configuration_attributes': {}, 'owner': 'hdfs', 'configurations': ...}
2016-08-09 13:36:36,938 - Generating config: /usr/hdp/current/hadoop-client/conf/core-site.xml
2016-08-09 13:36:36,939 - File['/usr/hdp/current/hadoop-client/conf/core-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0644, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,956 - Writing File['/usr/hdp/current/hadoop-client/conf/core-site.xml'] because contents don't match
2016-08-09 13:36:36,958 - File['/usr/hdp/current/hadoop-client/conf/slaves'] {'content': Template('slaves.j2'), 'owner': 'hdfs'}
2016-08-09 13:36:36,959 - Directory['/data01/hadoop/hdfs/namenode'] {'owner': 'hdfs', 'cd_access': 'a', 'group': 'hadoop', 'recursive': True, 'mode': 0755}
2016-08-09 13:36:36,959 - Directory['/data02/hadoop/hdfs/namenode'] {'owner': 'hdfs', 'recursive': True, 'group': 'hadoop', 'mode': 0755, 'cd_access': 'a'}
2016-08-09 13:36:36,960 - Ranger admin not installed
/data01/hadoop/hdfs/namenode/namenode-formatted/ exists. Namenode DFS already formatted
/data02/hadoop/hdfs/namenode/namenode-formatted/ exists. Namenode DFS already formatted
2016-08-09 13:36:36,960 - Directory['/data01/hadoop/hdfs/namenode/namenode-formatted/'] {'recursive': True}
2016-08-09 13:36:36,960 - Directory['/data02/hadoop/hdfs/namenode/namenode-formatted/'] {'recursive': True}
2016-08-09 13:36:36,962 - File['/etc/hadoop/conf/dfs.exclude'] {'owner': 'hdfs', 'content': Template('exclude_hosts_list.j2'), 'group': 'hadoop'}
2016-08-09 13:36:36,963 - Directory['/var/run/hadoop'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 0755}
2016-08-09 13:36:36,963 - Changing owner for /var/run/hadoop from 0 to hdfs
2016-08-09 13:36:36,963 - Changing group for /var/run/hadoop from 0 to hadoop
2016-08-09 13:36:36,963 - Directory['/var/run/hadoop/hdfs'] {'owner': 'hdfs', 'recursive': True}
2016-08-09 13:36:36,963 - Directory['/var/log/hadoop/hdfs'] {'owner': 'hdfs', 'recursive': True}
2016-08-09 13:36:36,964 - File['/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'] {'action': ['delete'], 'not_if': 'ambari-sudo.sh  -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid && ambari-sudo.sh  -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'}
2016-08-09 13:36:36,982 - Deleting File['/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid']
2016-08-09 13:36:36,982 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode''] {'environment': {'HADOOP_LIBEXEC_DIR': '/usr/hdp/current/hadoop-client/libexec'}, 'not_if': 'ambari-sudo.sh  -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid && ambari-sudo.sh  -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'}

Contributor

In the file /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master01.cl02.sr.private.out:

ulimit -a for user hdfs
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 257395
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 128000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) 100000

Contributor

I found the cause of the problem: it was a configuration issue.

The NameNode was installed on master01, but the following parameter was pointing to worker02 (which has no NameNode):

dfs.namenode.http-address: worker02.cl02.sr.private:50070 instead of master01.cl02.sr.private:50070
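
You can verify the effective value with hdfs getconf (run against the active client configuration on the NameNode host), e.g.:

# should print the host:port of the node actually running the NameNode
su - hdfs -c 'hdfs getconf -confKey dfs.namenode.http-address'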

The configuration was altered because the cluster was moved to an HA setup and then taken back to non-HA. One of the NameNodes (the one on worker02) was then deleted without noticing that the remaining configuration was still pointing to worker02.

Hope that's clear 🙂