Created 08-11-2016 11:08 AM
Hello Community
Ambari is not able to start the NameNode; in fact, it is not able to execute the command 'ambari-sudo.sh su hdfs -l -s /bin/bash -c ...'. When I try to execute the whole command manually, I am asked to enter a password. Following is the stderr output.
Does anyone have an idea about what could be the reason?
Thank you
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 317, in <module> NameNode().execute() File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 218, in execute method(env) File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 82, in start namenode(action="start", rolling_restart=rolling_restart, env=env) File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk return fn(*args, **kwargs) File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 86, in namenode create_log_dir=True File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py", line 276, in service environment=hadoop_env_exports File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__ self.env.run() File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run self.run_action(resource, action) File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action provider_action() File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 258, in action_run tries=self.resource.tries, try_sleep=self.resource.try_sleep) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner result = function(command, **kwargs) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call tries=tries, try_sleep=try_sleep) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper result = _call(command, **kwargs_copy) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call raise Fail(err_msg) resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode'' returned 1. starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master01.cl02.sr.private.out
Created 08-11-2016 11:34 AM
What message do you see in the log file "/var/log/hadoop/hdfs/hadoop-hdfs-namenode-master01.cl02.sr.private.out"?
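If it helps, a quick way to pull both daemon output files for that host (a sketch: the .log file name is assumed to sit next to the .out file with the usual hadoop-daemon naming, and it usually carries the full startup exception, while the .out file mostly holds ulimit output):

# Show the tail of both the .out and the corresponding .log file
tail -n 200 /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master01.cl02.sr.private.out
tail -n 200 /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master01.cl02.sr.private.log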
Created 08-11-2016 08:17 PM
Are you running the Ambari agent as a non-root user? If so, make sure that your sudoers file is correct per this documentation:
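As a quick check (a sketch, assuming the agent's non-root account is named 'ambari'; substitute your own user):

# List what sudo allows for the agent user, and validate sudoers syntax
sudo -l -U ambari   # 'ambari' is a placeholder for your agent user
visudo -c           # checks /etc/sudoers (and included files) for syntax errors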
Created 08-12-2016 08:53 AM
@emaxwell: it is launched as root.
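For what it's worth, this is how I checked (a sketch; the ambari-agent.ini path is the default one, and the run_as_user key is only expected when a non-root agent has been configured):

# Show which user the agent processes actually run as
ps -ef | grep '[a]mbari-agent'
# Look for a non-root agent setting, if any (key may be absent on a root agent)
grep -i 'run_as_user' /etc/ambari-agent/conf/ambari-agent.ini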
Created 08-12-2016 08:53 AM
@Joy: here is the stdout:
2016-08-09 13:36:36,469 - Group['hadoop'] {'ignore_failures': False}
2016-08-09 13:36:36,470 - Group['users'] {'ignore_failures': False}
2016-08-09 13:36:36,470 - Group['spark'] {'ignore_failures': False}
2016-08-09 13:36:36,471 - User['hive'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,471 - User['ambari-qa'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['users']}
2016-08-09 13:36:36,472 - User['flume'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,472 - User['hdfs'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,473 - User['spark'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,474 - User['mapred'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,474 - User['hbase'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,475 - User['tez'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['users']}
2016-08-09 13:36:36,476 - User['zookeeper'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,476 - User['kafka'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,477 - User['sqoop'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,477 - User['yarn'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,478 - User['hcat'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,478 - User['ams'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,479 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-08-09 13:36:36,480 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2016-08-09 13:36:36,492 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2016-08-09 13:36:36,493 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'recursive': True, 'mode': 0775, 'cd_access': 'a'}
2016-08-09 13:36:36,494 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-08-09 13:36:36,494 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2016-08-09 13:36:36,506 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] due to not_if
2016-08-09 13:36:36,507 - Group['hdfs'] {'ignore_failures': False}
2016-08-09 13:36:36,507 - User['hdfs'] {'ignore_failures': False, 'groups': ['hadoop', 'hdfs']}
2016-08-09 13:36:36,508 - Directory['/etc/hadoop'] {'mode': 0755}
2016-08-09 13:36:36,521 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2016-08-09 13:36:36,535 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2016-08-09 13:36:36,548 - Skipping Execute[('setenforce', '0')] due to not_if
2016-08-09 13:36:36,548 - Directory['/var/log/hadoop'] {'owner': 'root', 'mode': 0775, 'group': 'hadoop', 'recursive': True, 'cd_access': 'a'}
2016-08-09 13:36:36,550 - Directory['/var/run/hadoop'] {'owner': 'root', 'group': 'root', 'recursive': True, 'cd_access': 'a'}
2016-08-09 13:36:36,550 - Changing owner for /var/run/hadoop from 496 to root
2016-08-09 13:36:36,551 - Changing group for /var/run/hadoop from 1002 to root
2016-08-09 13:36:36,551 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 'recursive': True, 'cd_access': 'a'}
2016-08-09 13:36:36,555 - File['/usr/hdp/current/hadoop-client/conf/commons-logging.properties'] {'content': Template('commons-logging.properties.j2'), 'owner': 'hdfs'}
2016-08-09 13:36:36,557 - File['/usr/hdp/current/hadoop-client/conf/health_check'] {'content': Template('health_check.j2'), 'owner': 'hdfs'}
2016-08-09 13:36:36,557 - File['/usr/hdp/current/hadoop-client/conf/log4j.properties'] {'content': ..., 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2016-08-09 13:36:36,565 - File['/usr/hdp/current/hadoop-client/conf/hadoop-metrics2.properties'] {'content': Template('hadoop-metrics2.properties.j2'), 'owner': 'hdfs'}
2016-08-09 13:36:36,566 - File['/usr/hdp/current/hadoop-client/conf/task-log4j.properties'] {'content': StaticFile('task-log4j.properties'), 'mode': 0755}
2016-08-09 13:36:36,571 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'hdfs', 'content': Template('topology_mappings.data.j2'), 'only_if': 'test -d /etc/hadoop/conf', 'group': 'hadoop'}
2016-08-09 13:36:36,582 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'), 'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
2016-08-09 13:36:36,804 - Directory['/etc/security/limits.d'] {'owner': 'root', 'group': 'root', 'recursive': True}
2016-08-09 13:36:36,809 - File['/etc/security/limits.d/hdfs.conf'] {'content': Template('hdfs.conf.j2'), 'owner': 'root', 'group': 'root', 'mode': 0644}
2016-08-09 13:36:36,810 - XmlConfig['hadoop-policy.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-08-09 13:36:36,820 - Generating config: /usr/hdp/current/hadoop-client/conf/hadoop-policy.xml
2016-08-09 13:36:36,821 - File['/usr/hdp/current/hadoop-client/conf/hadoop-policy.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,829 - Writing File['/usr/hdp/current/hadoop-client/conf/hadoop-policy.xml'] because contents don't match
2016-08-09 13:36:36,829 - XmlConfig['ssl-client.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-08-09 13:36:36,838 - Generating config: /usr/hdp/current/hadoop-client/conf/ssl-client.xml
2016-08-09 13:36:36,838 - File['/usr/hdp/current/hadoop-client/conf/ssl-client.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,843 - Writing File['/usr/hdp/current/hadoop-client/conf/ssl-client.xml'] because contents don't match
2016-08-09 13:36:36,844 - Directory['/usr/hdp/current/hadoop-client/conf/secure'] {'owner': 'root', 'group': 'hadoop', 'recursive': True, 'cd_access': 'a'}
2016-08-09 13:36:36,844 - XmlConfig['ssl-client.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf/secure', 'configuration_attributes': {}, 'configurations': ...}
2016-08-09 13:36:36,853 - Generating config: /usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml
2016-08-09 13:36:36,854 - File['/usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,859 - Writing File['/usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml'] because contents don't match
2016-08-09 13:36:36,859 - XmlConfig['ssl-server.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-08-09 13:36:36,868 - Generating config: /usr/hdp/current/hadoop-client/conf/ssl-server.xml
2016-08-09 13:36:36,868 - File['/usr/hdp/current/hadoop-client/conf/ssl-server.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,874 - Writing File['/usr/hdp/current/hadoop-client/conf/ssl-server.xml'] because contents don't match
2016-08-09 13:36:36,874 - XmlConfig['hdfs-site.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-08-09 13:36:36,883 - Generating config: /usr/hdp/current/hadoop-client/conf/hdfs-site.xml
2016-08-09 13:36:36,883 - File['/usr/hdp/current/hadoop-client/conf/hdfs-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,928 - Writing File['/usr/hdp/current/hadoop-client/conf/hdfs-site.xml'] because contents don't match
2016-08-09 13:36:36,929 - XmlConfig['core-site.xml'] {'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'mode': 0644, 'configuration_attributes': {}, 'owner': 'hdfs', 'configurations': ...}
2016-08-09 13:36:36,938 - Generating config: /usr/hdp/current/hadoop-client/conf/core-site.xml
2016-08-09 13:36:36,939 - File['/usr/hdp/current/hadoop-client/conf/core-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0644, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,956 - Writing File['/usr/hdp/current/hadoop-client/conf/core-site.xml'] because contents don't match
2016-08-09 13:36:36,958 - File['/usr/hdp/current/hadoop-client/conf/slaves'] {'content': Template('slaves.j2'), 'owner': 'hdfs'}
2016-08-09 13:36:36,959 - Directory['/data01/hadoop/hdfs/namenode'] {'owner': 'hdfs', 'cd_access': 'a', 'group': 'hadoop', 'recursive': True, 'mode': 0755}
2016-08-09 13:36:36,959 - Directory['/data02/hadoop/hdfs/namenode'] {'owner': 'hdfs', 'recursive': True, 'group': 'hadoop', 'mode': 0755, 'cd_access': 'a'}
2016-08-09 13:36:36,960 - Ranger admin not installed
/data01/hadoop/hdfs/namenode/namenode-formatted/ exists. Namenode DFS already formatted
/data02/hadoop/hdfs/namenode/namenode-formatted/ exists. Namenode DFS already formatted
2016-08-09 13:36:36,960 - Directory['/data01/hadoop/hdfs/namenode/namenode-formatted/'] {'recursive': True}
2016-08-09 13:36:36,960 - Directory['/data02/hadoop/hdfs/namenode/namenode-formatted/'] {'recursive': True}
2016-08-09 13:36:36,962 - File['/etc/hadoop/conf/dfs.exclude'] {'owner': 'hdfs', 'content': Template('exclude_hosts_list.j2'), 'group': 'hadoop'}
2016-08-09 13:36:36,963 - Directory['/var/run/hadoop'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 0755}
2016-08-09 13:36:36,963 - Changing owner for /var/run/hadoop from 0 to hdfs
2016-08-09 13:36:36,963 - Changing group for /var/run/hadoop from 0 to hadoop
2016-08-09 13:36:36,963 - Directory['/var/run/hadoop/hdfs'] {'owner': 'hdfs', 'recursive': True}
2016-08-09 13:36:36,963 - Directory['/var/log/hadoop/hdfs'] {'owner': 'hdfs', 'recursive': True}
2016-08-09 13:36:36,964 - File['/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'] {'action': ['delete'], 'not_if': 'ambari-sudo.sh -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid && ambari-sudo.sh -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'}
2016-08-09 13:36:36,982 - Deleting File['/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid']
2016-08-09 13:36:36,982 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode''] {'environment': {'HADOOP_LIBEXEC_DIR': '/usr/hdp/current/hadoop-client/libexec'}, 'not_if': 'ambari-sudo.sh -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid && ambari-sudo.sh -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'}
Created 08-12-2016 09:00 AM
In the file /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master01.cl02.sr.private.out, this is what I see:
ulimit -a for user hdfs
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257395
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 128000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 65536
virtual memory (kbytes, -v) unlimited
file locks (-x) 100000
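Since the .out file only shows the ulimit settings, the actual startup error should be in the matching .log file. This is how I would look for it (a sketch; the .log name is assumed to mirror the .out name in the same directory):

# Show recent errors from the NameNode daemon log
grep -iE 'error|exception|fatal' /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master01.cl02.sr.private.log | tail -n 50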
Created 08-12-2016 12:05 PM
I found the cause of the problem: it is a configuration matter.
In fact, the NameNode was installed on master01, but the following parameter was set to worker02 (which hosts no NameNode):
dfs.namenode.http-address: worker02.cl02.sr.private:50070 instead of master01.cl02.sr.private:50070
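For clarity, this is what the corrected property looks like once applied (a sketch of the resulting hdfs-site.xml entry; in practice I changed it through the Ambari HDFS configuration rather than by editing the file by hand):

<property>
  <name>dfs.namenode.http-address</name>
  <!-- must point at the host that actually runs the NameNode -->
  <value>master01.cl02.sr.private:50070</value>
</property>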
The configuration was altered because the cluster had been moved to an HA configuration and then taken back to non-HA. One of the NameNodes (the one on worker02) was then deleted without noticing that the remaining configuration still pointed to worker02. A quick check for other leftovers is sketched below.
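Since the stale value came from the HA rollback, it may be worth scanning the generated client config for other HA leftovers (a sketch; the property names are the standard HDFS ones and the conf path is the one from the logs above):

# Look for leftover nameservice/HA and address properties after the rollback
grep -E 'dfs\.nameservices|dfs\.ha\.namenodes|dfs\.namenode\.(http-address|rpc-address)' \
    /usr/hdp/current/hadoop-client/conf/hdfs-site.xml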
hope I'm clear 🙂