
Problems with starting services at ambari after cluster restart

New Contributor

Hello,

I suspect I have a permission problem, but I have no idea how to solve it.

Last week I restarted my so-called namenode after a shutdown. For no particular reason, I didn't bring all the corresponding Ambari services back up at the time.

The other two nodes in my cluster can also act as a namenode, and I suppose one of them took over. When I hit "Restart all services" for my namenode in Ambari, it completes without any errors.

After about 20 seconds, though, all of the successfully started services are down again.

Any ideas how I could find out which node is the actual namenode? Or how I can resolve this situation?

Thanks for your help!

4 REPLIES

Re: Problems with starting services at ambari after cluster restart

Mentor

Please look in /var/log/hadoop for the NameNode logs.
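A quick way to do that from the shell, assuming the default HDP log location (/var/log/hadoop/hdfs); set LOG_DIR to your installation's log directory if it differs:

```shell
# Assumption: NameNode logs live in the default HDP location below;
# override LOG_DIR if your installation uses a different directory.
LOG_DIR="${LOG_DIR:-/var/log/hadoop/hdfs}"

# Print the path of the most recently modified NameNode log file.
latest_nn_log() {
  ls -t "$LOG_DIR"/hadoop-hdfs-namenode-*.log 2>/dev/null | head -n 1
}

log="$(latest_nn_log)"
if [ -n "$log" ]; then
  tail -n 50 "$log"   # the last lines usually show why the daemon stopped
else
  echo "No NameNode log found under $LOG_DIR" >&2
fi
```

Look for a FATAL or ERROR entry near the end; that is usually the reason the daemon shut itself down.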

Re: Problems with starting services at ambari after cluster restart

There are a couple of questions in your post, so I'll try to assist with "Any ideas how I could find out which one is the actual namenode?"

Navigate to the HDFS service in Ambari via the dashboard -> Services tab -> HDFS. Under the Summary tab you should see something like:

[Screenshot: Ambari HDFS Summary tab showing the NameNode and SNameNode status links]

For you, one of these should show as "stopped"; if you hover over or click the link, you should see the node(s) the service is assigned to. Then ssh into the affected node(s) and view the logs as @Artem Ervits suggests.
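If you prefer the command line, each NameNode's web UI (HDP 2.x default port 50070, as in your log output) also exposes its status over JMX. A minimal sketch; nn-host1 and nn-host2 are placeholders for your actual node names:

```shell
# Assumption: nn-host1/nn-host2 are placeholders for your nodes, and the
# NameNode web UI listens on the HDP 2.x default port 50070.

# Extract the "State" value from NameNodeStatus JMX JSON on stdin.
nn_state() {
  sed -n 's/.*"State" *: *"\([^"]*\)".*/\1/p'
}

for host in nn-host1 nn-host2; do
  state=$(curl -sS --max-time 5 \
    "http://$host:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus" \
    2>/dev/null | nn_state)
  echo "$host: ${state:-no response}"
done
```

A host that reports "no response" has no NameNode web UI running, which narrows down where to look.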

Please keep us posted.

Re: Problems with starting services at ambari after cluster restart

New Contributor

Thanks,

Apparently I am able to start the DataNode/NameNode with hadoop-daemon.sh, but all services started through Ambari just stop running after a few minutes :(

I was wrong about high availability in our cluster: we have a SecondaryNameNode (running, yay!) and the NameNode (not running :( ), but I have no idea how to fix it.

My Ambari log file:

stdout: /var/lib/ambari-agent/data/output-9728.txt
2016-04-05 09:50:30,929 - Group['hadoop'] {'ignore_failures': False}
2016-04-05 09:50:30,930 - Group['users'] {'ignore_failures': False}
2016-04-05 09:50:30,930 - Group['spark'] {'ignore_failures': False}
2016-04-05 09:50:30,930 - User['hive'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-04-05 09:50:30,931 - User['oozie'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['users']}
2016-04-05 09:50:30,931 - User['ambari-qa'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['users']}
2016-04-05 09:50:30,932 - User['hdfs'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-04-05 09:50:30,933 - User['spark'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-04-05 09:50:30,933 - User['mapred'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-04-05 09:50:30,934 - User['hbase'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-04-05 09:50:30,935 - User['tez'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['users']}
2016-04-05 09:50:30,935 - User['zookeeper'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-04-05 09:50:30,936 - User['mahout'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-04-05 09:50:30,936 - User['falcon'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['users']}
2016-04-05 09:50:30,937 - User['sqoop'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-04-05 09:50:30,937 - User['yarn'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-04-05 09:50:30,938 - User['hcat'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-04-05 09:50:30,939 - User['ams'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-04-05 09:50:30,939 - User['atlas'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-04-05 09:50:30,940 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-04-05 09:50:30,941 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2016-04-05 09:50:30,945 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2016-04-05 09:50:30,945 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'recursive': True, 'mode': 0775, 'cd_access': 'a'}
2016-04-05 09:50:30,946 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-04-05 09:50:30,947 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2016-04-05 09:50:30,950 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] due to not_if
2016-04-05 09:50:30,950 - Group['hdfs'] {'ignore_failures': False}
2016-04-05 09:50:30,951 - User['hdfs'] {'ignore_failures': False, 'groups': ['hadoop', 'hdfs']}
2016-04-05 09:50:30,951 - Directory['/etc/hadoop'] {'mode': 0755}
2016-04-05 09:50:30,967 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2016-04-05 09:50:30,978 - Execute['('setenforce', '0')'] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2016-04-05 09:50:30,982 - Skipping Execute['('setenforce', '0')'] due to not_if
2016-04-05 09:50:30,983 - Directory['/var/log/hadoop'] {'owner': 'root', 'mode': 0775, 'group': 'hadoop', 'recursive': True, 'cd_access': 'a'}
2016-04-05 09:50:30,985 - Directory['/var/run/hadoop'] {'owner': 'root', 'group': 'root', 'recursive': True, 'cd_access': 'a'}
2016-04-05 09:50:30,985 - Changing owner for /var/run/hadoop from 515 to root
2016-04-05 09:50:30,985 - Changing group for /var/run/hadoop from 500 to root
2016-04-05 09:50:30,985 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 'recursive': True, 'cd_access': 'a'}
2016-04-05 09:50:30,989 - File['/usr/hdp/current/hadoop-client/conf/commons-logging.properties'] {'content': Template('commons-logging.properties.j2'), 'owner': 'hdfs'}
2016-04-05 09:50:30,991 - File['/usr/hdp/current/hadoop-client/conf/health_check'] {'content': Template('health_check.j2'), 'owner': 'hdfs'}
2016-04-05 09:50:30,992 - File['/usr/hdp/current/hadoop-client/conf/log4j.properties'] {'content': '...', 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2016-04-05 09:50:31,001 - File['/usr/hdp/current/hadoop-client/conf/hadoop-metrics2.properties'] {'content': Template('hadoop-metrics2.properties.j2'), 'owner': 'hdfs'}
2016-04-05 09:50:31,002 - File['/usr/hdp/current/hadoop-client/conf/task-log4j.properties'] {'content': StaticFile('task-log4j.properties'), 'mode': 0755}
2016-04-05 09:50:31,003 - File['/usr/hdp/current/hadoop-client/conf/configuration.xsl'] {'owner': 'hdfs', 'group': 'hadoop'}
2016-04-05 09:50:31,003 - File['/usr/hdp/current/hadoop-client/conf/masters'] {'owner': 'hdfs', 'group': 'hadoop'}
2016-04-05 09:50:31,008 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'hdfs', 'content': Template('topology_mappings.data.j2'), 'group': 'hadoop'}
2016-04-05 09:50:31,008 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'), 'mode': 0755}
2016-04-05 09:50:31,135 - Directory['/etc/security/limits.d'] {'owner': 'root', 'group': 'root', 'recursive': True}
2016-04-05 09:50:31,140 - File['/etc/security/limits.d/hdfs.conf'] {'content': Template('hdfs.conf.j2'), 'owner': 'root', 'group': 'root', 'mode': 0644}
2016-04-05 09:50:31,141 - XmlConfig['hadoop-policy.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-04-05 09:50:31,153 - Generating config: /usr/hdp/current/hadoop-client/conf/hadoop-policy.xml
2016-04-05 09:50:31,153 - File['/usr/hdp/current/hadoop-client/conf/hadoop-policy.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-04-05 09:50:31,162 - Writing File['/usr/hdp/current/hadoop-client/conf/hadoop-policy.xml'] because contents don't match
2016-04-05 09:50:31,162 - XmlConfig['ssl-client.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-04-05 09:50:31,172 - Generating config: /usr/hdp/current/hadoop-client/conf/ssl-client.xml
2016-04-05 09:50:31,173 - File['/usr/hdp/current/hadoop-client/conf/ssl-client.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-04-05 09:50:31,178 - Writing File['/usr/hdp/current/hadoop-client/conf/ssl-client.xml'] because contents don't match
2016-04-05 09:50:31,179 - Directory['/usr/hdp/current/hadoop-client/conf/secure'] {'owner': 'root', 'group': 'hadoop', 'recursive': True, 'cd_access': 'a'}
2016-04-05 09:50:31,179 - XmlConfig['ssl-client.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf/secure', 'configuration_attributes': {}, 'configurations': ...}
2016-04-05 09:50:31,189 - Generating config: /usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml
2016-04-05 09:50:31,190 - File['/usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-04-05 09:50:31,195 - Writing File['/usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml'] because contents don't match
2016-04-05 09:50:31,196 - XmlConfig['ssl-server.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-04-05 09:50:31,206 - Generating config: /usr/hdp/current/hadoop-client/conf/ssl-server.xml
2016-04-05 09:50:31,206 - File['/usr/hdp/current/hadoop-client/conf/ssl-server.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-04-05 09:50:31,213 - Writing File['/usr/hdp/current/hadoop-client/conf/ssl-server.xml'] because contents don't match
2016-04-05 09:50:31,213 - XmlConfig['hdfs-site.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {'final': {'dfs.namenode.http-address': 'true', 'dfs.namenode.acls.enabled': 'true'}}, 'configurations': ...}
2016-04-05 09:50:31,224 - Generating config: /usr/hdp/current/hadoop-client/conf/hdfs-site.xml
2016-04-05 09:50:31,224 - File['/usr/hdp/current/hadoop-client/conf/hdfs-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-04-05 09:50:31,267 - Writing File['/usr/hdp/current/hadoop-client/conf/hdfs-site.xml'] because contents don't match
2016-04-05 09:50:31,267 - XmlConfig['core-site.xml'] {'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'mode': 0644, 'configuration_attributes': {'final': {'fs.defaultFS': 'true'}}, 'owner': 'hdfs', 'configurations': ...}
2016-04-05 09:50:31,277 - Generating config: /usr/hdp/current/hadoop-client/conf/core-site.xml
2016-04-05 09:50:31,278 - File['/usr/hdp/current/hadoop-client/conf/core-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0644, 'encoding': 'UTF-8'}
2016-04-05 09:50:31,308 - Writing File['/usr/hdp/current/hadoop-client/conf/core-site.xml'] because contents don't match
2016-04-05 09:50:31,309 - File['/usr/hdp/current/hadoop-client/conf/slaves'] {'content': Template('slaves.j2'), 'owner': 'hdfs'}
2016-04-05 09:50:31,310 - Directory['/hadoop_storage/hdfs/namenode'] {'owner': 'hdfs', 'recursive': True, 'group': 'hadoop', 'mode': 0755, 'cd_access': 'a'}
2016-04-05 09:50:31,311 - Ranger admin not installed
/hadoop_storage/hdfs/namenode/namenode-formatted/ exists. Namenode DFS already formatted
2016-04-05 09:50:31,311 - Directory['/hadoop_storage/hdfs/namenode/namenode-formatted/'] {'recursive': True}
2016-04-05 09:50:31,313 - File['/etc/hadoop/conf/dfs.exclude'] {'owner': 'hdfs', 'content': Template('exclude_hosts_list.j2'), 'group': 'hadoop'}
2016-04-05 09:50:31,314 - Directory['/var/run/hadoop'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 0755}
2016-04-05 09:50:31,314 - Changing owner for /var/run/hadoop from 0 to hdfs
2016-04-05 09:50:31,314 - Changing group for /var/run/hadoop from 0 to hadoop
2016-04-05 09:50:31,314 - Directory['/var/run/hadoop/hdfs'] {'owner': 'hdfs', 'recursive': True}
2016-04-05 09:50:31,314 - Directory['/var/log/hadoop/hdfs'] {'owner': 'hdfs', 'recursive': True}
2016-04-05 09:50:31,315 - File['/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'] {'action': ['delete'], 'not_if': 'ambari-sudo.sh  -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid && ambari-sudo.sh  -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'}
2016-04-05 09:50:31,327 - Skipping File['/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'] due to not_if
2016-04-05 09:50:31,328 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode''] {'environment': {'HADOOP_LIBEXEC_DIR': '/usr/hdp/current/hadoop-client/libexec'}, 'not_if': 'ambari-sudo.sh  -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid && ambari-sudo.sh  -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'}
2016-04-05 09:50:31,339 - Skipping Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode''] due to not_if
2016-04-05 09:50:31,340 - Must wait to leave safemode since High Availability is not enabled.
2016-04-05 09:50:31,340 - Checking the NameNode safemode status since may need to transition from ON to OFF.
2016-04-05 09:50:31,340 - Execute['hdfs dfsadmin -fs hdfs://namenode.hadoop.gsv:8020 -safemode get | grep 'Safe mode is OFF''] {'logoutput': True, 'tries': 180, 'user': 'hdfs', 'try_sleep': 10}
Safe mode is OFF
2016-04-05 09:50:33,601 - HdfsResource['/tmp'] {'security_enabled': False, 'only_if': None, 'keytab': [EMPTY], 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'default_fs': 'hdfs://namenode.hadoop.gsv:8020', 'hdfs_site': ..., 'kinit_path_local': '/usr/bin/kinit', 'principal_name': None, 'user': 'hdfs', 'owner': 'hdfs', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'directory', 'action': ['create_on_execute'], 'mode': 0777}
2016-04-05 09:50:33,603 - checked_call['curl -sS -L -w '%{http_code}' -X GET 'http://namenode.hadoop.gsv:50070/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hdfs''] {'logoutput': None, 'user': 'hdfs', 'stderr': -1, 'quiet': False}
2016-04-05 09:50:33,633 - checked_call returned (0, '{"FileStatus":{"accessTime":0,"blockSize":0,"childrenNum":2,"fileId":16386,"group":"hdfs","length":0,"modificationTime":1459840555366,"owner":"hdfs","pathSuffix":"","permission":"777","replication":0,"storagePolicy":0,"type":"DIRECTORY"}}200', '')
2016-04-05 09:50:33,634 - HdfsResource['/user/ambari-qa'] {'security_enabled': False, 'only_if': None, 'keytab': [EMPTY], 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'default_fs': 'hdfs://namenode.hadoop.gsv:8020', 'hdfs_site': ..., 'kinit_path_local': '/usr/bin/kinit', 'principal_name': None, 'user': 'hdfs', 'owner': 'ambari-qa', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'directory', 'action': ['create_on_execute'], 'mode': 0770}
2016-04-05 09:50:33,635 - checked_call['curl -sS -L -w '%{http_code}' -X GET 'http://namenode.hadoop.gsv:50070/webhdfs/v1/user/ambari-qa?op=GETFILESTATUS&user.name=hdfs''] {'logoutput': None, 'user': 'hdfs', 'stderr': -1, 'quiet': False}
2016-04-05 09:50:33,664 - checked_call returned (0, '{"FileStatus":{"accessTime":0,"blockSize":0,"childrenNum":0,"fileId":16399,"group":"hdfs","length":0,"modificationTime":1459836206234,"owner":"ambari-qa","pathSuffix":"","permission":"770","replication":0,"storagePolicy":0,"type":"DIRECTORY"}}200', '')
2016-04-05 09:50:33,665 - HdfsResource['None'] {'security_enabled': False, 'only_if': None, 'keytab': [EMPTY], 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'default_fs': 'hdfs://namenode.hadoop.gsv:8020', 'hdfs_site': ..., 'kinit_path_local': '/usr/bin/kinit', 'principal_name': None, 'user': 'hdfs', 'action': ['execute'], 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf'}

Re: Problems with starting services at ambari after cluster restart

Could you attach the /var/log/hadoop/hdfs/<hadoop-hdfs-namenode>.log?
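In the meantime, one thing stands out in your output: the `Skipping Execute [...] start namenode [...] due to not_if` line means Ambari found /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid with a pid that looked alive, so it never issued the start command at all. A small sketch (assuming that default pid-file path) to check whether the pid on file really belongs to a NameNode:

```shell
# Assumption: the pid-file path is the HDP default seen in the output
# above; override PID_FILE if your layout differs.
PID_FILE="${PID_FILE:-/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid}"

# Report whether the pid on file belongs to a live NameNode process.
check_nn_pid() {
  if [ ! -f "$PID_FILE" ]; then
    echo "No pid file at $PID_FILE"
  else
    pid=$(cat "$PID_FILE")
    if ps -p "$pid" -o args= 2>/dev/null | grep -q NameNode; then
      echo "NameNode appears to be running (pid $pid)"
    else
      echo "Stale pid file: pid $pid is not a NameNode process"
    fi
  fi
}

check_nn_pid
```

If the pid turns out to be stale, removing the pid file and retrying the start from Ambari may get past the skipped step.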
