HDFS NameNode won't leave safemode

Expert Contributor

I've set up an HDP 2.6.3 Hadoop cluster with Ambari 2.5.2 (it was HDP 2.5.2 and Ambari 2.4.2 earlier, with the same behavior). When I start all services via the Ambari UI, the process gets stuck starting the NameNode service. The output always says:

2017-11-15 09:23:25,594 - Waiting for this NameNode to leave Safemode due to the following conditions: HA: False, isActive: True, upgradeType: None
2017-11-15 09:23:25,594 - Waiting up to 19 minutes for the NameNode to leave Safemode...
2017-11-15 09:23:25,595 - Execute['/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://my-host.com:8020 -safemode get | grep 'Safe mode is OFF''] {'logoutput': True, 'tries': 115, 'user': 'hdfs', 'try_sleep': 10}
2017-11-15 09:23:27,811 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://my-host.com:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2017-11-15 09:23:40,148 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://my-host.com:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2017-11-15 09:23:52,525 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://my-host.com:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2017-11-15 09:24:04,853 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://my-host.com:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2017-11-15 09:24:17,238 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://my-host.com:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
2017-11-15 09:24:29,566 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://my-host.com:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. 
...
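
The retry loop in the log above amounts to polling `hdfs dfsadmin -safemode get` until the output contains "Safe mode is OFF". With 'tries': 115 and 'try_sleep': 10 that is 115 × 10 s = 1150 s, which is the "up to 19 minutes" Ambari reports. A minimal sketch of the same check for running by hand, assuming the `hdfs` client is on the PATH and you run it as a user allowed to query the NameNode (on a kerberized cluster that user also needs a valid Kerberos ticket). `wait_for_safemode_off` is a hypothetical helper name, not something Ambari ships.

```sh
# Hypothetical helper mirroring Ambari's wait loop (not part of Ambari).
# Usage: wait_for_safemode_off TRIES SLEEP_SECONDS [generic dfsadmin options...]
wait_for_safemode_off() {
  local tries="$1" sleep_s="$2" i=1
  shift 2
  while [ "$i" -le "$tries" ]; do
    # Same test Ambari uses: succeed once the NameNode reports "Safe mode is OFF".
    if hdfs dfsadmin "$@" -safemode get | grep -q 'Safe mode is OFF'; then
      echo "NameNode left safemode after $i tries"
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    [ "$i" -lt "$tries" ] && sleep "$sleep_s"
    i=$((i + 1))
  done
  echo "NameNode still in safemode after $tries tries" >&2
  return 1
}
```

For example, `wait_for_safemode_off 115 10 -fs hdfs://my-host.com:8020` reproduces the budget from the log; generic options such as `-fs` are passed before `-safemode`, in the same order Ambari uses.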

I think this issue has occurred ever since I set up the cluster ~6 months ago.

I always try to leave safemode manually using the command:

hdfs dfsadmin -safemode leave

But this command rarely helps; most of the time it reports that safemode is still ON. So I have to force the exit from safemode with:

hdfs dfsadmin -safemode forceExit

Afterwards the NameNode start resumes and all other services start fine as well. If I forget to run the forceExit command, the NameNode start times out and the subsequent service starts fail too.
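
The manual workaround described above (try `leave`, fall back to `forceExit`) can be scripted so that `forceExit` only runs when a normal `leave` had no effect. A minimal sketch, assuming it runs as the HDFS superuser (e.g. `hdfs`), since changing the safemode state requires superuser privileges; `leave_safemode` is a hypothetical helper name. Note that `forceExit` makes the NameNode leave safemode regardless of why it was staying there, so it works around the symptom rather than explaining it.

```sh
# Hypothetical helper: try a normal "leave" first and use "forceExit" only
# if the NameNode still reports that safemode is ON afterwards.
leave_safemode() {
  if hdfs dfsadmin -safemode get | grep -q 'Safe mode is OFF'; then
    echo "Safemode already OFF"
    return 0
  fi
  hdfs dfsadmin -safemode leave
  if hdfs dfsadmin -safemode get | grep -q 'Safe mode is OFF'; then
    return 0
  fi
  # Last resort: overrides whatever is keeping the NameNode in safemode.
  echo "'-safemode leave' had no effect; running '-safemode forceExit'" >&2
  hdfs dfsadmin -safemode forceExit
}
```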

Can someone explain why Ambari / the NameNode can't leave safemode automatically? What could be the reason here?

Here is a screenshot of my HDFS overview in Ambari after a successful start of the services (after forceExit):

[Screenshot: 43602-unbenannt.png]

Any help would be appreciated, thank you!

4 REPLIES

Master Mentor

@Daniel Müller

I think your cluster is kerberized. The NameNode stays in safe mode because of communication timeouts with the KDC server. The error should appear in the logs under /var/log/hadoop-hdfs.

You should find a stack trace like the one below:

Caused by: javax.security.auth.login.LoginException: Receive timed out
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:808)
    at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
    at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
    ...
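
To check whether this exception is actually present, you can search the whole HDFS log directory for it. A minimal sketch; `find_krb_timeouts` is a hypothetical helper name, and it relies on `grep -r`, which GNU, BSD and BusyBox grep support. It takes the log directory as an argument because the location varies: if /var/log/hadoop-hdfs is empty, try the HDFS log directory configured in Ambari (on HDP this is often /var/log/hadoop/hdfs, but treat that path as an assumption).

```sh
# Hypothetical helper: list the files under a log directory that contain the
# Kerberos "Receive timed out" LoginException from the trace above.
find_krb_timeouts() {
  grep -rl 'LoginException: Receive timed out' "$1"
}
```

For example: `find_krb_timeouts /var/log/hadoop/hdfs`. No output (and exit status 1) means the exception was not found in that directory.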

The solution to this problem is to add the following line to krb5.conf under the [libdefaults] section, which makes the Kerberos client prefer TCP over UDP when contacting the KDC:

udp_preference_limit = 1

Don't edit the local /etc/krb5.conf directly; make the change in the Ambari UI instead, under Ambari > Kerberos > Configs > Advanced krb5-conf.

This ensures that the new setting is available on all nodes in the cluster. See the attached screenshot, then save and restart all required services.
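
After the restart, one way to confirm that the setting actually reached a host is to inspect that host's /etc/krb5.conf. A minimal sketch, assuming Ambari is managing krb5.conf on the cluster hosts; `has_udp_pref_limit` is a hypothetical helper name, and it only recognizes a plain `udp_preference_limit = 1` line inside [libdefaults] (spaces around `=` are allowed, trailing comments are not).

```sh
# Hypothetical helper: exit 0 if "udp_preference_limit = 1" appears inside the
# [libdefaults] section of the given krb5.conf (default: /etc/krb5.conf).
has_udp_pref_limit() {
  awk '
    # Track whether we are inside [libdefaults]; any new [section] resets it.
    /^[ \t]*\[/ { in_lib = ($0 ~ /^[ \t]*\[libdefaults\]/) }
    in_lib && /^[ \t]*udp_preference_limit[ \t]*=[ \t]*1[ \t]*$/ { found = 1 }
    END { exit !found }
  ' "${1:-/etc/krb5.conf}"
}
```

For example, `has_udp_pref_limit && echo present` on each host; a setting placed in a different section (such as [realms]) is deliberately not counted.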

Please let me know if that helped.


[Screenshot: namenode-safemode.jpg]

Expert Contributor

Thank you for the fast answer! I added the property to my Kerberos settings in Ambari, but the problem still exists. Also, the /var/log/hadoop-hdfs directory on my NameNode host exists, but it is empty!

Contributor

Try this:

sudo -u hdfs hadoop dfsadmin -safemode leave

Expert Contributor

@Daniel Muller, can you grep for "Safe mode is" in the HDFS NameNode log? That will tell you why the NameNode does not exit safemode on its own.
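
A minimal sketch of that search, showing only the most recent matches so the current reason is easy to spot. `safemode_status_lines` is a hypothetical helper name, and the log path in the example is an assumption: Hadoop's daemon scripts typically name the file hadoop-hdfs-namenode-<hostname>.log, and on HDP it often lives under /var/log/hadoop/hdfs, so adjust it to the HDFS log directory configured in Ambari.

```sh
# Hypothetical helper: print the last COUNT lines (default 5) of a NameNode
# log that contain "Safe mode is".
safemode_status_lines() {
  local log_file="$1" count="${2:-5}"
  grep 'Safe mode is' "$log_file" | tail -n "$count"
}
```

For example: `safemode_status_lines "/var/log/hadoop/hdfs/hadoop-hdfs-namenode-$(hostname).log" 10`.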