Long startup time for NameNode

I have just enabled Kerberos on the Hadoop cluster. The whole process went fairly smoothly. However, when I restarted all the services afterwards, I noticed that it took over 30 minutes for the NameNode to start up. During those 30 minutes it seems that hdfs did not have a valid TGT, based on the messages below. I waited patiently, expecting it to fail at any moment, but it did in fact come up. My question is: why did it take so long, and why could a TGT not be obtained from the very beginning?

2018-10-27 23:28:54,899 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://*******:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.
18/10/27 23:28:54 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
safemode: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "*******/11*.11*.11*.11*"; destination host is: "***.***.***":8020; 
18/10/27 23:29:09 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 

safemode: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "*******/11*.11*.11*.11*"; destination host is: "************":8020;
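For anyone reading the log above: the retry line shows a safe mode check being re-run every 10 seconds, and each run fails on the client side because no TGT is found, so the check cannot report the safe mode state at all. Below is a minimal Python sketch of such a polling loop that tells the two situations apart. It is only an illustration, not the script that produced the log; the function names, the attempt count, and the choice to stop early on a missing TGT are my own assumptions.

```python
import subprocess
import time

# Strings taken from the log output quoted in the question.
KERBEROS_MARKER = "Failed to find any Kerberos tgt"
SAFEMODE_OFF = "Safe mode is OFF"


def classify(output: str) -> str:
    """Classify the combined stdout/stderr of `hdfs dfsadmin -safemode get`."""
    if KERBEROS_MARKER in output:
        return "no_tgt"          # the client could not authenticate at all
    if SAFEMODE_OFF in output:
        return "safemode_off"    # NameNode answered and has left safe mode
    return "not_ready"           # anything else: still starting or in safe mode


def run_safemode_check(fs_uri: str) -> str:
    """Run the same dfsadmin check as in the log and return its output."""
    proc = subprocess.run(
        ["hdfs", "dfsadmin", "-fs", fs_uri, "-safemode", "get"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return proc.stdout


def wait_for_safemode_off(fs_uri, attempts=5, delay=10,
                          check=run_safemode_check, sleep=time.sleep):
    """Poll until safe mode is off; hypothetical helper, not an Ambari API."""
    for _ in range(attempts):
        state = classify(check(fs_uri))
        if state in ("safemode_off", "no_tgt"):
            # Assumption: a missing TGT is reported immediately, since
            # repeating the check without credentials gives the same error.
            return state
        sleep(delay)
    return "not_ready"
```

A call such as `wait_for_safemode_off("hdfs://namenode.example.com:8020")` (hypothetical host) would return `"no_tgt"` for output like the lines quoted above, which separates "the check itself cannot authenticate" from "the NameNode is still in safe mode".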

2 REPLIES

Just wanted to add a couple of notes to the above. I have just installed Zeppelin Notebook on one of the cluster nodes. After the installation I noticed that NameNode, Secondary NameNode and MapReduce2 needed to be restarted. The NameNode restart ran for 30 minutes with exactly the same symptoms as in the log above, but this time it failed. I'm still digging and trying to understand why this is happening, but I have a couple of questions in the meantime:

1. Why do these services need to be restarted after the Zeppelin Notebook installation? I'm not sure I follow what the dependencies are.

2. What could be the reason that the TGT is not found?
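On the second question, one thing that can be checked directly is whether the user running the command holds a valid ticket, and whether one can be obtained from a keytab. The Python sketch below wraps the standard MIT Kerberos commands `klist -s` (silent; exit status 0 only if the credentials cache is readable and not expired) and `kinit -kt <keytab> <principal>`. The helper names are hypothetical, and the keytab path and principal in the usage note are placeholders, not values from this cluster. It does not explain why the ticket was missing; it only helps confirm whether it is.

```python
import subprocess


def has_valid_ticket(run=subprocess.run) -> bool:
    """Return True if `klist -s` reports a readable, unexpired credentials cache."""
    return run(["klist", "-s"]).returncode == 0


def kinit_command(keytab: str, principal: str) -> list:
    """Build the kinit invocation that requests a TGT from a keytab."""
    return ["kinit", "-kt", keytab, principal]


def ensure_ticket(keytab: str, principal: str, run=subprocess.run) -> bool:
    """Check for a ticket and try to obtain one from the keytab if none is found."""
    if has_valid_ticket(run):
        return True
    # No usable ticket in the cache: try to get one non-interactively.
    return run(kinit_command(keytab, principal)).returncode == 0
```

For example, `ensure_ticket("/path/to/hdfs.keytab", "hdfs@EXAMPLE.COM")` (both arguments are placeholders) returns `False` if no ticket exists and `kinit` also fails, which would point at the keytab, the principal name, or the KDC rather than at HDFS itself.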
