Support Questions

Find answers, ask questions, and share your expertise

ERROR namenode.NameNode (NameNode.java:main(1783)) - Failed to start namenode. javax.security.auth.login.LoginException: Cannot locate KDC


I am trying to enable Kerberos in my cluster using the third option in the Ambari wizard (manual distribution of keytabs).

I have created the principals and keytabs in AD and distributed them to the Hadoop servers, but when I start the service it throws the error below.

[root@node keytabs]# kinit -kt nn.service.keytab nn/node.whishworks.net@WHISHWORKS.NET

[root@node keytabs]# klist

Ticket cache: FILE:/tmp/krb5cc_0

Default principal: nn/node.whishworks.net@WHISHWORKS.NET

Valid starting       Expires              Service principal
07/10/2018 21:16:38  07/11/2018 07:16:38  krbtgt/WHISHWORKS.NET@WHISHWORKS.NET
        renew until 07/17/2018 21:16:38
[root@node keytabs]#

2018-07-10 20:47:49,091 ERROR namenode.NameNode (NameNode.java:main(1783)) - Failed to start namenode.
java.io.IOException: Login failure for nn/node.whishworks.net@WHISHWORKS.NET from keytab /etc/security/keytabs/nn.service.keytab: javax.security.auth.login.LoginException: Cannot locate KDC
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1098)
    at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:307)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:726)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:745)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1001)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:985)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1710)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1778)
Caused by: javax.security.auth.login.LoginException: Cannot locate KDC
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
    at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
    at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
    at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1089)
    ... 7 more
Caused by: KrbException: Cannot locate KDC
    at sun.security.krb5.Config.getKDCList(Config.java:1084)
    at sun.security.krb5.KdcComm.send(KdcComm.java:218)
    at sun.security.krb5.KdcComm.send(KdcComm.java:200)
    at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:316)
    at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:361)
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java

Cloudera Employee

This error states that the NameNode is unable to determine the KDC server for Kerberos authentication. Can you please validate your /etc/krb5.conf and confirm that the [realms] section points to your AD server?
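For reference, a minimal /etc/krb5.conf pointing at an AD KDC looks roughly like the sketch below. The realm names come from this thread, but the KDC host `ad.whishworks.net` is a placeholder; replace it with your actual domain controller.

```ini
# /etc/krb5.conf - minimal sketch, not a complete configuration.
# "ad.whishworks.net" below is a hypothetical AD domain controller.
[libdefaults]
  default_realm = WHISHWORKS.NET
  dns_lookup_kdc = false

[realms]
  WHISHWORKS.NET = {
    kdc = ad.whishworks.net
    admin_server = ad.whishworks.net
  }

[domain_realm]
  .whishworks.net = WHISHWORKS.NET
  whishworks.net = WHISHWORKS.NET
```

With `dns_lookup_kdc = false`, the JVM must find the KDC via the `[realms]` entry; if that entry is missing or wrong, the login fails with exactly the "Cannot locate KDC" error shown above.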


The NN now shows as started in Ambari, but it is still logging the error below, and the same error appears when I run hdfs dfs -ls / with a valid Kerberos ticket.

[root@node keytabs]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: nn/node.whishworks.net@WHISHWORKS.NET
Valid starting       Expires              Service principal
07/11/2018 07:21:04  07/11/2018 17:21:04  krbtgt/WHISHWORKS.NET@WHISHWORKS.NET
        renew until 07/18/2018 07:21:04
[root@node keytabs]#

18/07/11 07:00:09 WARN ipc.Client: Couldn't setup connection for nn/node.whishworks.net@WHISHWORKS.NET to node.whishworks.net/172.31.50.76:8020
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:414)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:595)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:397)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:762)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:758)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:758)
    at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1620)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    at org.apache.hadoop.ipc.Client.call(Client.java:1398)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:823)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:290)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:202)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:184)
    at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2177)
    at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1442)
    at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1438)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1454)
    at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:265)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1697)
    at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
    at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
    at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
    at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:103)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:297)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:356)
Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))
    at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:770)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
    ... 41 more
Caused by: KrbException: Server not found in Kerberos database (7)
    at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:70)
    at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
    at sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
    at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
    at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
    at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
    at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:693)
    ... 44 more
Caused by: KrbException: Identifier doesn't match expected value (906)
    at sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
    at sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
    at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
    at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
    ... 50 more
ls: Failed on local exception: java.io.IOException: Couldn't setup connection for nn/node.whishworks.net@WHISHWORKS.NET to node.whishworks.net/172.31.50.76:8020; Host Details : local host is: "node.whishworks.net/172.31.50.76"; destination host is: "node.whishworks.net":8020;

Cloudera Employee

This time the error is different:

GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))]

This exception usually means there is a mismatch between the key stored in the keytab and the key in AD. Please check the encryption types and also the Key Version Number (KVNO).

[root@node ~]# klist -kte /etc/security/keytabs/spnego.service.keytab
Keytab name: FILE:/etc/security/keytabs/spnego.service.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 12/31/1969 19:00:00 HTTP/node.whishworks.net@WHISHWORKS.NET (des-cbc-crc)
   1 12/31/1969 19:00:00 HTTP/node.whishworks.net@WHISHWORKS.NET (des-cbc-md5)
   1 12/31/1969 19:00:00 HTTP/node.whishworks.net@WHISHWORKS.NET (arcfour-hmac)
   1 12/31/1969 19:00:00 HTTP/node.whishworks.net@WHISHWORKS.NET (aes256-cts-hmac-sha1-96)
   1 12/31/1969 19:00:00 HTTP/node.whishworks.net@WHISHWORKS.NET (aes128-cts-hmac-sha1-96)
[root@node ~]#


I can see a KVNO mismatch between the keytab and the principal:

[root@ip-172-31-8-92 keytabs]# klist -kte spnego.service.keytab
Keytab name: FILE:spnego.service.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   1 01/01/70 00:00:00 HTTP/ip-172-31-8-92.eu-west-1.compute.internal@WHISHWORKS.NET (des-cbc-crc)
   1 01/01/70 00:00:00 HTTP/ip-172-31-8-92.eu-west-1.compute.internal@WHISHWORKS.NET (des-cbc-md5)
   1 01/01/70 00:00:00 HTTP/ip-172-31-8-92.eu-west-1.compute.internal@WHISHWORKS.NET (arcfour-hmac)
   1 01/01/70 00:00:00 HTTP/ip-172-31-8-92.eu-west-1.compute.internal@WHISHWORKS.NET (aes256-cts-hmac-sha1-96)
   1 01/01/70 00:00:00 HTTP/ip-172-31-8-92.eu-west-1.compute.internal@WHISHWORKS.NET (aes128-cts-hmac-sha1-96)
[root@ip-172-31-8-92 keytabs]# kvno HTTP/ip-172-31-8-92.eu-west-1.compute.internal@WHISHWORKS.NET
HTTP/ip-172-31-8-92.eu-west-1.compute.internal@WHISHWORKS.NET: kvno = 2
[root@ip-172-31-8-92 keytabs]#

Master Mentor

@Raju

It looks like you might not have set up the FQDN properly for all your hosts (or a hostname might have changed).

Ambari embeds the FQDN (hostname) in the principal name, so if your host FQDNs are not set up properly, the keytabs may be generated with incorrect principals.

Please check whether any of your hosts have recently changed their hostname. Verify the output of the following commands on the different hosts of your cluster, including the problematic host.

# hostname -f
# cat /etc/hosts


Once you fix the hostname, please regenerate the keytabs from Ambari UI --> Kerberos --> Regenerate Keytabs.

NOTE: Regenerating keytabs requires a restart of the whole cluster, so please schedule a maintenance window for it.

https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.2/bk_ambari-operations/content/how_to_regener...




Hadoop relies heavily on DNS, and as such performs many DNS lookups during normal operation. All hosts in your system must be configured for both forward and reverse DNS. If you are unable to configure DNS in this way, you should edit the /etc/hosts file on every host in your cluster to contain the IP address and Fully Qualified Domain Name of each of your hosts.
https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.2/bk_ambari-installation-ppc/content/check_dn...
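The forward/reverse resolution described above can be sanity-checked from the shell on each node. This is a minimal sketch; the IP address 172.31.50.76 is the one from this thread, so substitute your own hosts, and note that `getent hosts` consults /etc/hosts as well as DNS.

```shell
# Print the fully qualified domain name this host believes it has
hostname -f

# Forward lookup: the FQDN should resolve to the node's IP
getent hosts "$(hostname -f)" || echo "forward lookup failed for $(hostname -f)"

# Reverse lookup: the IP should resolve back to the same FQDN
# (172.31.50.76 is the address seen in this thread; use your own)
getent hosts 172.31.50.76 || echo "reverse lookup failed for 172.31.50.76"
```

If `hostname -f` prints a short name instead of an FQDN, or the forward and reverse lookups disagree, Ambari will generate principals with the wrong host component, which leads to exactly the "Server not found in Kerberos database" error above.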