Created 07-11-2018 01:26 AM
I am trying to enable Kerberos on my cluster using the third option in the Ambari wizard (the manual method of distributing keytabs).
I have created the principals and keytabs in AD and distributed them to the Hadoop server, but when I start the service it throws the error below.
[root@node keytabs]# kinit -kt nn.service.keytab nn/node.whishworks.net@WHISHWORKS.NET
[root@node keytabs]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: nn/node.whishworks.net@WHISHWORKS.NET
Valid starting       Expires              Service principal
07/10/2018 21:16:38  07/11/2018 07:16:38  krbtgt/WHISHWORKS.NET@WHISHWORKS.NET
        renew until 07/17/2018 21:16:38
[root@node keytabs]#
2018-07-10 20:47:49,091 ERROR namenode.NameNode (NameNode.java:main(1783)) - Failed to start namenode.
java.io.IOException: Login failure for nn/node.whishworks.net@WHISHWORKS.NET from keytab /etc/security/keytabs/nn.service.keytab: javax.security.auth.login.LoginException: Cannot locate KDC
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1098)
    at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:307)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:726)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:745)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1001)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:985)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1710)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1778)
Caused by: javax.security.auth.login.LoginException: Cannot locate KDC
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
    at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
    at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
    at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1089)
    ... 7 more
Caused by: KrbException: Cannot locate KDC
    at sun.security.krb5.Config.getKDCList(Config.java:1084)
    at sun.security.krb5.KdcComm.send(KdcComm.java:218)
    at sun.security.krb5.KdcComm.send(KdcComm.java:200)
    at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:316)
    at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:361)
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java
Created 07-11-2018 02:28 AM
This error means the NameNode cannot determine the KDC server for Kerberos authentication. Can you please validate your /etc/krb5.conf and confirm that the [realms] section points to your AD server?
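As a rough way to run that check, the sketch below prints the kdc entries configured for a realm. The function name kdcs_for_realm is made up for this example, and it assumes the usual krb5.conf layout with spaces around "=" and the realm block opened as "REALM = {" on one line. It is only a quick sanity check, not a full krb5.conf parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any Kerberos package): print the kdc
# entries configured for a realm in a krb5.conf-style file.
# Usage: kdcs_for_realm <REALM> [path-to-krb5.conf]
kdcs_for_realm() {
  local realm="$1" conf="${2:-/etc/krb5.conf}"
  awk -v realm="$realm" '
    # Enter the realm block on a line like "WHISHWORKS.NET = {"
    $1 == realm && $2 == "=" && $3 == "{" { in_realm = 1; next }
    # Leave the block at its closing brace
    in_realm && $1 == "}" { in_realm = 0 }
    # Print the host from each "kdc = host" line inside the block
    in_realm && $1 == "kdc" && $2 == "=" { print $3 }
  ' "$conf"
}

# Example:
#   kdcs_for_realm WHISHWORKS.NET
```

If this prints nothing for your realm, the Java Kerberos client has no KDC to contact from the config file, which would be consistent with the "Cannot locate KDC" error above.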
Created 07-11-2018 11:37 AM
The NameNode now shows as started in Ambari, but the error below still appears in the NN logs and also when I run hdfs dfs -ls / with a valid Kerberos ticket.
[root@node keytabs]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: nn/node.whishworks.net@WHISHWORKS.NET
Valid starting       Expires              Service principal
07/11/2018 07:21:04  07/11/2018 17:21:04  krbtgt/WHISHWORKS.NET@WHISHWORKS.NET
        renew until 07/18/2018 07:21:04
[root@node keytabs]#
18/07/11 07:00:09 WARN ipc.Client: Couldn't setup connection for nn/node.whishworks.net@WHISHWORKS.NET to node.whishworks.net/172.31.50.76:8020
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:414)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:595)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:397)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:762)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:758)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:758)
    at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1620)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    at org.apache.hadoop.ipc.Client.call(Client.java:1398)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:823)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:290)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:202)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:184)
    at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2177)
    at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1442)
    at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1438)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1454)
    at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:265)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1697)
    at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
    at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
    at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
    at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:103)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:297)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:356)
Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))
    at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:770)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
    ... 41 more
Caused by: KrbException: Server not found in Kerberos database (7)
    at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:70)
    at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
    at sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
    at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
    at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
    at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
    at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:693)
    ... 44 more
Caused by: KrbException: Identifier doesn't match expected value (906)
    at sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
    at sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
    at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
    at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
    ... 50 more
ls: Failed on local exception: java.io.IOException: Couldn't setup connection for nn/node.whishworks.net@WHISHWORKS.NET to node.whishworks.net/172.31.50.76:8020; Host Details : local host is: "node.whishworks.net/172.31.50.76"; destination host is: "node.whishworks.net":8020;
Created 07-11-2018 11:47 AM
This time the error is different:
GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))]
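This exception usually means the KDC could not find the service principal the client asked for, which often comes from a hostname or principal mismatch between the keytab and AD. Please also check the encryption types and the Key Version Number (kvno) of the keytab against AD.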
Created 07-11-2018 01:40 PM
Created 07-11-2018 01:41 PM
[root@node ~]# klist -kte /etc/security/keytabs/spnego.service.keytab
Keytab name: FILE:/etc/security/keytabs/spnego.service.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 12/31/1969 19:00:00 HTTP/node.whishworks.net@WHISHWORKS.NET (des-cbc-crc)
   1 12/31/1969 19:00:00 HTTP/node.whishworks.net@WHISHWORKS.NET (des-cbc-md5)
   1 12/31/1969 19:00:00 HTTP/node.whishworks.net@WHISHWORKS.NET (arcfour-hmac)
   1 12/31/1969 19:00:00 HTTP/node.whishworks.net@WHISHWORKS.NET (aes256-cts-hmac-sha1-96)
   1 12/31/1969 19:00:00 HTTP/node.whishworks.net@WHISHWORKS.NET (aes128-cts-hmac-sha1-96)
[root@node ~]#
Created 07-12-2018 11:31 AM
I see a difference between the KVNO stored in the keytab and the KVNO reported for the principal:
[root@ip-172-31-8-92 keytabs]# klist -kte spnego.service.keytab
Keytab name: FILE:spnego.service.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   1 01/01/70 00:00:00 HTTP/ip-172-31-8-92.eu-west-1.compute.internal@WHISHWORKS.NET (des-cbc-crc)
   1 01/01/70 00:00:00 HTTP/ip-172-31-8-92.eu-west-1.compute.internal@WHISHWORKS.NET (des-cbc-md5)
   1 01/01/70 00:00:00 HTTP/ip-172-31-8-92.eu-west-1.compute.internal@WHISHWORKS.NET (arcfour-hmac)
   1 01/01/70 00:00:00 HTTP/ip-172-31-8-92.eu-west-1.compute.internal@WHISHWORKS.NET (aes256-cts-hmac-sha1-96)
   1 01/01/70 00:00:00 HTTP/ip-172-31-8-92.eu-west-1.compute.internal@WHISHWORKS.NET (aes128-cts-hmac-sha1-96)
[root@ip-172-31-8-92 keytabs]# kvno HTTP/ip-172-31-8-92.eu-west-1.compute.internal@WHISHWORKS.NET
HTTP/ip-172-31-8-92.eu-west-1.compute.internal@WHISHWORKS.NET: kvno = 2
[root@ip-172-31-8-92 keytabs]#
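For anyone who wants to repeat this comparison on several hosts, here is a minimal sketch. The function name keytab_kvno is made up for this example, and it assumes klist -kte prints entry lines in the column order shown above (KVNO, date, time, principal, enctype).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MIT Kerberos): read `klist -kte` output on
# stdin and print the highest KVNO stored for the given principal.
keytab_kvno() {
  awk -v princ="$1" '
    # Entry lines look like: KVNO date time principal (enctype)
    $4 == princ && (!found || $1 + 0 > max) { max = $1 + 0; found = 1 }
    END { if (found) print max }
  '
}

# Example (the kvno call needs a valid ticket):
#   princ="HTTP/$(hostname -f)@WHISHWORKS.NET"
#   klist -kte /etc/security/keytabs/spnego.service.keytab | keytab_kvno "$princ"
#   kvno "$princ" | awk '{ print $NF }'
```

If the two numbers differ, as with 1 in the keytab and 2 from the KDC here, the keytab was most likely exported before the account's key was last changed.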
Created 07-12-2018 11:39 AM
It looks like you might not have set up the FQDN properly on all your hosts (or the hostname might have changed).
Ambari uses the FQDN (hostname) in the principal name, so if the host FQDN is not set up properly, the keytabs might be generated with incorrect principals.
Please check whether your hosts have recently changed their hostnames. Verify the output of the following commands on different hosts of your cluster, including the problematic host.
# hostname -f
# cat /etc/hosts
Once you fix the hostname, please try to regenerate the keytabs from Ambari UI --> Kerberos --> Regenerate Keytabs
NOTE: Regenerating keytabs requires a restart of the whole cluster, so please find a maintenance window for it.
Hadoop relies heavily on DNS, and as such performs many DNS lookups during normal operation. All hosts in your system must be configured for both forward and reverse DNS. If you are unable to configure DNS in this way, you should edit the /etc/hosts file on every host in your cluster to contain the IP address and Fully Qualified Domain Name of each of your hosts.
https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.2/bk_ambari-installation-ppc/content/check_dn...
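If you go the /etc/hosts route, a small check like the one below can confirm that the first name listed for a host's IP is its FQDN. The function name canonical_name_for_ip is made up for this example, and it only reads a hosts-format file; it does not query DNS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first (canonical) name listed for an IP
# address in a hosts-format file. Defaults to /etc/hosts.
canonical_name_for_ip() {
  local ip="$1" file="${2:-/etc/hosts}"
  awk -v ip="$ip" '
    { sub(/#.*/, "") }                       # drop trailing comments
    $1 == ip && NF >= 2 { print $2; exit }   # first name after the address
  ' "$file"
}

# Example, using the address from the log above:
#   [ "$(canonical_name_for_ip 172.31.50.76)" = "$(hostname -f)" ] \
#     && echo "FQDN matches /etc/hosts" || echo "FQDN mismatch"
```

A mismatch here suggests the principal name built from the hostname will not match what was created in AD.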