Created on 10-02-2015 10:15 PM - edited 09-16-2022 02:42 AM
I've just installed and Kerberized my cluster:
Ambari 2.1.1
CentOS 7
IPA 4 for LDAP and Kerberos (IPA Clients Configured across the cluster hosts)
Oracle JDK 1.7.0_79 (with JCE)
HDP 2.3.0
The cluster comes up just fine and all the services seem to be happy talking to each other. So I'm pretty convinced that all the keytabs are configured correctly.
From any node in the cluster, after getting a valid ticket (kinit), a basic hdfs command fails with the following (Kerberos debug enabled). This ONLY happens from IPA clients; client access from other hosts works fine (read on).
-sh-4.2$ klist
Ticket cache: KEYRING:persistent:100035:krb_ccache_T7mkWNw
Default principal: dstreev@HDP.LOCAL

Valid starting       Expires              Service principal
10/02/2015 09:17:07  10/03/2015 09:17:04  krbtgt/HDP.LOCAL@HDP.LOCAL

-sh-4.2$ hdfs dfs -ls .
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
>>>KinitOptions cache name is /tmp/krb5cc_100035
15/10/02 18:07:48 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
15/10/02 18:07:48 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
15/10/02 18:07:48 INFO retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB over m2.hdp.local/10.0.0.161:8020 after 1 fail over attempts. Trying to fail over immediately.
java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "m3.hdp.local/10.0.0.162"; destination host is: "m2.hdp.local":8020;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
    at org.apache.hadoop.ipc.Client.call(Client.java:1431)
    at org.apache.hadoop.ipc.Client.call(Client.java:1358)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1655)
    at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
    at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
    at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
    at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:685)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:648)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:735)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
    at org.apache.hadoop.ipc.Client.call(Client.java:1397)
    ... 28 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:558)
    at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:373)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:727)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:723)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:722)
    ... 31 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:121)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:223)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:193)
    ... 40 more
15/10/02 18:07:48 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
(It retries 15-20 times before quitting, which happens really fast.)
If I try to access the cluster from a host that is NOT one of the IPA hosts (from my Mac, for example), I do NOT get this error, and I can interact with the cluster.
➜  conf klist
Credentials cache: API:D44F3F89-A095-40A5-AA7C-BD06698AA606
        Principal: dstreev@HDP.LOCAL

  Issued                Expires               Principal
Oct  2 17:52:13 2015  Oct  3 17:52:00 2015  krbtgt/HDP.LOCAL@HDP.LOCAL
Oct  2 18:06:53 2015  Oct  3 17:52:00 2015  host/m3.hdp.local@HDP.LOCAL

➜  conf hdfs dfs -ls /
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
15/10/02 18:10:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>KinitOptions cache name is /tmp/krb5cc_501
>> Acquire default native Credentials
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 18 17 16 23.
>>> Obtained TGT from LSA: Credentials:
      client=dstreev@HDP.LOCAL
      server=krbtgt/HDP.LOCAL@HDP.LOCAL
      authTime=20151002215213Z
      startTime=20151002215213Z
      endTime=20151003215200Z
      renewTill=20151009215200Z
      flags=FORWARDABLE;RENEWABLE;INITIAL;PRE-AUTHENT
      EType (skey)=18 (tkt key)=18
15/10/02 18:10:59 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Found ticket for dstreev@HDP.LOCAL to go to krbtgt/HDP.LOCAL@HDP.LOCAL expiring on Sat Oct 03 17:52:00 EDT 2015
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for dstreev@HDP.LOCAL to go to krbtgt/HDP.LOCAL@HDP.LOCAL expiring on Sat Oct 03 17:52:00 EDT 2015
Service ticket not found in the subject
>>> Credentials acquireServiceCreds: same realm
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 18 17 16 23.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KdcAccessibility: reset
>>> KrbKdcReq send: kdc=m3.hdp.local UDP:88, timeout=30000, number of retries =3, #bytes=654
>>> KDCCommunication: kdc=m3.hdp.local UDP:88, timeout=30000,Attempt =1, #bytes=654
>>> KrbKdcReq send: #bytes read=637
>>> KdcAccessibility: remove m3.hdp.local
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbApReq: APOptions are 00100000 00000000 00000000 00000000
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
Krb5Context setting mySeqNumber to: 227177742
Created InitSecContextToken:
0000: 01 00 6E 82 02 3C 30 82 02 38 A0 03 02 01 05 A1  ..n..<0..8......
0010: 03 02 01 0E A2 07 03 05 00 20 00 00 00 A3 82 01  ......... ......
...
0230: 99 AC EE FB DF 86 B5 2A 19 CB A1 0B 8A 8E F7 9B  .......*........
0240: 81 08                                            ..
Entered Krb5Context.initSecContext with state=STATE_IN_PROCESS
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
Krb5Context setting peerSeqNumber to: 40282898
Krb5Context.unwrap: token=[05 04 01 ff 00 0c 00 00 00 00 00 00 02 66 ab 12 01 01 00 00 8e 14 7a df 34 d7 c5 3d 5d d1 ce b5 ]
Krb5Context.unwrap: data=[01 01 00 00 ]
Krb5Context.wrap: data=[01 01 00 00 ]
Krb5Context.wrap: token=[05 04 00 ff 00 0c 00 00 00 00 00 00 0d 8a 75 0e 01 01 00 00 9c a5 73 25 59 0f b5 64 24 f0 a8 78 ]
Found 8 items
drwxrwxrwx   - yarn   hadoop          0 2015-09-28 15:55 /app-logs
drwxr-xr-x   - hdfs   hdfs            0 2015-09-28 15:57 /apps
drwxr-xr-x   - hdfs   hdfs            0 2015-09-28 15:53 /hdp
drwxr-xr-x   - mapred hdfs            0 2015-09-28 15:53 /mapred
drwxrwxrwx   - mapred hadoop          0 2015-09-28 15:54 /mr-history
drwxr-xr-x   - hdfs   hdfs            0 2015-09-28 19:20 /ranger
drwxrwxrwx   - hdfs   hdfs            0 2015-09-29 13:09 /tmp
drwxr-xr-x   - hdfs   hdfs            0 2015-10-02 17:51 /user
Since I can get to the cluster and interact with it from a host that hasn't been configured by the IPA client, I'm fairly sure something in my IPA environment is the culprit.
Any idea where to look in IPA to fix this for the hosts that are part of the IPA environment?
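For reference, a quick way to see the mismatch between what the shell and the JVM each consider the credential cache (a sketch based on the output above; the uid 100035 is from this host):

    # on a failing IPA-client host
    echo $KRB5CCNAME    # if empty, the default_ccache_name from /etc/krb5.conf applies
    klist | head -1     # Ticket cache: KEYRING:persistent:100035:krb_ccache_T7mkWNw

Meanwhile the Hadoop debug output above shows the JVM looking for a file cache instead: ">>>KinitOptions cache name is /tmp/krb5cc_100035".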
Created 10-02-2015 10:26 PM
Are the times in sync on both machines? Maybe kdestroy and kinit again and retry?
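In case it's useful, a sketch of that check and reset (the principal and KDC host are taken from the output in the question; ntpdate is an assumption, any time-query tool works):

    ntpdate -q m3.hdp.local   # compare the local clock against the KDC host
    kdestroy                  # discard the current credential cache
    kinit dstreev@HDP.LOCAL   # obtain a fresh TGT
    klist                     # confirm the new ticket and its validity window
    hdfs dfs -ls .            # retry the failing command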
Created 10-02-2015 11:55 PM
Check your /etc/krb5.conf file. If you see a line like
default_ccache_name = KEYRING:...
You should remove it from all krb5.conf files. That line causes the Kerberos libraries to store the credential cache in an alternate location (the kernel keyring), which the Hadoop libraries can't seem to access. I will eventually research this issue more to see how we can get the Hadoop libraries access to Kerberos credential caches stored in keyrings, but for now the solution is to have the cache stored in the default location.
Once you remove this line from all krb5.conf files, restart all services and they should start up properly.
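For illustration, the stanza that ipa-client-install writes on RHEL/CentOS 7 typically looks like the sketch below (the exact value is an assumption; only the default_ccache_name line matters here):

    # /etc/krb5.conf (before the fix)
    [libdefaults]
      default_realm = HDP.LOCAL
      default_ccache_name = KEYRING:persistent:%{uid}   # <- remove or comment out this line

After deleting the line, run kdestroy and kinit again; klist should then report a cache like FILE:/tmp/krb5cc_<uid>, which is the default location where the JVM's Kerberos code looks.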
Created 06-16-2016 09:55 PM
All krb5.conf files — do you mean on the nodes accessing the cluster, or on all nodes where the ipa-client has been installed (including the cluster nodes, then)?
Created 06-17-2016 01:15 PM
@Philippe Back... From all krb5.conf files on all nodes in the Hadoop cluster.
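A sketch of auditing that across the cluster (the host list and passwordless ssh are assumptions; adjust to your environment):

    for h in m1.hdp.local m2.hdp.local m3.hdp.local; do
      ssh "$h" 'grep -n default_ccache_name /etc/krb5.conf'   # any hit needs removing
    done

Something like sudo sed -i.bak '/default_ccache_name/d' /etc/krb5.conf would then delete the line in place while keeping a .bak copy.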
Created 10-05-2015 09:39 AM
I'm building up a list of error messages: https://github.com/steveloughran/kerberos_and_hadoop/blob/master/sections/errors.md
One thing that often comes up on the Oracle JDK is that the client doesn't have the Java Cryptography Extension (JCE) unlimited-strength policy files installed; that can surface as one of the obscure error messages in this situation. Do a java -version to check whether it is the Oracle JVM (not OpenJDK), and install the JCE policy JARs if they are needed.
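For what it's worth, a quick check of the effective crypto policy (this uses the standard javax.crypto API via the JDK's jrunscript tool; 128 means the default restricted policy, a very large number means unlimited strength):

    java -version
    jrunscript -e "print(javax.crypto.Cipher.getMaxAllowedKeyLength('AES'))"

The debug output earlier in this thread shows Aes256CtsHmacSha1EType tickets, so without the unlimited-strength policy files the client could not process them.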