
IPA Kerberos not liking my kinit Ticket


I've just installed and Kerberized my cluster:

Ambari 2.1.1

CentOS 7

IPA 4 for LDAP and Kerberos (IPA clients configured across the cluster hosts)

Oracle JDK 1.7.0_79 (with JCE)

HDP 2.3.0

The cluster comes up just fine and all the services seem to be happy talking to each other. So I'm pretty convinced that all the keytabs are configured correctly.

From any node in the cluster, after getting a valid ticket (kinit), a basic hdfs command fails with the following (Kerberos debug enabled). This ONLY happens from IPA clients; access from other hosts works fine (read on).

-sh-4.2$ klist
Ticket cache: KEYRING:persistent:100035:krb_ccache_T7mkWNw
Default principal: dstreev@HDP.LOCAL


Valid starting       Expires              Service principal
10/02/2015 09:17:07  10/03/2015 09:17:04  krbtgt/HDP.LOCAL@HDP.LOCAL
-sh-4.2$ hdfs dfs -ls .
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
>>>KinitOptions cache name is /tmp/krb5cc_100035
15/10/02 18:07:48 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
15/10/02 18:07:48 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
15/10/02 18:07:48 INFO retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB over m2.hdp.local/10.0.0.161:8020 after 1 fail over attempts. Trying to fail over immediately.
java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "m3.hdp.local/10.0.0.162"; destination host is: "m2.hdp.local":8020;
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
	at org.apache.hadoop.ipc.Client.call(Client.java:1431)
	at org.apache.hadoop.ipc.Client.call(Client.java:1358)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
	at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
	at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1655)
	at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
	at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
	at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
	at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
	at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
	at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
	at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:685)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:648)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:735)
	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
	at org.apache.hadoop.ipc.Client.call(Client.java:1397)
	... 28 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
	at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
	at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:558)
	at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:373)
	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:727)
	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:723)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:722)
	... 31 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
	at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
	at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:121)
	at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
	at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:223)
	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:193)
	... 40 more
15/10/02 18:07:48 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

(It retries 15-20 times before quitting, which happens really fast.)

If I try to access the cluster from a host that is NOT part of the IPA environment (my Mac, for example), I do NOT get this error and can interact with the cluster:

➜  conf  klist
Credentials cache: API:D44F3F89-A095-40A5-AA7C-BD06698AA606
        Principal: dstreev@HDP.LOCAL
  Issued                Expires               Principal
Oct  2 17:52:13 2015  Oct  3 17:52:00 2015  krbtgt/HDP.LOCAL@HDP.LOCAL
Oct  2 18:06:53 2015  Oct  3 17:52:00 2015  host/m3.hdp.local@HDP.LOCAL
➜  conf  hdfs dfs -ls /
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
15/10/02 18:10:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>KinitOptions cache name is /tmp/krb5cc_501
>> Acquire default native Credentials
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 18 17 16 23.
>>> Obtained TGT from LSA: Credentials:
      client=dstreev@HDP.LOCAL
      server=krbtgt/HDP.LOCAL@HDP.LOCAL
    authTime=20151002215213Z
   startTime=20151002215213Z
     endTime=20151003215200Z
   renewTill=20151009215200Z
       flags=FORWARDABLE;RENEWABLE;INITIAL;PRE-AUTHENT
EType (skey)=18
   (tkt key)=18
15/10/02 18:10:59 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Found ticket for dstreev@HDP.LOCAL to go to krbtgt/HDP.LOCAL@HDP.LOCAL expiring on Sat Oct 03 17:52:00 EDT 2015
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for dstreev@HDP.LOCAL to go to krbtgt/HDP.LOCAL@HDP.LOCAL expiring on Sat Oct 03 17:52:00 EDT 2015
Service ticket not found in the subject
>>> Credentials acquireServiceCreds: same realm
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 18 17 16 23.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KdcAccessibility: reset
>>> KrbKdcReq send: kdc=m3.hdp.local UDP:88, timeout=30000, number of retries =3, #bytes=654
>>> KDCCommunication: kdc=m3.hdp.local UDP:88, timeout=30000,Attempt =1, #bytes=654
>>> KrbKdcReq send: #bytes read=637
>>> KdcAccessibility: remove m3.hdp.local
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbApReq: APOptions are 00100000 00000000 00000000 00000000
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
Krb5Context setting mySeqNumber to: 227177742
Created InitSecContextToken:
0000: 01 00 6E 82 02 3C 30 82   02 38 A0 03 02 01 05 A1  ..n..<0..8......
0010: 03 02 01 0E A2 07 03 05   00 20 00 00 00 A3 82 01  ......... ......
...
0230: 99 AC EE FB DF 86 B5 2A   19 CB A1 0B 8A 8E F7 9B  .......*........
0240: 81 08                                              ..

Entered Krb5Context.initSecContext with state=STATE_IN_PROCESS
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
Krb5Context setting peerSeqNumber to: 40282898
Krb5Context.unwrap: token=[05 04 01 ff 00 0c 00 00 00 00 00 00 02 66 ab 12 01 01 00 00 8e 14 7a df 34 d7 c5 3d 5d d1 ce b5 ]
Krb5Context.unwrap: data=[01 01 00 00 ]
Krb5Context.wrap: data=[01 01 00 00 ]
Krb5Context.wrap: token=[05 04 00 ff 00 0c 00 00 00 00 00 00 0d 8a 75 0e 01 01 00 00 9c a5 73 25 59 0f b5 64 24 f0 a8 78 ]
Found 8 items
drwxrwxrwx   - yarn   hadoop          0 2015-09-28 15:55 /app-logs
drwxr-xr-x   - hdfs   hdfs            0 2015-09-28 15:57 /apps
drwxr-xr-x   - hdfs   hdfs            0 2015-09-28 15:53 /hdp
drwxr-xr-x   - mapred hdfs            0 2015-09-28 15:53 /mapred
drwxrwxrwx   - mapred hadoop          0 2015-09-28 15:54 /mr-history
drwxr-xr-x   - hdfs   hdfs            0 2015-09-28 19:20 /ranger
drwxrwxrwx   - hdfs   hdfs            0 2015-09-29 13:09 /tmp
drwxr-xr-x   - hdfs   hdfs            0 2015-10-02 17:51 /user
➜  conf

Since I can get to the cluster and interact with it from a host that hasn't been configured by the IPA client, I'm pretty sure the problem is somewhere in my IPA client environment.

Any idea where to look in IPA to fix this for the hosts that are part of the IPA environment?
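
One clue I can see in the debug output: on the IPA clients, klist reports the ticket cache in a KEYRING (KEYRING:persistent:100035), while the Java debug line says it is looking for /tmp/krb5cc_100035. A quick way to compare what each side sees (paths depend on your uid):

-sh-4.2$ klist                       # MIT tools honor default_ccache_name from krb5.conf
-sh-4.2$ echo $KRB5CCNAME            # if set, MIT tools use it; Java understands only FILE: caches
-sh-4.2$ ls -l /tmp/krb5cc_$(id -u)  # where the JDK Kerberos code looks by default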


5 REPLIES


Are the times in sync on both machines? Maybe kdestroy and kinit again, then retry?
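
For reference, a quick way to check both (assuming ntpd on CentOS 7; with chronyd, chronyc tracking gives the same information):

-sh-4.2$ date; ntpstat               # clock within 5 minutes of the KDC, and NTP-synchronized?
-sh-4.2$ kdestroy                    # throw away the current ticket cache
-sh-4.2$ kinit dstreev@HDP.LOCAL     # get a fresh TGT
-sh-4.2$ klist                       # confirm the new ticket and its cache location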

1 ACCEPTED SOLUTION

Check your /etc/krb5.conf file. If you see a line like

default_ccache_name = KEYRING:...

you should remove it from all krb5.conf files. It causes the Kerberos libraries to store the credential cache in an alternate location (the kernel keyring), which the Hadoop libraries can't seem to access. I will eventually research this issue more to see how we can give the Hadoop libraries access to Kerberos credential caches stored in keyrings, but for now the solution is to have the cache stored in the default location.

Once you remove this line from all krb5.conf files, restart all services and they should start up properly.
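
For illustration, on CentOS 7 the offending [libdefaults] entry typically looks like this (the surrounding settings are illustrative; HDP.LOCAL is this cluster's realm):

[libdefaults]
  default_realm = HDP.LOCAL
  dns_lookup_kdc = true
  # Remove or comment out this line so the cache goes back to the default FILE: location:
  # default_ccache_name = KEYRING:persistent:%{uid}

After removing it, kdestroy and kinit again so a fresh cache is written to /tmp/krb5cc_<uid>, which matches the ">>>KinitOptions cache name is /tmp/krb5cc_100035" line in the debug output above.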


All krb5.conf files: do you mean on the nodes accessing the cluster, or on all nodes where ipa-client has been installed (which would include the cluster nodes)?


@Philippe Back... From all krb5.conf files on all nodes in the Hadoop cluster. A sketch of how to push the change out follows below.
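
As a sketch only (the host list and passwordless ssh/sudo are assumptions, not something from this thread), the edit could be applied everywhere with a loop like:

for host in m1.hdp.local m2.hdp.local m3.hdp.local; do
  # comment out the default_ccache_name line in each node's krb5.conf
  ssh "$host" "sudo sed -i 's/^\s*default_ccache_name/#&/' /etc/krb5.conf"
done

Keep in mind that re-running ipa-client-install can rewrite /etc/krb5.conf, so the line may need to be removed again after any re-enrollment.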


I'm building up a list of error messages: https://github.com/steveloughran/kerberos_and_hadoop/blob/master/sections/errors.md

One thing that often comes back to bite on the Oracle JDK is that the client doesn't have the Java Cryptography Extension (JCE) unlimited-strength policy files installed; this can surface as one of the obscure error messages raised in this situation. Run java -version to check that it is the Oracle JVM (not OpenJDK), and then install the JCE policy JARs if needed.
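
A quick way to check whether the unlimited-strength policy is active, using the jrunscript tool that ships with the JDK:

java -version
jrunscript -e 'print(javax.crypto.Cipher.getMaxAllowedKeyLength("AES"))'
# 128        -> restricted default policy; AES-256 (etype 18) tickets will fail
# 2147483647 -> unlimited-strength JCE is installed

The traces above negotiate etype 18 (AES-256), which on Oracle JDK 7 requires the unlimited-strength policy files.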