Support Questions

Find answers, ask questions, and share your expertise

ORC - GSS Exception : Failed to find any Kerberos tgt

avatar
New Contributor

We have a Java application which launches spark and mapreduce jobs in both local mode(setting spark.master=local and mapreduce.framework.name=local) and distributed mode. For certain use cases we also call org.apache.hadoop.mapred.InputFormat class getSplits method explicitly based on the source data.For eg. for Hcatalog source we call HCatInputFormat's getSplits method. This works fine but intermittently we face GSS exception in calling getSplits method on Kerberos enabled cluster.

We have not been able to identify the scenario when it occurs but few observations are

  • It occurs only when hcatalog table source data format is ORC
  • It occurs mostly when the application is up and running for more than 3 days.

We have added few handling in the code to fix this

  • We have also relogin-ed the user by calling org.apache.hadoop.security.UserGroupInformation reloginFromKeytab method to deal with expired tickets.
  • Also we are creating new UGI before launching any job by following code
UserGroupInformation kerberosUGI = UserGroupInformation
				.loginUserFromKeytabAndReturnUGI(Principalname, keytabPath);
				
and launching job 


kerberosUGI.doAs(new PrivilegedExceptionAction<void>()
{
	launch job
	
}

The exception stacktrace is

Tue, 16 Oct 2018 15:31:41,102 IST [dag-scheduler-event-loop]:ERROR:1184:ERROR:java.lang.RuntimeException: serious problem
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
        at org.apache.hive.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:152)
        ...
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:120)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
		.....
		Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "intelli-i0048.kyvostest.com/172.26.43.28"; destination host is: "intelli-i0044.kyvostest.com":8020;
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
        ... 49 more
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "intelli-i0048.kyvostest.com/172.26.43.28"; destination host is: "intelli-i0044.kyvostest.com":8020;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:782)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1558)
        at org.apache.hadoop.ipc.Client.call(Client.java:1498)
        at org.apache.hadoop.ipc.Client.call(Client.java:1398)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at com.sun.proxy.$Proxy17.getListing(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:625)
        at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
        at com.sun.proxy.$Proxy18.getListing(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2143)
        at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1076)
        at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1059)
        at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1004)
        at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1000)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1000)
        at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1735)
        at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedStatus(Hadoop23Shims.java:667)
        at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:361)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:634)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:620)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:720)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:683)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:770)
        at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1620)
        at org.apache.hadoop.ipc.Client.call(Client.java:1451)
        ... 27 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
        at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
        at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:595)
        at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:397)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:762)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:758)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:757)
        ... 30 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
        at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
        at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
        at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
        at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
        ... 39 more

One more

DEBUG:java.lang.RuntimeException: serious problem
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
	at org.apache.hive.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:152)
	....
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "test-kyvos1.mhadev.local/10.210.89.250"; destination host is: "master2.mhadev.local":8020; 
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
	... 22 more
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "test-kyvos1.mhadev.local/10.210.89.250"; destination host is: "master2.mhadev.local":8020; 
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:785)
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1558)
	at org.apache.hadoop.ipc.Client.call(Client.java:1498)
	at org.apache.hadoop.ipc.Client.call(Client.java:1398)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
	at com.sun.proxy.$Proxy15.getListing(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:625)
	at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
	at com.sun.proxy.$Proxy16.getListing(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2143)
	at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1076)
	at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1059)
	at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1004)
	at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1000)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1018)
	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1735)
	at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedStatus(Hadoop23Shims.java:667)
	at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:361)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:634)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:620)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	... 1 more
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
	at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:720)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
	at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:683)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:770)
	at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1620)
	at org.apache.hadoop.ipc.Client.call(Client.java:1451)
	... 27 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
	at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:414)
	at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:595)
	at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:397)
	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:762)
	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:758)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:758)
	... 30 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
	at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
	at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
	at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
	at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
	... 39 more
1 REPLY 1

avatar

@Saurabh Gupta

The stack traces you provided are a good start, but give little in the way of why the Kerberos ticket is not available. There could be several reasons for this. Maybe the ticket cache was expired or clobbered by some other process? Can you turn on Kerberos debugging and post the logs from that?

To enable Kerberos debugging in Java, set the system property "sun.security.krb5.debug" to true. If doing this on the command line, use

-Dsun.security.krb5.debug=true

Or in code, use

System.setProperty("sun.security.krb5.debug", "true");

Kerberos debugging information should be posted to STDOUT.