Support Questions


Is it possible to submit a Spark job remotely to a Kerberized Hortonworks cluster (HDP 2.4.3)?

Explorer

I've tried to submit a Spark job to YARN remotely:

./spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --principal xxx --keytab /etc/security/keytabs/xxx.headless.keytab ../lib/spark-examples*.jar 100

But I get the following error:

Exception in thread "main" java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
	at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:794)
	at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86)
	at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2046)
	at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$obtainTokensForNamenodes$1.apply(YarnSparkHadoopUtil.scala:131)
	at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$obtainTokensForNamenodes$1.apply(YarnSparkHadoopUtil.scala:128)
	at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
	at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.obtainTokensForNamenodes(YarnSparkHadoopUtil.scala:128)
	at org.apache.spark.deploy.yarn.Client.getTokenRenewalInterval(Client.scala:593)
	at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:626)
	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:726)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
	at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:306)
	at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:196)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:127)
	at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:216)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:284)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:165)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:371)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:348)
	at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:786)
	... 24 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
	at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
	at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
	at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
	at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
	at org.apache.hadoop.security.authentication.client.KerberosAuthenticator$1.run(KerberosAuthenticator.java:285)
	at org.apache.hadoop.security.authentication.client.KerberosAuthenticator$1.run(KerberosAuthenticator.java:261)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:261)
	... 32 more

Has anyone hit this issue, and is there a fix or workaround?

9 REPLIES

Super Guru
@Kevin Ng

Can you run kinit before running the spark-submit command?
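For example (the keytab path and principal are placeholders, matching the xxx placeholders in the original command):

# Obtain a TGT from the keytab, then verify it landed in the ticket cache.
kinit -kt /etc/security/keytabs/xxx.headless.keytab xxx@EXAMPLE.COM
klist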

Explorer

Yes, kinit is run beforehand.

Guru

@Kevin Ng, follow the steps below. Suppose you want to run this application as user "a":

sudo su a
kinit -kt <a keytab> <a principal>
./spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client ../lib/spark-examples*.jar 100

Super Guru

@yvora

Actually, if you want to run this as user "a" and not as the principal, the command changes: you do kinit as you said, but then you also provide --proxy-user, as sketched below.
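A rough sketch of what that would look like (the superuser principal and keytab path are placeholders; note that spark-submit does not accept --proxy-user together with --principal/--keytab):

# kinit as a principal that is allowed to impersonate "a"
# (requires matching hadoop.proxyuser.* rules in core-site.xml)
kinit -kt /etc/security/keytabs/superuser.headless.keytab superuser@EXAMPLE.COM
./spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --proxy-user a ../lib/spark-examples*.jar 100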

Guru

@Kevin Ng, if you are doing kinit properly, then it could be a configuration issue related to Ranger KMS. Make sure that KMS is configured properly in your cluster. Refer to the thread below for the configurations.

https://community.hortonworks.com/questions/28052/exception-while-executing-insert-query-on-kerberos...
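For reference, the KMS proxyuser settings usually live in kms-site.xml. A minimal sketch, assuming the submitting user is "a" as in the steps above:

<!-- Illustrative kms-site.xml entries allowing user "a" to act as a proxy
     for any user, from any host -->
<property>
  <name>hadoop.kms.proxyuser.a.users</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.kms.proxyuser.a.hosts</name>
  <value>*</value>
</property>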

Explorer

@yvora Thanks for your suggestions, but still no luck.

I did a kinit:

The logged-in user is kng.

kinit -kt kng.headless.keytab kng

I added:

hadoop.kms.proxyuser.kng.users = *

hadoop.kms.proxyuser.kng.hosts = *

Still the same error.

I think where it falls over is at this point:

org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence

Looking at the source code, it seems to be trying to get a delegation token over HTTP. I'm not sure what the code is trying to do here.
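One way to poke at that HTTP leg in isolation would be a manual SPNEGO request with curl (this assumes curl was built with GSS/Kerberos support, and the KMS host and port below are placeholders for the real endpoint):

# With a fresh kinit, drive the SPNEGO handshake by hand against KMS;
# op=GETDELEGATIONTOKEN is the operation the stack trace is attempting.
curl --negotiate -u : -v "http://kms-host.example.com:9292/kms/v1/?op=GETDELEGATIONTOKEN&renewer=kng"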

Guru

@Kevin Ng, can you please check the cluster configuration for SPNEGO authentication? The guidelines are below.

https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.0/bk_Ambari_Security_Guide/content/ch_enable_...
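As a quick cluster-side sanity check, you could also verify that the SPNEGO service keytab exists and holds the HTTP principal (the path below is the usual HDP default, so treat it as an assumption):

# On the KMS/NameNode host, list the principals in the SPNEGO keytab.
klist -kt /etc/security/keytabs/spnego.service.keytab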

Explorer

@yvora

I made a little progress on this today.

The first error was solved by installing the JCE unlimited-strength policy files. So it was a kinit error of sorts: Java without the JCE could not read the Kerberos ticket in the cache.
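For anyone else who hits this, a quick way to check whether a JRE already has the unlimited-strength JCE policy (needed to handle AES-256 Kerberos tickets) is:

# Prints the maximum AES key length the JRE allows:
# 128 = default limited policy, 2147483647 = unlimited JCE installed.
jrunscript -e 'print(javax.crypto.Cipher.getMaxAllowedKeyLength("AES"))'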

============================================================================================

Now I get another set of errors:

In the ResourceManager log, when I submit the Spark job that works (from within the cluster), I see these lines:

2017-04-05 10:56:29,578 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1491001893949_0381_000001 State change from ALLOCATED to LAUNCHED
2017-04-05 10:56:30,554 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(420)) - container_e34_1491001893949_0381_01_000001 Container Transitioned from ACQUIRED to RUNNING
2017-04-05 10:56:34,963 INFO ipc.Server (Server.java:saslProcess(1538)) - Auth successful for appattempt_1491001893949_0381_000001 (auth:SIMPLE)
2017-04-05 10:56:34,970 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for appattempt_1491001893949_0381_000001 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB
2017-04-05 10:56:34,975 INFO resourcemanager.ApplicationMasterService (ApplicationMasterService.java:registerApplicationMaster(280)) - AM registration appattempt_1491001893949_0381_000001

=============================================================================================

When I submit the job from outside the cluster, I do not get those lines. I get this instead:

2017-04-05 12:33:04,777 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1491001893949_0389_000001 State change from ALLOCATED to LAUNCHED
2017-04-05 12:33:05,751 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(420)) - container_e34_1491001893949_0389_01_000001 Container Transitioned from ACQUIRED to RUNNING
2017-04-05 12:33:07,140 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(420)) - container_e34_1491001893949_0389_01_000001 Container Transitioned from RUNNING to COMPLETED
2017-04-05 12:33:07,140 INFO resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(141)) - USER=kng OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1491001893949_0389 CONTAINERID=container_e34_1491001893949_0389_01_000001

=============================================================================================

The spark-submit console output gives the following stack trace:

Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
	at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:720)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:683)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:770)
	at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1618)
	at org.apache.hadoop.ipc.Client.call(Client.java:1449)
	at org.apache.hadoop.ipc.Client.call(Client.java:1396)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
	at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:816)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
	at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2158)
	at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1423)
	at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1419)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1419)
	at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
	at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
	at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
	at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
	at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
	at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
	at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:595)
	at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:397)
	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:762)
	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:758)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:757)
	... 35 more

Any ideas?

Explorer

Some more info on this:

The NodeManager logs show that it is trying to enable log aggregation and to download the resources that were uploaded when the Spark job was submitted.

I believe that the NodeManager does not have a valid HDFS delegation token. How is the token transmitted from the spark-submit client to the NodeManagers?
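For what it's worth, my understanding is that the client obtains the HDFS delegation tokens at submit time and ships them inside the application's container launch context, which the NodeManagers then use to localize resources. One way to trace whether the token is actually being obtained is to re-run with JVM Kerberos debugging on (the flag below is a standard JVM option; the rest is the original command):

# Trace the client-side GSS/token negotiation; in client deploy mode the
# driver runs in the launcher JVM, so the debug output appears on the console.
./spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --driver-java-options "-Dsun.security.krb5.debug=true" ../lib/spark-examples*.jar 100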

Thanks