Is it possible to submit a Spark job remotely to a kerberized Hortonworks cluster (HDP 2.4.3)?
Labels: Apache Spark, Apache YARN
Created 03-31-2017 08:45 PM
I've tried to submit a Spark job to YARN remotely:
./spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --principal xxx --keytab /etc/security/keytabs/xxx.headless.keytab ../lib/spark-examples*.jar 100
But I get the following errors:
Exception in thread "main" java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:794)
    at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86)
    at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2046)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$obtainTokensForNamenodes$1.apply(YarnSparkHadoopUtil.scala:131)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$obtainTokensForNamenodes$1.apply(YarnSparkHadoopUtil.scala:128)
    at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.obtainTokensForNamenodes(YarnSparkHadoopUtil.scala:128)
    at org.apache.spark.deploy.yarn.Client.getTokenRenewalInterval(Client.scala:593)
    at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:626)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:726)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
    at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:306)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:196)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:127)
    at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:216)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:284)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:165)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:371)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:348)
    at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:786)
    ... 24 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator$1.run(KerberosAuthenticator.java:285)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator$1.run(KerberosAuthenticator.java:261)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:261)
    ... 32 more
Has anyone hit this issue and is there a fix or workaround?
Created 03-31-2017 09:08 PM
Can you run a kinit before running the spark command?
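For example, a minimal check (the keytab path and principal here are just the ones from your original command; klist simply confirms a TGT landed in the ticket cache):

    kinit -kt /etc/security/keytabs/xxx.headless.keytab xxx
    # verify a valid TGT is actually in the cache before calling spark-submit
    klist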
Created 03-31-2017 09:27 PM
Yes, kinit is run beforehand.
Created 03-31-2017 09:30 PM
@Kevin Ng, follow the steps below. Suppose you want to run this application as user "a":
sudo su a
kinit -kt <a keytab> <a principal>
./spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client ../lib/spark-examples*.jar 100
Created 03-31-2017 09:34 PM
Actually, if you want to run this as user "a" rather than as the keytab's principal, the command changes: you do the kinit as you said, but then you pass --proxy-user to spark-submit, as in the sketch below.
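A minimal sketch of that variant, assuming the keytab's principal is allowed to impersonate user "a" via the Hadoop proxyuser settings (the keytab path and principal are illustrative):

    # authenticate as the service principal, then submit on behalf of user "a"
    kinit -kt /etc/security/keytabs/xxx.headless.keytab xxx
    ./spark-submit --class org.apache.spark.examples.SparkPi \
      --master yarn --deploy-mode client \
      --proxy-user a \
      ../lib/spark-examples*.jar 100

Note that --proxy-user takes the place of the --principal/--keytab pair here; spark-submit treats the two approaches as alternatives.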
Created 03-31-2017 09:34 PM
@Kevin Ng, if you are doing kinit properly, then it can be a configuration issue related to Ranger KMS. Make sure that KMS is configured properly in your cluster. Refer to the thread below for the configurations.
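For instance, the KMS proxyuser entries (typically in kms-site.xml) look like the following sketch; <user> is a placeholder for the actual submitting user, and the "*" values should be scoped more tightly where possible:

    # illustrative kms-site.xml entries; replace <user> with the submitting user
    hadoop.kms.proxyuser.<user>.users = *
    hadoop.kms.proxyuser.<user>.groups = *
    hadoop.kms.proxyuser.<user>.hosts = *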
Created 03-31-2017 10:39 PM
@yvora Thanks for your suggestions, but still no luck.
I did a kinit (the logged-in user is kng):
kinit -kt kng.headless.keytab kng
I added:
hadoop.kms.proxyuser.kng.users = *
hadoop.kms.proxyuser.kng.hosts = *
Still the same error.
I think it falls over at this point:
org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence
Looking at the source code, it seems to be trying to obtain a delegation token over HTTP (via SPNEGO). I'm not sure what the code is trying to do here.
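One way to probe that HTTP/SPNEGO path in isolation is to hit a KMS REST endpoint with curl's built-in SPNEGO support while holding a valid TGT (a sketch: the host is a placeholder, and port 9292 is an assumption based on the usual Ranger KMS default):

    # --negotiate -u : makes curl authenticate from the Kerberos ticket cache
    curl --negotiate -u : "http://<kms-host>:9292/kms/v1/keys/names"

If this fails with a 401/403 from the same machine, the problem is in the SPNEGO/KMS layer rather than in spark-submit itself.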
Created 03-31-2017 11:30 PM
@Kevin Ng, can you please check the cluster configuration for SPNEGO authentication? See the guidelines below.
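For reference, the HTTP/SPNEGO settings usually live in core-site.xml along these lines (a sketch; the principal, keytab path, and realm are illustrative and must match your cluster):

    # illustrative core-site.xml entries for HTTP/SPNEGO authentication
    hadoop.http.authentication.type = kerberos
    hadoop.http.authentication.kerberos.principal = HTTP/_HOST@EXAMPLE.COM
    hadoop.http.authentication.kerberos.keytab = /etc/security/keytabs/spnego.service.keytab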
Created 04-05-2017 06:09 PM
I made a little progress on this today.
The first error was solved by installing the JCE unlimited-strength policy files. So it was a kinit error of sorts: without the JCE, Java could not read the Kerberos ticket in the cache (most likely because the default JRE policy cannot handle AES-256-encrypted tickets).
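For anyone hitting the same thing, installing the JCE unlimited-strength policy amounts to dropping two jars into the JRE (a sketch for Oracle JDK 8; the zip name, extract directory, and $JAVA_HOME are assumptions for your environment):

    # download jce_policy-8.zip from Oracle for your JDK version, then:
    unzip jce_policy-8.zip
    cp UnlimitedJCEPolicyJDK8/local_policy.jar \
       UnlimitedJCEPolicyJDK8/US_export_policy.jar \
       "$JAVA_HOME/jre/lib/security/"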
============================================================================================
Now I get another set of errors:
In the resource manager log, when I submit the Spark job in the way that works (from within the cluster), I see these lines:
2017-04-05 10:56:29,578 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1491001893949_0381_000001 State change from ALLOCATED to LAUNCHED
2017-04-05 10:56:30,554 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(420)) - container_e34_1491001893949_0381_01_000001 Container Transitioned from ACQUIRED to RUNNING
2017-04-05 10:56:34,963 INFO ipc.Server (Server.java:saslProcess(1538)) - Auth successful for appattempt_1491001893949_0381_000001 (auth:SIMPLE)
2017-04-05 10:56:34,970 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for appattempt_1491001893949_0381_000001 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB
2017-04-05 10:56:34,975 INFO resourcemanager.ApplicationMasterService (ApplicationMasterService.java:registerApplicationMaster(280)) - AM registration appattempt_1491001893949_0381_000001
=============================================================================================
When I submit the job from outside the cluster, I do not get those lines. I get this instead:
2017-04-05 12:33:04,777 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1491001893949_0389_000001 State change from ALLOCATED to LAUNCHED
2017-04-05 12:33:05,751 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(420)) - container_e34_1491001893949_0389_01_000001 Container Transitioned from ACQUIRED to RUNNING
2017-04-05 12:33:07,140 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(420)) - container_e34_1491001893949_0389_01_000001 Container Transitioned from RUNNING to COMPLETED
2017-04-05 12:33:07,140 INFO resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(141)) - USER=kng OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1491001893949_0389 CONTAINERID=container_e34_1491001893949_0389_01_000001
=============================================================================================
The spark-submit console output gives the following stack trace:
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:720)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:683)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:770)
    at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1618)
    at org.apache.hadoop.ipc.Client.call(Client.java:1449)
    at org.apache.hadoop.ipc.Client.call(Client.java:1396)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:816)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
    at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2158)
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1423)
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1419)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1419)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:595)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:397)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:762)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:758)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:757)
    ... 35 more
Any ideas?
Created 04-06-2017 02:12 PM
Some more info on this:
The node manager logs show that it is trying to enable log aggregation and download the resources that were uploaded when the Spark job was submitted.
I believe the node manager does not have a valid HDFS delegation token. How is the token transmitted from the spark-submit client to the node managers?
Thanks
