
Spark-Submit and Ozone FS :: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]


Hi Team,

 

Looking for some inputs on the below error. We have a sample Spark Java application that reads an input file that exists in HDFS, performs transformations on the data, and writes the result back to another file in HDFS (a rough sketch of the application follows the command below). It is submitted via spark-submit like below:

 

  • spark-submit --master yarn --deploy-mode cluster --class com.sample.SparkJavaApiTest /tmp/sample-spark-java.jar <HDFS File Path : /user/user1/test.txt> <HDFS Output File : /user/user1/output.txt>
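For reference, the application logic is roughly the following. This is a minimal sketch only, assuming a plain RDD read-transform-write flow; the real com.sample.SparkJavaApiTest differs in its transformation, and the upper-casing here is purely illustrative:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkJavaApiTest {
    public static void main(String[] args) {
        // args[0] = input file URI, args[1] = output directory URI
        // (HDFS paths work fine; Ozone ofs:// / o3fs:// URIs trigger the error below)
        SparkConf conf = new SparkConf().setAppName("SparkJavaApiTest");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> input = sc.textFile(args[0]);
        // Illustrative transformation: upper-case every line
        JavaRDD<String> transformed = input.map(line -> line.toUpperCase());
        transformed.saveAsTextFile(args[1]);

        sc.stop();
    }
}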

The sample code works perfectly. However, when the same job is submitted via spark-submit with the files on the Ozone file system, we get the below error:

 

  • spark-submit --master yarn --deploy-mode cluster --keytab /tmp/test.keytab --principal user1@EXAMPLE.COM --class com.sample.SparkJavaApiTest /tmp/sample-spark-java.jar 'ofs://sk-ozone-test1/user1vol/user1bucket/test.txt' 'ofs://sk-ozone-test1/user1vol/user1bucket/output'

or

  • spark-submit --master yarn --deploy-mode cluster --keytab /tmp/test.keytab --principal user1@EXAMPLE.COM --class com.sample.SparkJavaApiTest /tmp/sample-spark-java.jar 'o3fs://user1bucket.user1vol.master.localdomain.com/test.txt' 'o3fs://user1bucket.user1vol.master.localdomain.com/output'

 

Both fail with the exception:

 

23/08/17 06:21:12 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on node2.localdomain.com:44558 (size: 48.0 KB, free: 366.3 MB)
23/08/17 06:21:13 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, node3.localdomain.com, executor 1): java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:789)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:752)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:847)
at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:414)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1662)
at org.apache.hadoop.ipc.Client.call(Client.java:1487)
at org.apache.hadoop.ipc.Client.call(Client.java:1440)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy25.submitRequest(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
at com.sun.proxy.$Proxy25.submitRequest(Unknown Source)
at org.apache.hadoop.ozone.om.protocolPB.Hadoop3OmTransport.submitRequest(Hadoop3OmTransport.java:80)
at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.submitRequest(OzoneManagerProtocolClientSideTranslatorPB.java:284)
at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.getServiceInfo(OzoneManagerProtocolClientSideTranslatorPB.java:1442)
at org.apache.hadoop.ozone.client.rpc.RpcClient.<init>(RpcClient.java:236)
at org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:247)
at org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:114)
at org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.<init>(BasicRootedOzoneClientAdapterImpl.java:181)
at org.apache.hadoop.fs.ozone.RootedOzoneClientAdapterImpl.<init>(RootedOzoneClientAdapterImpl.java:51)
at org.apache.hadoop.fs.ozone.RootedOzoneFileSystem.createAdapter(RootedOzoneFileSystem.java:92)
at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.initialize(BasicRootedOzoneFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3451)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:161)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3556)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3503)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:521)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$12.apply(Executor.scala:456)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1334)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:462)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:173)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:390)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:623)
at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:414)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:834)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:830)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:830)
... 52 more

 

The CDP PVC Base cluster has Kerberos enabled. I wanted to know if I am missing any configuration on Spark or YARN that is causing the above error. The console log seems to indicate that it is able to obtain delegation tokens for user "user1" correctly for HDFS:

 

23/08/17 07:27:41 INFO security.HadoopFSDelegationTokenProvider: getting token for: class org.apache.hadoop.hdfs.DistributedFileSystem:hdfs://master.localdomain.com:8020 with renewer yarn/master.localdomain.com@EXAMPLE.COM
23/08/17 07:27:41 INFO hdfs.DFSClient: Created token for user1: HDFS_DELEGATION_TOKEN owner=user1@EXAMPLE.COM, renewer=yarn, realUser=, issueDate=1692271661218, maxDate=1692876461218, sequenceNumber=545, masterKeyId=126 on 10.49.0.9:8020
23/08/17 07:27:41 INFO security.HadoopFSDelegationTokenProvider: getting token for: class org.apache.hadoop.hdfs.DistributedFileSystem:hdfs://master.localdomain.com:8020 with renewer user1@EXAMPLE.COM
23/08/17 07:27:41 INFO hdfs.DFSClient: Created token for user1: HDFS_DELEGATION_TOKEN owner=user1@EXAMPLE.COM, renewer=user1, realUser=, issueDate=1692271661245, maxDate=1692876461245, sequenceNumber=546, masterKeyId=126 on 10.49.0.9:8020
23/08/17 07:27:41 INFO security.HadoopFSDelegationTokenProvider: Renewal interval is 86400067 for token HDFS_DELEGATION_TOKEN
23/08/17 07:27:42 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.5.5-801-b1e2c346541b2d00405d023dc5c4894d038aef98, built on 08/24/2022 12:46 GMT
23/08/17 07:27:42 INFO zookeeper.ZooKeeper: Client environment:host.name=master.localdomain.com
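This makes me wonder whether an Ozone delegation token is ever fetched at all. A quick way to check, I assume, would be to dump the tokens attached to the current UGI from inside the driver or an executor; this sketch uses the standard Hadoop UserGroupInformation API, and the expectation that an Ozone token would show up alongside the HDFS ones is my assumption:

import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

public class TokenDebug {
    // Print every delegation token visible to the current user. With HDFS paths
    // we see HDFS_DELEGATION_TOKEN entries; if Spark had fetched a token for the
    // Ozone FS, a token of a different kind (presumably an Ozone token) should
    // appear here as well.
    public static void dumpTokens() throws IOException {
        for (Token<?> t : UserGroupInformation.getCurrentUser().getTokens()) {
            System.out.println("kind=" + t.getKind() + " service=" + t.getService());
        }
    }
}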

 

But the same does not seem to succeed inside the Spark job for the Ozone FS file.
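For instance, I was wondering whether the Ozone filesystem URI needs to be listed explicitly so that Spark fetches a delegation token for it at submit time, along the lines of the command below. spark.yarn.access.hadoopFileSystems is the standard Spark-on-YARN property for obtaining tokens for additional filesystems; whether it is the right knob for Ozone here is exactly what I am unsure about:

  • spark-submit --master yarn --deploy-mode cluster --keytab /tmp/test.keytab --principal user1@EXAMPLE.COM --conf spark.yarn.access.hadoopFileSystems=ofs://sk-ozone-test1/ --class com.sample.SparkJavaApiTest /tmp/sample-spark-java.jar 'ofs://sk-ozone-test1/user1vol/user1bucket/test.txt' 'ofs://sk-ozone-test1/user1vol/user1bucket/output'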

 
