Member since: 08-11-2023
Posts: 7
Kudos Received: 0
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 2101 | 08-17-2023 06:23 AM |
08-23-2023
10:17 PM
@DianaTorres - Sure, will do. Thanks.
08-23-2023
04:57 AM
@sam_kalmesh - Were you able to get a resolution for the error you posted? We are hitting a similar access error when trying to create a table from the Hive shell using an s3a path, i.e. s3a://bucket1/test_data_table:

Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403)
Caused by: java.nio.file.AccessDeniedException: s3a://bucket1/test_data_table: getFileStatus on s3a://bucket1/test_data_table

If the external path is changed to the o3fs style for the same bucket, the tables get created successfully, but it fails with the s3a path style. The following config settings were added to "Hive Service Advanced Configuration Snippet (Safety Valve) for core-site.xml" as well:
- fs.s3a.bucket.bucket1.access.key
- fs.s3a.impl
- fs.s3a.bucket.bucket1.secret.key
- fs.s3a.endpoint
- fs.s3a.bucket.probe
- fs.s3a.change.detection.version.required
- fs.s3a.path.style.access
- fs.s3a.change.detection.mode
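In case it helps narrow this down, here is a minimal, standalone check of the same s3a settings outside Hive, using only the Hadoop FileSystem API. This is just an illustrative sketch: the endpoint, key placeholders, and property values are assumptions to be replaced with the values from your own safety valve.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3AAccessCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same S3A properties as listed above; all values below are placeholders.
        conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        conf.set("fs.s3a.bucket.bucket1.access.key", "<access-key>");
        conf.set("fs.s3a.bucket.bucket1.secret.key", "<secret-key>");
        conf.set("fs.s3a.endpoint", "http://<s3-gateway-host>:9878"); // assumed Ozone S3 Gateway endpoint
        conf.set("fs.s3a.path.style.access", "true");
        conf.set("fs.s3a.bucket.probe", "0");
        conf.set("fs.s3a.change.detection.mode", "none");
        conf.set("fs.s3a.change.detection.version.required", "false");

        // A plain listing of the bucket; a 403 / AccessDeniedException here means the
        // credentials or endpoint are rejected before Hive is even involved.
        FileSystem fs = FileSystem.get(URI.create("s3a://bucket1/"), conf);
        for (FileStatus status : fs.listStatus(new Path("s3a://bucket1/"))) {
            System.out.println(status.getPath());
        }
    }
}
```

If this listing also fails with 403, the problem is most likely the keys or the endpoint/path-style settings rather than anything Hive-specific.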
08-21-2023
01:02 AM
Hi @RangaReddy, thanks for sharing the link. The link does not include details of Spark jobs submitted via spark-submit; it covers samples of reading files from Ozone using spark-shell, hence I had to post the error for clarification. However, the above conf parameter helped in resolving the error observed in the Spark job.
08-17-2023
06:23 AM
Ignore the above post; the error got resolved by passing the following config value to the spark-submit command:

--conf=spark.yarn.access.hadoopFileSystems=o3fs://<bucket>.<volume>.<ozone-id>
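For completeness, if the job is ever submitted programmatically rather than from the shell, the same setting can be passed through SparkLauncher. This is only a sketch under the assumption that the jar path, class name, keytab, and Ozone URIs from the earlier posts are reused; adjust them for your environment.

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class SubmitWithOzoneAccess {
    public static void main(String[] args) throws Exception {
        SparkAppHandle handle = new SparkLauncher()
                .setMaster("yarn")
                .setDeployMode("cluster")
                .setAppResource("/tmp/sample-spark-java.jar")   // jar path from the earlier post
                .setMainClass("com.sample.SparkJavaApiTest")
                .addSparkArg("--keytab", "/tmp/test.keytab")
                .addSparkArg("--principal", "user1@EXAMPLE.COM")
                // Equivalent of --conf spark.yarn.access.hadoopFileSystems=... on the CLI
                .setConf("spark.yarn.access.hadoopFileSystems",
                        "o3fs://<bucket>.<volume>.<ozone-id>")
                .addAppArgs("o3fs://<bucket>.<volume>.<ozone-id>/test.txt",
                        "o3fs://<bucket>.<volume>.<ozone-id>/output")
                .startApplication();
        System.out.println("Submitted; current state: " + handle.getState());
    }
}
```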
08-17-2023
04:55 AM
Hi Team, looking for some inputs on the error below. We have a sample Spark Java application that reads an input file from HDFS, performs transformations on the data, and writes it back to another file in HDFS. With a spark-submit like the following, the sample code works perfectly:

spark-submit --master yarn --deploy-mode cluster --class com.sample.SparkJavaApiTest /tmp/sample-spark-java.jar <HDFS File Path : /user/user1/test.txt> <HDFS Output File : /user/user1/output.txt>

However, when the same job is submitted via spark-submit with files on the Ozone file system:

spark-submit --master yarn --deploy-mode cluster --keytab /tmp/test.keytab --principal user1@EXAMPLE.COM --class com.sample.SparkJavaApiTest /tmp/sample-spark-java.jar 'ofs://sk-ozone-test1/user1vol/user1bucket/test.txt' 'ofs://sk-ozone-test1/user1vol/user1bucket/output'

or

spark-submit --master yarn --deploy-mode cluster --keytab /tmp/test.keytab --principal user1@EXAMPLE.COM --class com.sample.SparkJavaApiTest /tmp/sample-spark-java.jar 'o3fs://user1bucket.user1vol.master.localdomain.com/test.txt' 'o3fs://user1bucket.user1vol.master.localdomain.com/output'

it fails with the following exception:

23/08/17 06:21:12 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on node2.localdomain.com:44558 (size: 48.0 KB, free: 366.3 MB)
23/08/17 06:21:13 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, node3.localdomain.com, executor 1): java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:789)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:752)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:847)
    at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:414)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1662)
    at org.apache.hadoop.ipc.Client.call(Client.java:1487)
    at org.apache.hadoop.ipc.Client.call(Client.java:1440)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
    at com.sun.proxy.$Proxy25.submitRequest(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
    at com.sun.proxy.$Proxy25.submitRequest(Unknown Source)
    at org.apache.hadoop.ozone.om.protocolPB.Hadoop3OmTransport.submitRequest(Hadoop3OmTransport.java:80)
    at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.submitRequest(OzoneManagerProtocolClientSideTranslatorPB.java:284)
    at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.getServiceInfo(OzoneManagerProtocolClientSideTranslatorPB.java:1442)
    at org.apache.hadoop.ozone.client.rpc.RpcClient.<init>(RpcClient.java:236)
    at org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:247)
    at org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:114)
    at org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.<init>(BasicRootedOzoneClientAdapterImpl.java:181)
    at org.apache.hadoop.fs.ozone.RootedOzoneClientAdapterImpl.<init>(RootedOzoneClientAdapterImpl.java:51)
    at org.apache.hadoop.fs.ozone.RootedOzoneFileSystem.createAdapter(RootedOzoneFileSystem.java:92)
    at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.initialize(BasicRootedOzoneFileSystem.java:149)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3451)
    at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:161)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3556)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3503)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:521)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$12.apply(Executor.scala:456)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1334)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:462)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:173)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:390)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:623)
    at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:414)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:834)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:830)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:830)
    ... 52 more

The CDP PVC Base cluster has Kerberos enabled. I wanted to know if I am missing out on any configuration on Spark or YARN that is causing the above error. The console log seems to indicate the job is able to obtain a delegation token for user "user1" correctly for HDFS:

23/08/17 07:27:41 INFO security.HadoopFSDelegationTokenProvider: getting token for: class org.apache.hadoop.hdfs.DistributedFileSystem:hdfs://master.localdomain.com:8020 with renewer yarn/master.localdomain.com@EXAMPLE.COM
23/08/17 07:27:41 INFO hdfs.DFSClient: Created token for user1: HDFS_DELEGATION_TOKEN owner=user1@EXAMPLE.COM, renewer=yarn, realUser=, issueDate=1692271661218, maxDate=1692876461218, sequenceNumber=545, masterKeyId=126 on 10.49.0.9:8020
23/08/17 07:27:41 INFO security.HadoopFSDelegationTokenProvider: getting token for: class org.apache.hadoop.hdfs.DistributedFileSystem:hdfs://master.localdomain.com:8020 with renewer user1@EXAMPLE.COM
23/08/17 07:27:41 INFO hdfs.DFSClient: Created token for user1: HDFS_DELEGATION_TOKEN owner=user1@EXAMPLE.COM, renewer=user1, realUser=, issueDate=1692271661245, maxDate=1692876461245, sequenceNumber=546, masterKeyId=126 on 10.49.0.9:8020
23/08/17 07:27:41 INFO security.HadoopFSDelegationTokenProvider: Renewal interval is 86400067 for token HDFS_DELEGATION_TOKEN
23/08/17 07:27:42 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.5.5-801-b1e2c346541b2d00405d023dc5c4894d038aef98, built on 08/24/2022 12:46 GMT
23/08/17 07:27:42 INFO zookeeper.ZooKeeper: Client environment:host.name=master.localdomain.com

But the same does not seem to succeed inside the Spark job for the Ozone FS file?
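For context, the application in question is roughly the following shape; this is a simplified sketch rather than the actual com.sample.SparkJavaApiTest source, and the uppercase map is just a stand-in for the real transformation logic.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkJavaApiTest {
    public static void main(String[] args) {
        // args[0] = input path, args[1] = output path; hdfs://, ofs:// or o3fs:// URIs.
        SparkConf conf = new SparkConf().setAppName("SparkJavaApiTest");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile(args[0]);
            // Placeholder transformation; the real job's logic goes here.
            JavaRDD<String> transformed = lines.map(String::toUpperCase);
            transformed.saveAsTextFile(args[1]);
        }
    }
}
```

Since the executors open args[0] themselves, this is consistent with the task-level failure above occurring while the Ozone FileSystem is initialized on the executor rather than at submit time.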
Labels:
- Apache Ozone
- Apache Spark
08-16-2023
08:57 PM
Got it, thanks @amk for confirming. Will check on the above recommendation.
08-11-2023
04:09 AM
Hi Team, looking for inputs on defaulting HDFS storage to Ozone FS. As part of the HDFS configuration in Cloudera Manager, I tried setting in core-site.xml the properties mentioned in:

https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/ozone-storing-data/topics/ozone-setting-up-ozonefs.html
https://ozone.apache.org/docs/1.0.0/interface/ofs.html

After restarting the service, the SecondaryNameNode fails with the exception below:

Failed to start secondary namenode
java.io.IOException: This is not a DFS
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.getInfoServer(SecondaryNameNode.java:456)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:239)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:194)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:690)

However, the other role types, i.e. DataNode and NameNode, start up fine. Listing the files with the command

sudo -u hdfs hdfs dfs -ls /

still lists the content from the HDFS file system and not from Ozone storage. The same config settings work fine for Hive in defaulting to Ozone from HDFS. Any suggestions on the right way of configuring HDFS to Ozone storage are appreciated. Also, is this a supported config scenario? Thanks.
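For what it's worth, a quick way to confirm that the ofs:// client configuration itself works (independently of how the HDFS roles react to it) is to point a Hadoop Configuration at the Ozone service and list a path through the generic FileSystem API. This is only an illustrative sketch based on the properties in the linked docs; the service ID, volume, and bucket names are placeholders.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OfsListCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Properties from the linked docs; the impl class is the rooted Ozone FS.
        conf.set("fs.ofs.impl", "org.apache.hadoop.fs.ozone.RootedOzoneFileSystem");
        conf.set("fs.defaultFS", "ofs://<ozone-service-id>"); // placeholder Ozone service ID

        // List a volume/bucket path through the generic FileSystem API.
        FileSystem fs = FileSystem.get(URI.create("ofs://<ozone-service-id>/"), conf);
        for (FileStatus status : fs.listStatus(new Path("/<volume>/<bucket>/"))) {
            System.out.println(status.getPath());
        }
    }
}
```

If this works while hdfs dfs -ls / still shows HDFS content, the question is mainly whether repointing the HDFS service's own fs.defaultFS at Ozone is a supported scenario at all, which is what the SecondaryNameNode error suggests it is not.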
Labels:
- Apache Ozone
- HDFS