Hi,
We have a default S3 bucket, say A, whose credentials are configured in core-site.xml. Accessing that bucket from Spark works fine in both client and cluster mode. But for another bucket, say B, which is not configured in core-site.xml, access works fine in client mode while in cluster mode it fails with the exception below. As a workaround we pass a core-site.xml that points at bucket B's jceks file, and that works.
Why is the following property not honored in cluster mode? Please let me know if we need to set any other property for cluster mode:
spark.hadoop.hadoop.security.credential.provider.path
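For context, jceks files like the one above are typically created and inspected with the Hadoop credential CLI. The commands below are only a sketch: the alias names are the standard S3A ones, and the paths are elided exactly as in the rest of this post:
hadoop credential create fs.s3a.access.key -provider jceks://hdfs/.../1.jceks
hadoop credential create fs.s3a.secret.key -provider jceks://hdfs/.../1.jceks
hadoop credential list -provider jceks://hdfs/.../1.jceks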
Client mode (working fine):
spark-submit --class <ClassName> --master yarn --deploy-mode client \
  --files .../conf/hive-site.xml \
  --conf spark.hadoop.hadoop.security.credential.provider.path=jceks://hdfs/.../1.jceks \
  --jars $SPARK_HOME/lib/datanucleus-api-jdo-3.2.6.jar,$SPARK_HOME/lib/datanucleus-rdbms-3.2.9.jar,$SPARK_HOME/lib/datanucleus-core-3.2.10.jar \
  --queue default <jar_path>
Cluster mode (not working):
spark-submit --class <ClassName> --master yarn --deploy-mode cluster \
  --files .../conf/hive-site.xml \
  --conf spark.hadoop.hadoop.security.credential.provider.path=jceks://hdfs/.../1.jceks \
  --jars $SPARK_HOME/lib/datanucleus-api-jdo-3.2.6.jar,$SPARK_HOME/lib/datanucleus-rdbms-3.2.9.jar,$SPARK_HOME/lib/datanucleus-core-3.2.10.jar \
  --queue default <jar_path>
Exception:
diagnostics: User class threw exception: java.nio.file.AccessDeniedException: s3a://<bucketname>/server_date=2017-08-23: getFileStatus on s3a://<bucketname>/server_date=2017-08-23: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 92F94B902D52D864), S3 Extended Request ID: 1Df3YxG5znruRbsOpsGhCO40s4d9HKhvD14FKk1DSt//lFFuEdXjGueNg5+MYbUIP4aKvsjrZmw=
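To rule out the credentials themselves, the same getFileStatus-style lookup can be reproduced outside Spark from a cluster node with the Hadoop CLI (generic -D options work with hadoop fs). This is an illustrative check, with paths elided as above:
hadoop fs -D hadoop.security.credential.provider.path=jceks://hdfs/.../1.jceks -ls s3a://<bucketname>/server_date=2017-08-23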
Cluster mode (workaround that works):
Step 1: cp /usr/hdp/current/hadoop-client/conf/core-site.xml /<home_dir>/core-site.xml
Step 2: Edit core-site.xml, replacing jceks://hdfs/.../default.jceks with jceks://hdfs/.../1.jceks (a one-line version of this edit is sketched after the command below).
Step 3: Pass the edited core-site.xml to the spark-submit command:
spark-submit --class <ClassName> --master yarn --deploy-mode cluster \
  --files .../conf/hive-site.xml,../conf/core-site.xml \
  --conf spark.hadoop.hadoop.security.credential.provider.path=jceks://hdfs/.../1.jceks \
  --jars $SPARK_HOME/lib/datanucleus-api-jdo-3.2.6.jar,$SPARK_HOME/lib/datanucleus-rdbms-3.2.9.jar,$SPARK_HOME/lib/datanucleus-core-3.2.10.jar \
  --verbose --queue default <jar_path>
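Step 2 above as a single command, shown for illustration with the paths elided exactly as written:
sed -i 's|jceks://hdfs/.../default.jceks|jceks://hdfs/.../1.jceks|' /<home_dir>/core-site.xml
Also, possibly relevant to the question: newer Hadoop releases document an S3A-specific option, fs.s3a.security.credential.provider.path, whose providers are prepended to those in hadoop.security.credential.provider.path. The sketch below is untested on our side and assumes that option is available in your Hadoop build; if it works it would avoid editing core-site.xml at all:
spark-submit --class <ClassName> --master yarn --deploy-mode cluster \
  --files .../conf/hive-site.xml \
  --conf spark.hadoop.fs.s3a.security.credential.provider.path=jceks://hdfs/.../1.jceks \
  --jars $SPARK_HOME/lib/datanucleus-api-jdo-3.2.6.jar,$SPARK_HOME/lib/datanucleus-rdbms-3.2.9.jar,$SPARK_HOME/lib/datanucleus-core-3.2.10.jar \
  --queue default <jar_path>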
Thanks
Subacini