<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Access S3 Bucket from Spark in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Access-S3-Bucket-from-Spark/m-p/230234#M192084</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We have a default S3 bucket, say A, which is configured in core-site.xml. Accessing that bucket from Spark works fine in both client and cluster mode. But for a bucket, say B, which is not configured in core-site.xml, access works in client mode while in cluster mode it fails with the exception below. As a workaround we pass a core-site.xml that points to bucket B's jceks file, and that works.&lt;/P&gt;&lt;P&gt;Why is the property below not honored in cluster mode? Let me know if we need to set any other property for cluster mode.&lt;/P&gt;&lt;PRE&gt;spark.hadoop.hadoop.security.credential.provider.path&lt;/PRE&gt;&lt;P&gt;Client mode (working fine)&lt;/P&gt;&lt;PRE&gt;spark-submit --class &amp;lt;ClassName&amp;gt; --master yarn --deploy-mode client --files .../conf/hive-site.xml --conf spark.hadoop.hadoop.security.credential.provider.path=jceks://hdfs/.../1.jceks --jars $SPARK_HOME/lib/datanucleus-api-jdo-3.2.6.jar,$SPARK_HOME/lib/datanucleus-rdbms-3.2.9.jar,$SPARK_HOME/lib/datanucleus-core-3.2.10.jar --queue default &amp;lt;jar_path&amp;gt;&lt;/PRE&gt;&lt;P&gt;Cluster mode (not working)&lt;/P&gt;&lt;PRE&gt;spark-submit --class &amp;lt;ClassName&amp;gt; --master yarn --deploy-mode &lt;STRONG&gt;cluster&lt;/STRONG&gt;
  --files .../conf/hive-site.xml --conf spark.hadoop.hadoop.security.credential.provider.path=jceks://hdfs/.../1.jceks --jars $SPARK_HOME/lib/datanucleus-api-jdo-3.2.6.jar,$SPARK_HOME/lib/datanucleus-rdbms-3.2.9.jar,$SPARK_HOME/lib/datanucleus-core-3.2.10.jar --queue default &amp;lt;jar_path&amp;gt;&lt;/PRE&gt;
&lt;PRE&gt;Exception:

diagnostics: User class threw exception: java.nio.file.AccessDeniedException: s3a://&amp;lt;bucketname&amp;gt;/server_date=2017-08-23: getFileStatus on s3a://&amp;lt;bucketname&amp;gt;/server_date=2017-08-23: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 92F94B902D52D864), S3 Extended Request ID: 1Df3YxG5znruRbsOpsGhCO40s4d9HKhvD14FKk1DSt//lFFuEdXjGueNg5+MYbUIP4aKvsjrZmw=&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Cluster mode (working workaround)&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Step 1: cp /usr/hdp/current/hadoop-client/conf/core-site.xml /&amp;lt;home_dir&amp;gt;/core-site.xml&lt;BR /&gt;
Step 2: Edit core-site.xml and replace jceks://hdfs/.../&lt;STRONG&gt;default.jceks&lt;/STRONG&gt; with jceks://hdfs/.../&lt;STRONG&gt;1.jceks&lt;/STRONG&gt;&lt;BR /&gt;
Step 3: Pass the edited core-site.xml to the spark-submit command&lt;/P&gt;&lt;PRE&gt;spark-submit --class &amp;lt;ClassName&amp;gt; --master yarn --deploy-mode &lt;STRONG&gt;cluster&lt;/STRONG&gt;
  --files .../conf/hive-site.xml,&lt;STRONG&gt;../conf/core-site.xml&lt;/STRONG&gt; --conf spark.hadoop.hadoop.security.credential.provider.path=jceks://hdfs/.../1.jceks --jars $SPARK_HOME/lib/datanucleus-api-jdo-3.2.6.jar,$SPARK_HOME/lib/datanucleus-rdbms-3.2.9.jar,$SPARK_HOME/lib/datanucleus-core-3.2.10.jar --verbose --queue default &amp;lt;jar_path&amp;gt;&lt;/PRE&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Subacini&lt;/P&gt;</description>
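The three workaround steps above can be sketched as a small shell script. This is a minimal sketch, not the asker's exact procedure: the jceks path and the stand-in config line are illustrative (the question's real paths are elided as "..."), and the final spark-submit is only echoed because it needs a live YARN cluster.

```shell
# Sketch of workaround Steps 1-3, with hypothetical paths.
set -e
WORKDIR=$(mktemp -d)

# Step 1: on a real gateway node this would copy the cluster config, e.g.
#   cp /usr/hdp/current/hadoop-client/conf/core-site.xml "$WORKDIR"/
# Here a one-line stand-in for the provider-path value keeps the sketch runnable.
printf 'jceks://hdfs/user/example/default.jceks\n' > "$WORKDIR/core-site.xml"

# Step 2: point hadoop.security.credential.provider.path at bucket B's
# keystore (1.jceks) instead of the default keystore.
sed 's#default\.jceks#1.jceks#' "$WORKDIR/core-site.xml" > "$WORKDIR/core-site.edited.xml"
grep '1\.jceks' "$WORKDIR/core-site.edited.xml"

# Step 3: ship the edited file with the job via --files (echoed only):
echo "spark-submit --master yarn --deploy-mode cluster \\
  --files $WORKDIR/core-site.edited.xml \\
  --conf spark.hadoop.hadoop.security.credential.provider.path=jceks://hdfs/user/example/1.jceks \\
  jar_path"
```

Only the Step 2 edit is actually exercised here; Steps 1 and 3 are shown as comments and an echoed command because they depend on an HDP node and a running cluster.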
    <pubDate>Wed, 30 Aug 2017 00:22:56 GMT</pubDate>
    <dc:creator>subacini_balakr</dc:creator>
    <dc:date>2017-08-30T00:22:56Z</dc:date>
  </channel>
</rss>

