I have a Spark job which needs to read data from an S3 bucket in another account **(Data Account)** and process it.
Once the data is processed, it should be written back to an S3 bucket in my own account.
So I configured the access and secret key of the **Data Account** in my Spark session like below:
val hadoopConf = sc.hadoopConfiguration
// Credentials of the Data Account (the bucket I read from)
hadoopConf.set("fs.s3a.access.key", "DataAccountKey")
hadoopConf.set("fs.s3a.secret.key", "DataAccountSecretKey")
hadoopConf.set("fs.s3a.endpoint", "s3.ap-northeast-2.amazonaws.com")
// ap-northeast-2 requires V4 request signing
System.setProperty("com.amazonaws.services.s3.enableV4", "true")
val df = spark.read.json("s3a://DataAccountS3/path") /* Reading is success */
df.take(3).write.json("s3a://myaccountS3/test/")
With this, reading works fine, but I get the error below when writing:
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 301, AWS Service: Amazon S3, AWS Request ID: A5E574113745D6A0, AWS Error Code: PermanentRedirect, AWS Error Message: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
If I use s3n instead,
df.limit(3).write.json("s3n://myaccountS3/test/")
then I get this error:
org.apache.hadoop.security.AccessControlException: Permission denied: s3n://myaccountS3/test
But if I don't configure the Data Account credentials at all and just write some dummy data to my own account's S3 from Spark, it works.
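For reference, this is roughly what the working write looks like (a minimal sketch; the dummy data is made up, and my own account's credentials come from the cluster's default credential chain rather than from any fs.s3a keys):

import spark.implicits._

// No Data Account keys are set here, so s3a falls back to my own
// account's default credentials (instance profile / environment)
val dummy = Seq(("a", 1), ("b", 2), ("c", 3)).toDF("key", "value")
dummy.write.json("s3a://myaccountS3/test/")   // this succeeds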
So how should I configure things so that both reading from the other account's S3 and writing to my own account's S3 work?
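One idea I had is the per-bucket configuration that S3A supports in newer Hadoop versions (2.8+), so the Data Account credentials would only apply to that one bucket. This is just a rough sketch of what I mean (assuming per-bucket settings are available on my cluster and that my own account's credentials are picked up by default), and I am not sure it is the right approach:

val hadoopConf = sc.hadoopConfiguration

// Per-bucket settings: these apply only to the DataAccountS3 bucket
hadoopConf.set("fs.s3a.bucket.DataAccountS3.access.key", "DataAccountKey")
hadoopConf.set("fs.s3a.bucket.DataAccountS3.secret.key", "DataAccountSecretKey")
hadoopConf.set("fs.s3a.bucket.DataAccountS3.endpoint", "s3.ap-northeast-2.amazonaws.com")

// The global fs.s3a.* settings are left untouched, so writes to
// myaccountS3 should use my own account's default credentials/region
val df = spark.read.json("s3a://DataAccountS3/path")
df.limit(3).write.json("s3a://myaccountS3/test/")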