Spark: read from another account's S3 bucket and write to my account's S3 bucket

I have a Spark job that needs to read data from an S3 bucket in another account **(Data Account)** and process it. Once the data is processed, it should be written back to an S3 bucket in my own account. So I configured the access key and secret key of the **"Data Account"** in my Spark session like this:

val hadoopConf=sc.hadoopConfiguration 

hadoopConf.set("fs.s3a.access.key","DataAccountKey") 

hadoopConf.set("fs.s3a.secret.key","DataAccountSecretKey") 

hadoopConf.set("fs.s3a.endpoint", "s3.ap-northeast-2.amazonaws.com") 

System.setProperty("com.amazonaws.services.s3.enableV4", "true") 

val df = spark.read.json("s3a://DataAccountS3/path") // reading succeeds

df.limit(3).write.json("s3a://myaccountS3/test/") // limit(3) keeps a DataFrame; take(3) returns an Array with no .write

With this configuration, reading works, but I get the following error when writing:

com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 301, AWS Service: Amazon S3, AWS Request ID: A5E574113745D6A0, AWS Error Code: PermanentRedirect, AWS Error Message: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.

If I use s3n instead:

df.take(3).write.json("s3n://myaccountS3/test/")

then I get the following error:

org.apache.hadoop.security.AccessControlException: Permission denied: s3n://myaccountS3/test


However, if I don't configure the Data Account credentials and just write some dummy data to my S3 bucket from Spark, it works. How should I configure Spark so that both reading from the other account's S3 bucket and writing to my account's S3 bucket work?

1 REPLY

@Indra s: With the S3A connector you can use per-bucket configuration options to set a different access key and secret key for a specific bucket:

fs.s3a.bucket.myaccounts3.access.key=AAA12

fs.s3a.bucket.myaccounts3.secret.key=XXXYYY

Then, whenever you read or write s3a://myaccounts3/, these bucket-specific credentials are used. For all other S3A buckets, the default ones are picked up:

fs.s3a.access.key=BBBB

fs.s3a.secret.key=ZZZZZ
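
As a rough sketch of how these options might be applied from the Spark session in the question (not a definitive setup): the keys follow the fs.s3a.bucket.<bucket>.<option> pattern shown above, while every key value, the object name CrossAccountCopy, and the per-bucket endpoint value are placeholders or assumptions. The per-bucket endpoint line should only matter if the two buckets live in different regions, which the PermanentRedirect error in the question suggests; substitute the endpoint of the region the bucket is actually in. Per-bucket options need Hadoop 2.8.0 or later, and the bucket name in the key must match the bucket name in the s3a:// URL.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name; all credentials below are placeholders.
object CrossAccountCopy {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cross-account-s3-copy")
      .getOrCreate()
    val hadoopConf = spark.sparkContext.hadoopConfiguration

    // Default S3A settings: used for every bucket without per-bucket
    // overrides, here the Data Account bucket.
    hadoopConf.set("fs.s3a.access.key", "DataAccountKey")
    hadoopConf.set("fs.s3a.secret.key", "DataAccountSecretKey")
    hadoopConf.set("fs.s3a.endpoint", "s3.ap-northeast-2.amazonaws.com")

    // Per-bucket overrides: applied only to s3a://myaccounts3/.
    hadoopConf.set("fs.s3a.bucket.myaccounts3.access.key", "MyAccountKey")
    hadoopConf.set("fs.s3a.bucket.myaccounts3.secret.key", "MyAccountSecretKey")
    // Assumption: this bucket is in a different region than the default
    // endpoint above. Replace with the endpoint of its real region.
    hadoopConf.set("fs.s3a.bucket.myaccounts3.endpoint", "s3.amazonaws.com")

    // Read with the default (Data Account) credentials.
    val df = spark.read.json("s3a://DataAccountS3/path")

    // Write with the per-bucket credentials. limit(3) returns a DataFrame,
    // so .write is available (take(3) would return an Array[Row]).
    df.limit(3).write.json("s3a://myaccounts3/test/")

    spark.stop()
  }
}
```

With this layout no credentials need to be swapped between the read and the write; S3A picks the settings based on the bucket name in each path.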

Please switch to using the s3a:// connector everywhere: it has much better performance and functionality than the older S3N one, which has recently been removed entirely.