Support Questions

gubtiny · 11-09-2018

I have a Spark job that needs to read data from an S3 bucket in another account (**Data Account**), process it, and then write the result back to an S3 bucket in my own account. So I configured the access key and secret key of the **Data Account** in my Spark session like this:

val hadoopConf=sc.hadoopConfiguration 

hadoopConf.set("fs.s3a.access.key","DataAccountKey") 

hadoopConf.set("fs.s3a.secret.key","DataAccountSecretKey") 

hadoopConf.set("fs.s3a.endpoint", "s3.ap-northeast-2.amazonaws.com") 

System.setProperty("com.amazonaws.services.s3.enableV4", "true") 

val df = spark.read.json("s3a://DataAccountS3/path") // reading succeeds

df.limit(3).write.json("s3a://myaccountS3/test/")

With this, reading works fine, but I get the error below when writing:

com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 301, AWS Service: Amazon S3, AWS Request ID: A5E574113745D6A0, AWS Error Code: PermanentRedirect, AWS Error Message: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.

If I use s3n instead:

df.limit(3).write.json("s3n://myaccountS3/test/")

then I get the error below:

org.apache.hadoop.security.AccessControlException: Permission denied: s3n://myaccountS3/test

But if I don't configure the Data Account details and just write some dummy data to my S3 bucket from Spark, it works. So how should I configure the job so that both reading from the other account's S3 bucket and writing to my own account's S3 bucket work?

stevel · 11-30-2018

@Indra s: with the S3A connector you can use per-bucket configuration options to set a different access key and secret key for a specific bucket:

fs.s3a.bucket.myaccounts3.access.key=AAA12

fs.s3a.bucket.myaccounts3.secret.key=XXXYYY

Then, whenever you read or write s3a://myaccounts3/, these bucket-specific credentials are used. For all other S3A buckets, the default ones are picked up:

fs.s3a.access.key=BBBB

fs.s3a.secret.key=ZZZZZ
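
Applied to the job in the question, the settings above could look roughly like the sketch below. This is only an illustration, not a tested setup: the key values, the `MyAccountKey`/`MyAccountSecretKey` placeholders, the object name, and the `us-east-1` endpoint are all hypothetical, and the bucket names are the placeholders used in this thread. It assumes a Hadoop version with per-bucket S3A configuration (2.8 or later). The per-bucket endpoint line is included on the assumption that the 301 PermanentRedirect comes from the destination bucket living in a different region than the default `fs.s3a.endpoint`; drop it if both buckets are in the same region.

```scala
import org.apache.spark.sql.SparkSession

object CrossAccountS3Copy {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cross-account-s3-copy")
      .getOrCreate()
    val hadoopConf = spark.sparkContext.hadoopConfiguration

    // Default S3A settings: used for every bucket without an override,
    // here the Data Account bucket (placeholder values from the question).
    hadoopConf.set("fs.s3a.access.key", "DataAccountKey")
    hadoopConf.set("fs.s3a.secret.key", "DataAccountSecretKey")
    hadoopConf.set("fs.s3a.endpoint", "s3.ap-northeast-2.amazonaws.com")

    // Per-bucket overrides: apply only to s3a://myaccounts3/.
    // "MyAccountKey" / "MyAccountSecretKey" are hypothetical placeholders.
    hadoopConf.set("fs.s3a.bucket.myaccounts3.access.key", "MyAccountKey")
    hadoopConf.set("fs.s3a.bucket.myaccounts3.secret.key", "MyAccountSecretKey")
    // Assumption: the destination bucket is in another region, so it needs
    // its own endpoint. The region shown here is only an example.
    hadoopConf.set("fs.s3a.bucket.myaccounts3.endpoint", "s3.us-east-1.amazonaws.com")

    // Read with the default (Data Account) credentials ...
    val df = spark.read.json("s3a://DataAccountS3/path")
    // ... and write with the bucket-specific (own account) credentials.
    df.limit(3).write.json("s3a://myaccounts3/test/")

    spark.stop()
  }
}
```

The same properties can also be passed at submit time with the `spark.hadoop.` prefix, for example `--conf spark.hadoop.fs.s3a.bucket.myaccounts3.access.key=...`, which keeps the keys out of the application code.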

Please switch to using the s3a:// connector everywhere: it has much better performance and functionality than the older S3N one, which has recently been removed entirely.

Cloudera Community


spark read from different account s3 and write to my account s3