I have a Spark job that needs to read data from an S3 bucket in another account **(Data Account)** and process it.
Once the data is processed, the job should write the results back to an S3 bucket in my own account.
So I configured the access key and secret key of the **"Data Account"** in my Spark session, like below:
val hadoopConf = sc.hadoopConfiguration
val df = spark.read.json("s3a://DataAccountS3/path") /* Reading succeeds */
df.limit(3).write.json("s3a://myaccountS3/test/") /* limit(3) rather than take(3): take returns an Array[Row], which has no write method */
With this, reading works fine, but I get the error below when writing:
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 301, AWS Service: Amazon S3, AWS Request ID: A5E574113745D6A0, AWS Error Code: PermanentRedirect, AWS Error Message: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
If I use s3n instead:
df.limit(3).write.json("s3n://myaccountS3/test/")
then I get the error below:
org.apache.hadoop.security.AccessControlException: Permission denied: s3n://myaccountS3/test
However, if I don't configure the Data Account credentials and just write some dummy data from Spark to my own S3 bucket, it works.
So how should I configure the session so that both reading from the other account's S3 bucket and writing to my own account's S3 bucket work?
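One possible direction, offered as a sketch rather than a confirmed fix: the S3A connector supports per-bucket settings of the form fs.s3a.bucket.<bucket>.<option>, so each bucket can carry its own credentials and endpoint. This assumes Hadoop 2.8 or later (or a distribution that backports the feature) and the s3a:// scheme; s3n has no per-bucket settings. A 301 PermanentRedirect generally means the bucket is in a different region than the endpoint being addressed, so the sketch also sets an endpoint for the target bucket. The helper name s3aBucketConf, the environment variable names, and the endpoint value are hypothetical placeholders, not taken from the post.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper: builds the per-bucket S3A settings for a single bucket.
def s3aBucketConf(bucket: String,
                  accessKey: String,
                  secretKey: String,
                  endpoint: Option[String] = None): Map[String, String] = {
  val prefix = s"fs.s3a.bucket.$bucket."
  val creds = Map(
    (prefix + "access.key") -> accessKey,
    (prefix + "secret.key") -> secretKey
  )
  // Add the endpoint only when one is supplied.
  creds ++ endpoint.map(e => (prefix + "endpoint") -> e).toList
}

val spark = SparkSession.builder().appName("cross-account-s3").getOrCreate()
val hadoopConf = spark.sparkContext.hadoopConfiguration

// Assumption: credentials come from environment variables with these made-up names.
val settings =
  s3aBucketConf("DataAccountS3",
    sys.env("DATA_ACCOUNT_ACCESS_KEY"),
    sys.env("DATA_ACCOUNT_SECRET_KEY")) ++
  s3aBucketConf("myaccountS3",
    sys.env("MY_ACCOUNT_ACCESS_KEY"),
    sys.env("MY_ACCOUNT_SECRET_KEY"),
    Some("s3.eu-west-1.amazonaws.com")) // placeholder: use the region of your own bucket

settings.foreach { case (key, value) => hadoopConf.set(key, value) }

// Read with the Data Account credentials, write with my own account's credentials.
val df = spark.read.json("s3a://DataAccountS3/path")
df.limit(3).write.json("s3a://myaccountS3/test/")
```

Keeping the credentials scoped per bucket avoids the situation described above, where the Data Account keys set globally are also used for the write to the other account's bucket.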
Hi, I have Ambari version 2.6.2 and HDP 18.104.22.168. By default this supports Spark 2.1, but my Spark job needs Spark 2.2 or higher. How can I upgrade Spark? Do I need to upgrade Ambari to 2.7 and then HDP to 3.0, or is there a way to upgrade Spark directly? Regards, Indra