11-09-2018
02:25 PM
I have a Spark job which needs to read data from an S3 bucket in another account **(Data Account)** and process that data.
Once it is processed, it should write the results back to an S3 bucket in my own account.
So I configured the access and secret key of the **Data Account** in my Spark session like below:

```scala
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", "DataAccountKey")
hadoopConf.set("fs.s3a.secret.key", "DataAccountSecretKey")
hadoopConf.set("fs.s3a.endpoint", "s3.ap-northeast-2.amazonaws.com")
System.setProperty("com.amazonaws.services.s3.enableV4", "true")

val df = spark.read.json("s3a://DataAccountS3/path") // reading succeeds
```
Reading is fine, but I get the error below when writing:

```scala
df.limit(3).write.json("s3a://myaccountS3/test/")
```

> com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 301, AWS Service: Amazon S3, AWS Request ID: A5E574113745D6A0, AWS Error Code: PermanentRedirect, AWS Error Message: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.

If I use s3n instead:

```scala
df.limit(3).write.json("s3n://myaccountS3/test/")
```

then I get this error:

> org.apache.hadoop.security.AccessControlException: Permission denied: s3n://myaccountS3/test
However, if I don't configure the Data Account credentials and just write some dummy data to my own S3 bucket from Spark, it works.
So how should I configure things so that both reading from the other account's S3 bucket and writing to my own account's S3 bucket work?
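One possible sketch of a fix, assuming Hadoop 2.8+ where S3A supports per-bucket overrides (`fs.s3a.bucket.<bucket-name>.*`): keep your own account's credentials as the base `fs.s3a.*` settings and scope the Data Account keys and endpoint to that bucket only, so the global endpoint no longer redirects writes to your bucket. The key values and the `s3.amazonaws.com` endpoint below are illustrative placeholders, not taken from the post.

```scala
val hadoopConf = sc.hadoopConfiguration

// Credentials and endpoint that apply ONLY to the Data Account bucket.
hadoopConf.set("fs.s3a.bucket.DataAccountS3.access.key", "DataAccountKey")
hadoopConf.set("fs.s3a.bucket.DataAccountS3.secret.key", "DataAccountSecretKey")
hadoopConf.set("fs.s3a.bucket.DataAccountS3.endpoint", "s3.ap-northeast-2.amazonaws.com")

// Base fs.s3a.* settings cover every other bucket, including your own.
// (Placeholder values; use your own keys or an instance profile instead.)
hadoopConf.set("fs.s3a.access.key", "MyAccountKey")
hadoopConf.set("fs.s3a.secret.key", "MyAccountSecretKey")
hadoopConf.set("fs.s3a.endpoint", "s3.amazonaws.com") // region of myaccountS3

val df = spark.read.json("s3a://DataAccountS3/path") // uses the per-bucket keys
df.limit(3).write.json("s3a://myaccountS3/test/")    // uses the base keys
```

With this layout each bucket resolves its own credentials and endpoint, so no single global setting has to serve both accounts.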
Labels:
- Apache Spark
11-01-2018
12:41 PM
Hi, I have Ambari version 2.6.2 and HDP 2.6.0.3. By default this supports Spark 2.1, but my Spark job needs Spark 2.2 or higher. How do I upgrade Spark? Is it required to upgrade Ambari to 2.7 and then HDP to 3.0, or is there a way to upgrade Spark directly? Regards, Indra
Labels:
- Apache Ambari
- Apache Spark