02-06-2021 03:32 PM
We are trying to implement a solution on our on-prem installation of CDH 6.3.3. We read from AWS S3 as a DataFrame and save it as a CSV file on HDFS. We need to assume a role to connect to the S3 bucket, so we are using the following Hadoop configuration:

sc.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider")
sc.hadoopConfiguration.set("fs.s3a.access.key", "A***********")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "xfJ*******************")
sc.hadoopConfiguration.set("fs.s3a.assumed.role.arn", "arn:aws:iam::45**********:role/can-********************/can********************")

While doing this we were getting the error below, which we resolved by declaring AWS_REGION=ca-central-1 as an environment variable:

Error: Instantiate org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider on : com.amazonaws.SdkClientException: Unable to find a region via the region provider chain. Must provide an explicit region in the builder or setup environment to supply a region.

Now, when we run the Spark job with --master local it works fine, since AWS_REGION is defined. But when we run with --master yarn, we still get the same error. Our CDH admin has defined AWS_REGION as a global environment variable on all cluster nodes and restarted the Spark service, but the error persists.

Please suggest a resolution. TIA
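One direction we are considering, sketched here in case it helps frame the question: our understanding is that environment variables exported on the gateway host (or set globally on the nodes) are not necessarily inherited by the YARN application master and executor containers, so we would instead pass AWS_REGION through Spark configuration at submit time. The property names spark.yarn.appMasterEnv.* and spark.executorEnv.* are from the Spark "Running on YARN" documentation; the --class and jar below are placeholders for our actual job, not real names.

# Sketch only: propagate AWS_REGION to the YARN application master and executors
# at submit time instead of relying on a shell-level environment variable.
# The class name and jar file are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.AWS_REGION=ca-central-1 \
  --conf spark.executorEnv.AWS_REGION=ca-central-1 \
  --class com.example.S3ToHdfsJob \
  s3-to-hdfs-job.jar

If there is instead an fs.s3a.* property that sets the region directly for the assumed-role STS client, that would be preferable, but we are not sure which property applies to the Hadoop version shipped with CDH 6.3.3.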
Labels:
- Apache Spark