Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Authentication for Amazon s3 using Temporary credentials (STS)

avatar
New Contributor

Hi,
We are currently running some tests regarding the authentication for amazon s3 using temporary credentials and encountering the following errors.


We wanted to test this snippet : https://github.com/satishpandey/aws-spark-samples/blob/master/src/com/spark/aws/samples/s3/SparkS3ST...


in a cloudera 5.13 cluster with Spark 2.2.
We just changed a bit the code to use "BasicAWSCredentials" instead of "InstanceProfileCredentialsProvider".


...
Exception in thread "main" java.lang.NoSuchMethodError: com.amazonaws.SDKGlobalConfiguration.isInRegionOptimizedModeEnabled()Z
at com.amazonaws.ClientConfigurationFactory.getConfig(ClientConfigurationFactory.java:35)
at com.amazonaws.client.builder.AwsClientBuilder.resolveClientConfiguration(AwsClientBuilder.java:163)
at com.amazonaws.client.builder.AwsClientBuilder.access$000(AwsClientBuilder.java:52)
at com.amazonaws.client.builder.AwsClientBuilder$SyncBuilderParams.<init>(AwsClientBuilder.java:411)
at com.amazonaws.client.builder.AwsClientBuilder.getSyncClientParams(AwsClientBuilder.java:354)
at com.amazonaws.client.builder.AwsSyncClientBuilder.build(AwsSyncClientBuilder.java:46)
at com.spark.aws.samples.s3.SparkS3STSAssumeRole.main(SparkS3STSAssumeRole.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
...


The issue is comming from the jar "aws-java-sdk.1.10.6" which is comming from the cluster and used instead of the version "1.11.145" provided in the pom.xml (which contained the method).


We tried to follow the cloudera documentation : https://www.cloudera.com/documentation/enterprise/latest/topics/sg_aws_credentials.html


To set some hadoop properties :
-Dfs.s3a.access.key=your_temp_access_key
-Dfs.s3a.secret.key=your_temp_secret_key
-Dfs.s3a.session.token=your_session_token_from_AmazonSTS
-Dfs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider


But in that case, we had that issue :
...
Exception in thread "main" java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey properties (respectively).
at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:74)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:80)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at org.apache.hadoop.fs.s3native.$Proxy36.initialize(Unknown Source)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:334)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2800)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2837)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2819)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:97)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:206)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:455)
at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:45)
at com.spark.aws.samples.s3.SparkS3STSAssumeRole.main(SparkS3STSAssumeRole.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
...


We generated the token and temporary access / secret key with this java code :


...
com.amazonaws.auth.AWSCredentials credentials = new com.amazonaws.auth.BasicAWSCredentials("XXX","XXX");
com.amazonaws.auth.AWSCredentialsProvider credentialsProvider = new com.amazonaws.internal.StaticCredentialsProvider(credentials);

com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration endp = new com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration("sts.eu-west-1.amazonaws.com","eu-west-1");
com.amazonaws.services.securitytoken.AWSSecurityTokenService sts = com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder.standard().withCredentials(credentialsProvider).withEndpointConfiguration(endp).build();
com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider provider2 = new com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.Builder("myArnRole", "testSessionName").withStsClient(sts).build();
...


Any information/feedback will be much appreciated.

Thanks,

Stéfan Le Moing

2 REPLIES 2

avatar
New Contributor
Hi,

Just to add some information. We tried to execute this code : https://github.com/satishpandey/aws-spark-samples/blob/master/src/com/spark/aws/samples/s3/SparkS3ST...
But we had issues due to the hadoop properties. The "session.token" property seems to never be used when we set the "fs.s3n.awsSecretAccessKey" and "fs.s3n.awsAccessKeyId" properties.


Does anyone succeded to use aws sts in a spark code ?
We don't know if we have an issue regarding our configuration settings in the use of the properties or in the temporary credentials generation.

Regards,

avatar
Cloudera Employee

STS should work. I would try (1) using s3a, not s3n, and (2) building your spark app with the same AWS SDK version as used in the cluster.