Created on 12-08-2017 01:22 AM - edited 09-16-2022 05:37 AM
Hi,
We are currently testing authentication against Amazon S3 using temporary credentials and are running into the errors below.
We wanted to test this snippet: https://github.com/satishpandey/aws-spark-samples/blob/master/src/com/spark/aws/samples/s3/SparkS3ST...
on a Cloudera 5.13 cluster with Spark 2.2.
We only changed the code slightly, swapping "InstanceProfileCredentialsProvider" for "BasicAWSCredentials", roughly as sketched below.
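A minimal sketch of that change (the credential values are placeholders; the full snippet we used is further down this post):

// Our change: static credentials instead of the sample's instance-profile lookup
com.amazonaws.auth.AWSCredentialsProvider provider =
        new com.amazonaws.internal.StaticCredentialsProvider(
                new com.amazonaws.auth.BasicAWSCredentials("XXX", "XXX"));
// previously: new com.amazonaws.auth.InstanceProfileCredentialsProvider()

Submitting the job then fails with: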
...
Exception in thread "main" java.lang.NoSuchMethodError: com.amazonaws.SDKGlobalConfiguration.isInRegionOptimizedModeEnabled()Z
at com.amazonaws.ClientConfigurationFactory.getConfig(ClientConfigurationFactory.java:35)
at com.amazonaws.client.builder.AwsClientBuilder.resolveClientConfiguration(AwsClientBuilder.java:163)
at com.amazonaws.client.builder.AwsClientBuilder.access$000(AwsClientBuilder.java:52)
at com.amazonaws.client.builder.AwsClientBuilder$SyncBuilderParams.<init>(AwsClientBuilder.java:411)
at com.amazonaws.client.builder.AwsClientBuilder.getSyncClientParams(AwsClientBuilder.java:354)
at com.amazonaws.client.builder.AwsSyncClientBuilder.build(AwsSyncClientBuilder.java:46)
at com.spark.aws.samples.s3.SparkS3STSAssumeRole.main(SparkS3STSAssumeRole.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
...
The issue comes from the jar "aws-java-sdk-1.10.6", which ships with the cluster and is picked up on the classpath instead of version "1.11.145" declared in our pom.xml (which does contain the method).
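Printing which jar the class is actually loaded from at runtime can confirm this kind of clash:

// Prints the jar that provided SDKGlobalConfiguration at runtime
System.out.println(com.amazonaws.SDKGlobalConfiguration.class
        .getProtectionDomain().getCodeSource().getLocation());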
We also tried to follow the Cloudera documentation: https://www.cloudera.com/documentation/enterprise/latest/topics/sg_aws_credentials.html
and set the following Hadoop properties:
-Dfs.s3a.access.key=your_temp_access_key
-Dfs.s3a.secret.key=your_temp_secret_key
-Dfs.s3a.session.token=your_session_token_from_AmazonSTS
-Dfs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
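The same properties can also be set programmatically on the job's Hadoop configuration; a minimal sketch, assuming an existing JavaSparkContext named sc and placeholder values:

// Equivalent programmatic form of the -D properties above
org.apache.hadoop.conf.Configuration hadoopConf = sc.hadoopConfiguration();
hadoopConf.set("fs.s3a.access.key", "your_temp_access_key");
hadoopConf.set("fs.s3a.secret.key", "your_temp_secret_key");
hadoopConf.set("fs.s3a.session.token", "your_session_token_from_AmazonSTS");
hadoopConf.set("fs.s3a.aws.credentials.provider",
        "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider");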
But with those properties set, we hit this issue instead:
...
Exception in thread "main" java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey properties (respectively).
at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:74)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:80)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at org.apache.hadoop.fs.s3native.$Proxy36.initialize(Unknown Source)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:334)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2800)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2837)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2819)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:97)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:206)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:455)
at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:45)
at com.spark.aws.samples.s3.SparkS3STSAssumeRole.main(SparkS3STSAssumeRole.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
...
We generated the session token and the temporary access/secret keys with this Java code:
...
// Long-lived IAM user credentials (values are placeholders)
com.amazonaws.auth.AWSCredentials credentials = new com.amazonaws.auth.BasicAWSCredentials("XXX","XXX");
com.amazonaws.auth.AWSCredentialsProvider credentialsProvider = new com.amazonaws.internal.StaticCredentialsProvider(credentials);
// Regional STS endpoint for eu-west-1
com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration endp = new com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration("sts.eu-west-1.amazonaws.com","eu-west-1");
com.amazonaws.services.securitytoken.AWSSecurityTokenService sts = com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder.standard().withCredentials(credentialsProvider).withEndpointConfiguration(endp).build();
// Provider that assumes the role and hands out temporary session credentials
com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider provider2 = new com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.Builder("myArnRole", "testSessionName").withStsClient(sts).build();
...
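To get the actual temporary credentials out of the provider, a usage sketch (the cast covers SDK versions where getCredentials() is declared to return plain AWSCredentials):

// Temporary access key, secret key and session token for the assumed role
com.amazonaws.auth.AWSSessionCredentials session =
        (com.amazonaws.auth.AWSSessionCredentials) provider2.getCredentials();
String tempAccessKey = session.getAWSAccessKeyId();
String tempSecretKey = session.getAWSSecretKey();
String sessionToken = session.getSessionToken();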
Any information or feedback would be much appreciated.
Thanks,
Stéfan Le Moing
Created 02-01-2018 10:57 PM
STS should work. I would try (1) using s3a instead of s3n, and (2) building your Spark application against the same AWS SDK version the cluster uses.
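The second stack trace goes through NativeS3FileSystem/Jets3t, which means the input path used the s3n:// scheme; the fs.s3a.* properties you set only apply to s3a:// URIs. A minimal sketch, with a hypothetical bucket and key:

// s3n:// ignores fs.s3a.* settings (it looks for fs.s3n.* keys instead);
// an s3a:// URI picks up TemporaryAWSCredentialsProvider as configured.
org.apache.spark.api.java.JavaRDD<String> lines = sc.textFile("s3a://my-bucket/input/data.txt");
System.out.println("count = " + lines.count());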