Created on 12-08-2017 01:22 AM - edited 09-16-2022 05:37 AM
Hi,
We are currently testing authentication against Amazon S3 using temporary credentials and are running into the errors below.
We wanted to test this snippet: https://github.com/satishpandey/aws-spark-samples/blob/master/src/com/spark/aws/samples/s3/SparkS3ST...
on a Cloudera 5.13 cluster with Spark 2.2.
We only changed the code slightly, swapping "InstanceProfileCredentialsProvider" for "BasicAWSCredentials", roughly as sketched below.
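A minimal sketch of that change (the credential values are placeholders; the full snippet we used is further down this post):

// Our change: static credentials instead of the sample's instance-profile lookup
com.amazonaws.auth.AWSCredentialsProvider provider =
        new com.amazonaws.internal.StaticCredentialsProvider(
                new com.amazonaws.auth.BasicAWSCredentials("XXX", "XXX"));
// previously: new com.amazonaws.auth.InstanceProfileCredentialsProvider()

Submitting the job then fails with: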
...
Exception in thread "main" java.lang.NoSuchMethodError: com.amazonaws.SDKGlobalConfiguration.isInRegionOptimizedModeEnabled()Z
at com.amazonaws.ClientConfigurationFactory.getConfig(ClientConfigurationFactory.java:35)
at com.amazonaws.client.builder.AwsClientBuilder.resolveClientConfiguration(AwsClientBuilder.java:163)
at com.amazonaws.client.builder.AwsClientBuilder.access$000(AwsClientBuilder.java:52)
at com.amazonaws.client.builder.AwsClientBuilder$SyncBuilderParams.<init>(AwsClientBuilder.java:411)
at com.amazonaws.client.builder.AwsClientBuilder.getSyncClientParams(AwsClientBuilder.java:354)
at com.amazonaws.client.builder.AwsSyncClientBuilder.build(AwsSyncClientBuilder.java:46)
at com.spark.aws.samples.s3.SparkS3STSAssumeRole.main(SparkS3STSAssumeRole.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
...
The issue comes from the jar "aws-java-sdk-1.10.6", which ships with the cluster and is picked up on the classpath instead of version "1.11.145" declared in our pom.xml (which does contain the method).
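Printing which jar the class is actually loaded from at runtime can confirm this kind of clash:

// Prints the jar that provided SDKGlobalConfiguration at runtime
System.out.println(com.amazonaws.SDKGlobalConfiguration.class
        .getProtectionDomain().getCodeSource().getLocation());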
We also tried to follow the Cloudera documentation: https://www.cloudera.com/documentation/enterprise/latest/topics/sg_aws_credentials.html
and set the following Hadoop properties:
-Dfs.s3a.access.key=your_temp_access_key
-Dfs.s3a.secret.key=your_temp_secret_key
-Dfs.s3a.session.token=your_session_token_from_AmazonSTS
-Dfs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
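The same properties can also be set programmatically on the job's Hadoop configuration; a minimal sketch, assuming an existing JavaSparkContext named sc and placeholder values:

// Equivalent programmatic form of the -D properties above
org.apache.hadoop.conf.Configuration hadoopConf = sc.hadoopConfiguration();
hadoopConf.set("fs.s3a.access.key", "your_temp_access_key");
hadoopConf.set("fs.s3a.secret.key", "your_temp_secret_key");
hadoopConf.set("fs.s3a.session.token", "your_session_token_from_AmazonSTS");
hadoopConf.set("fs.s3a.aws.credentials.provider",
        "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider");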
But with those properties set, we hit this issue instead:
...
Exception in thread "main" java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey properties (respectively).
at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:74)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:80)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at org.apache.hadoop.fs.s3native.$Proxy36.initialize(Unknown Source)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:334)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2800)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2837)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2819)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:97)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:206)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:455)
at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:45)
at com.spark.aws.samples.s3.SparkS3STSAssumeRole.main(SparkS3STSAssumeRole.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
...
We generated the session token and the temporary access/secret keys with this Java code:
...
// Long-lived IAM user credentials (values are placeholders)
com.amazonaws.auth.AWSCredentials credentials = new com.amazonaws.auth.BasicAWSCredentials("XXX","XXX");
com.amazonaws.auth.AWSCredentialsProvider credentialsProvider = new com.amazonaws.internal.StaticCredentialsProvider(credentials);
// Regional STS endpoint for eu-west-1
com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration endp = new com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration("sts.eu-west-1.amazonaws.com","eu-west-1");
com.amazonaws.services.securitytoken.AWSSecurityTokenService sts = com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder.standard().withCredentials(credentialsProvider).withEndpointConfiguration(endp).build();
// Provider that assumes the role and hands out temporary session credentials
com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider provider2 = new com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.Builder("myArnRole", "testSessionName").withStsClient(sts).build();
...
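To get the actual temporary credentials out of the provider, a usage sketch (the cast covers SDK versions where getCredentials() is declared to return plain AWSCredentials):

// Temporary access key, secret key and session token for the assumed role
com.amazonaws.auth.AWSSessionCredentials session =
        (com.amazonaws.auth.AWSSessionCredentials) provider2.getCredentials();
String tempAccessKey = session.getAWSAccessKeyId();
String tempSecretKey = session.getAWSSecretKey();
String sessionToken = session.getSessionToken();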
Any information or feedback would be much appreciated.
Thanks,
Stéfan Le Moing
Created 02-01-2018 10:57 PM
STS should work. I would try (1) using s3a instead of s3n, and (2) building your Spark application against the same AWS SDK version the cluster uses.
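The second stack trace goes through NativeS3FileSystem/Jets3t, which means the input path used the s3n:// scheme; the fs.s3a.* properties you set only apply to s3a:// URIs. A minimal sketch, with a hypothetical bucket and key:

// s3n:// ignores fs.s3a.* settings (it looks for fs.s3n.* keys instead);
// an s3a:// URI picks up TemporaryAWSCredentialsProvider as configured.
org.apache.spark.api.java.JavaRDD<String> lines = sc.textFile("s3a://my-bucket/input/data.txt");
System.out.println("count = " + lines.count());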