Created 09-28-2016 06:47 PM
I'm attempting to write a parquet file to an S3 bucket, but getting the below error:
```
py4j.protocol.Py4JJavaError: An error occurred while calling o36.parquet.
: java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:453)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
    at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
```
The line of Python code that fails is:

```python
df.write.parquet("s3a://myfolder/myotherfolder")
```
The same line of code works successfully if I write to HDFS instead of S3:

```python
df.write.parquet("hdfs://myfolder/myotherfolder")
```
I'm using the spark-2.0.2-bin-hadoop2.7 and aws-java-sdk-1.11.38 binaries. Right now I'm running it interactively in PyCharm on my Mac.
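For context, the `NoSuchMethodError` above is the classic symptom of an AWS SDK / hadoop-aws version mismatch: the S3A connector in a Hadoop 2.7 build was compiled against a `TransferManager` constructor that later SDK releases removed. A minimal sketch of one way to avoid hand-placing jars is to let Spark resolve a consistent pair via `--packages` (the Maven coordinates below are my assumption of a matching pair for a Hadoop 2.7 build; verify the exact hadoop-aws version against your distribution):

```python
# Sketch: build a pyspark launch command whose hadoop-aws and aws-java-sdk
# artifacts are a known-compatible pair, instead of mixing a Hadoop 2.7
# connector with aws-java-sdk-1.11.38.
hadoop_aws = "org.apache.hadoop:hadoop-aws:2.7.3"  # matches the bundled Hadoop 2.7
aws_sdk = "com.amazonaws:aws-java-sdk:1.7.4"       # SDK hadoop-aws 2.7.x was built with
cmd = "pyspark --packages {},{}".format(hadoop_aws, aws_sdk)
print(cmd)
```

Launching with that command pulls both artifacts from Maven Central, so the classpath stays internally consistent.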
Created 10-04-2016 09:31 AM
Hi @Ed Prout,
I have had the same error in some Scala code, and came across this post while looking for a solution:
Site: http://deploymentzone.com/2015/12/20/s3a-on-spark-on-aws-ec2/
It states that if you see this error, you need to downgrade aws-java-sdk to 1.7.4:
> If you see a different exception message:
>
> ```
> java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManagerConfiguration.setMultipartUploadThreshold(I)V
> ```
>
> Then make sure you're using aws-java-sdk-1.7.4.jar and not a more recent version.
I downgraded my jar to 1.7.4, and the problem disappeared.
I hope this helps.
/Kasper
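Before downgrading, it can help to confirm which SDK jars are actually on the classpath, since a stray newer jar in Spark's `jars/` directory reintroduces the error. A small sketch (the directory layout is an assumption; in a stock Spark download the jars live under `$SPARK_HOME/jars`):

```python
import glob
import os
import re


def aws_sdk_versions(jars_dir):
    """Return the aws-java-sdk versions found in a jars directory,
    by parsing version numbers out of the jar file names."""
    versions = []
    for jar in glob.glob(os.path.join(jars_dir, "aws-java-sdk*.jar")):
        match = re.search(r"-(\d+(?:\.\d+)+)\.jar$", os.path.basename(jar))
        if match:
            versions.append(match.group(1))
    return sorted(versions)
```

If this reports anything other than the single version your Hadoop build expects (1.7.4 for Hadoop 2.7.x), remove the extras.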
Created 01-14-2017 06:42 AM
I ran into the same problem and this worked for me, thanks!
Created 01-14-2017 01:24 PM
If things aren't working with HDP 2.5 or HDCloud, I'd recommend starting with [Troubleshooting S3a](https://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/s3-trouble/index.html)
If you are using ASF-released binaries, those docs are mostly valid too, though since we pulled in many of the later S3A features coming in Hadoop 2.8 (after writing them!), the docs are a bit inconsistent. The closest ASF troubleshooting docs are those for [Hadoop 2.8](https://github.com/apache/hadoop/blob/branch-2.8/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md#troubleshooting-s3a).
As Kasper pointed out, this is due to AWS JAR versioning. The Amazon SDK has been pretty brittle against change, and you *must* run with the same version of the AWS SDK that Hadoop was built with (which in turn needs a consistent version of Jackson, ...).
- Hadoop 2.7.x: AWS SDK 1.7.4
- Hadoop 2.8.x: AWS SDK 1.10.6
- Hadoop 2.9+: probably AWS SDK 1.10.11 or later, with Jackson bumped up to 2.7.8 to match.
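The pairings above can be sketched as a small lookup table; I have left the 2.9+ line out since it is given only as a "probably" rather than a confirmed pairing:

```python
# AWS SDK version each Hadoop minor line was built against,
# per the compatibility notes in this thread.
REQUIRED_AWS_SDK = {
    "2.7": "1.7.4",
    "2.8": "1.10.6",
}


def required_sdk(hadoop_version):
    """Return the AWS SDK version to pair with a given Hadoop version,
    or None if the Hadoop minor line is not in the table."""
    minor_line = ".".join(hadoop_version.split(".")[:2])
    return REQUIRED_AWS_SDK.get(minor_line)


print(required_sdk("2.7.3"))  # the Hadoop line bundled with spark-2.0.2
```

For the original poster's spark-2.0.2-bin-hadoop2.7 setup, this gives 1.7.4, which is exactly the downgrade that fixed the error.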