Spark S3 write failed

I'm attempting to write a Parquet file to an S3 bucket, but I get the error below:

py4j.protocol.Py4JJavaError: An error occurred while calling o36.parquet.
: java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:453)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
    at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)

The line of Python code that fails is:

df.write.parquet("s3a://myfolder/myotherfolder")

The same line of code works if I write to HDFS instead of S3:

df.write.parquet("hdfs://myfolder/myotherfolder")

I'm using spark-2.0.2-bin-hadoop2.7 and aws-java-sdk-1.11.38 binaries. Right now I'm running it interactively in PyCharm on my Mac.

1 ACCEPTED SOLUTION

Hi @Ed Prout,

I had the same error in some Scala code and came across this post while looking for a fix.

Site: http://deploymentzone.com/2015/12/20/s3a-on-spark-on-aws-ec2/

It states that if you see this kind of error, you need to downgrade "aws-java-sdk" to 1.7.4:

> If you see a different exception message:
>
> `java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManagerConfiguration.setMultipartUploadThreshold(I)V`
>
> Then make sure you're using aws-java-sdk-1.7.4.jar and not a more recent version.

I downgraded my jar to 1.7.4, and the problem disappeared.
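For PySpark, one way to get a matching pair of jars is to let Spark resolve `hadoop-aws` from Maven, since the Hadoop 2.7.x `hadoop-aws` module declares aws-java-sdk 1.7.4 as its own dependency. The following is only a sketch: the helper names `s3a_package` and `build_session` are made up for illustration, and the patch version `2.7.3` is an assumption that should be checked against the hadoop-common jar shipped in your Spark distribution.

```python
def s3a_package(hadoop_version):
    """Maven coordinate for the hadoop-aws module of a given Hadoop release."""
    return "org.apache.hadoop:hadoop-aws:" + hadoop_version


def build_session(hadoop_version="2.7.3"):
    """Create a SparkSession that pulls in hadoop-aws and its transitive AWS SDK.

    The default version is an assumption; use the Hadoop version your
    Spark build actually bundles.
    """
    # Imported lazily so s3a_package() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("s3a-parquet-write")
        # spark.jars.packages resolves the coordinate and its transitive
        # dependencies when the JVM is launched.
        .config("spark.jars.packages", s3a_package(hadoop_version))
        .getOrCreate()
    )
```

With this in place, `spark = build_session()` followed by `df.write.parquet("s3a://myfolder/myotherfolder")` should run against a consistent SDK, provided any newer aws-java-sdk jar (such as 1.11.38) has been removed from the classpath. The setting only takes effect if it is applied before the JVM starts, so it will not change a session that is already running.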

I hope this helps.

/Kasper

3 REPLIES

I ran into the same problem and this worked for me, thanks!

If things aren't working with HDP 2.5 or HDCloud, I'd recommend starting with [Troubleshooting S3a](https://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/s3-trouble/index.html).

If you are using ASF released binaries, those docs are mostly valid too. They are a bit inconsistent, though, because we pulled in many of the later S3a features coming in Hadoop 2.8 (after writing them!). The closest ASF docs on troubleshooting are those for [Hadoop 2.8](https://github.com/apache/hadoop/blob/branch-2.8/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md#troubleshooting-s3a).

As Kasper pointed out, this is due to AWS JAR versioning. The Amazon SDK has been pretty brittle against change, and you *must* run with the same version of the AWS SDK that Hadoop was built with (which also needs a consistent version of jackson, ...).

Hadoop 2.7.x: AWS SDK 1.7.4

Hadoop 2.8.x: AWS SDK 1.10.6

Hadoop 2.9+: probably AWS SDK 1.11 or later, with jackson bumped up to 2.7.8 to match.
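If it helps, the pairing rule can be turned into a quick sanity check to run before launching a job. This is a rough sketch, not part of Hadoop or Spark: `EXPECTED_SDK`, `sdk_version_from_jar` and `check_sdk` are hypothetical names, the table contains only the two firm pairings listed in this reply (the Hadoop 2.9+ entry is left out because it is given only as a guess), and the filename pattern assumes the monolithic `aws-java-sdk-<version>.jar` naming.

```python
import re

# Firm pairings from the list above, keyed by Hadoop release line.
EXPECTED_SDK = {"2.7": "1.7.4", "2.8": "1.10.6"}


def sdk_version_from_jar(jar_name):
    """Extract the version from a name like 'aws-java-sdk-1.11.38.jar'."""
    match = re.search(r"aws-java-sdk-(\d+(?:\.\d+)+)\.jar$", jar_name)
    return match.group(1) if match else None


def check_sdk(hadoop_version, jar_name):
    """Return True/False for a known pairing, or None if it cannot be judged."""
    release_line = ".".join(hadoop_version.split(".")[:2])
    expected = EXPECTED_SDK.get(release_line)
    found = sdk_version_from_jar(jar_name)
    if expected is None or found is None:
        return None  # unknown Hadoop line or unrecognised jar name
    return found == expected
```

For the setup in the question, `check_sdk("2.7.3", "aws-java-sdk-1.11.38.jar")` flags the mismatch, while the same call with `aws-java-sdk-1.7.4.jar` passes.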