<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark S3 write failed in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-S3-write-failed/m-p/173619#M42178</link>
    <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/10750/edmundprout.html" nodeid="10750"&gt;@Ed Prout&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;I hit the same error in some Scala code and came across this post while looking for a solution:&lt;/P&gt;&lt;P&gt;&lt;A href="http://deploymentzone.com/2015/12/20/s3a-on-spark-on-aws-ec2/" target="_blank"&gt;http://deploymentzone.com/2015/12/20/s3a-on-spark-on-aws-ec2/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;It states that if you see this error, you need to downgrade "aws-java-sdk" to 1.7.4. Quoting the post:&lt;/P&gt;&lt;P&gt;"If you see a different exception message:&lt;/P&gt;&lt;PRE&gt;java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManagerConfiguration.setMultipartUploadThreshold(I)V
&lt;/PRE&gt;
&lt;P&gt;Then make sure you're using &lt;CODE&gt;aws-java-sdk-1.7.4.jar&lt;/CODE&gt; and not a more recent version."&lt;/P&gt;&lt;P&gt;I downgraded my jar to 1.7.4, and the problem disappeared.&lt;/P&gt;&lt;P&gt;I hope this helps.&lt;/P&gt;&lt;P&gt;/Kasper&lt;/P&gt;</description>
    <pubDate>Tue, 04 Oct 2016 16:31:24 GMT</pubDate>
    <dc:creator>kasperaaquist</dc:creator>
    <dc:date>2016-10-04T16:31:24Z</dc:date>
    <item>
      <title>Spark S3 write failed</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-S3-write-failed/m-p/173618#M42177</link>
      <description>&lt;P&gt;I'm attempting to write a parquet file to an S3 bucket, but getting the below error:&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;py4j.protocol.Py4JJavaError: An error occurred while calling o36.parquet.
: java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.&amp;lt;init&amp;gt;(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:453)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;&lt;/P&gt;&lt;P&gt;The line of Python code that fails is:&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;df.write.parquet("s3a://myfolder/myotherfolder")&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;&lt;/P&gt;&lt;P&gt;The same line of code works successfully if I write to HDFS instead of S3:&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;df.write.parquet("hdfs://myfolder/myotherfolder")&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;&lt;/P&gt;&lt;P&gt;I'm using the spark-2.0.2-bin-hadoop2.7 and aws-java-sdk-1.11.38 binaries. Right now I'm running it interactively in PyCharm on my Mac.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Sep 2016 01:47:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-S3-write-failed/m-p/173618#M42177</guid>
      <dc:creator>edmund_prout</dc:creator>
      <dc:date>2016-09-29T01:47:56Z</dc:date>
    </item>
    <item>
      <title>Re: Spark S3 write failed</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-S3-write-failed/m-p/173619#M42178</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/10750/edmundprout.html" nodeid="10750"&gt;@Ed Prout&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;I hit the same error in some Scala code and came across this post while looking for a solution:&lt;/P&gt;&lt;P&gt;&lt;A href="http://deploymentzone.com/2015/12/20/s3a-on-spark-on-aws-ec2/" target="_blank"&gt;http://deploymentzone.com/2015/12/20/s3a-on-spark-on-aws-ec2/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;It states that if you see this error, you need to downgrade "aws-java-sdk" to 1.7.4. Quoting the post:&lt;/P&gt;&lt;P&gt;"If you see a different exception message:&lt;/P&gt;&lt;PRE&gt;java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManagerConfiguration.setMultipartUploadThreshold(I)V
&lt;/PRE&gt;
&lt;P&gt;Then make sure you're using &lt;CODE&gt;aws-java-sdk-1.7.4.jar&lt;/CODE&gt; and not a more recent version."&lt;/P&gt;&lt;P&gt;I downgraded my jar to 1.7.4, and the problem disappeared.&lt;/P&gt;&lt;P&gt;I hope this helps.&lt;/P&gt;&lt;P&gt;/Kasper&lt;/P&gt;</description>
      <pubDate>Tue, 04 Oct 2016 16:31:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-S3-write-failed/m-p/173619#M42178</guid>
      <dc:creator>kasperaaquist</dc:creator>
      <dc:date>2016-10-04T16:31:24Z</dc:date>
    </item>
    <item>
      <title>Re: Spark S3 write failed</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-S3-write-failed/m-p/173620#M42179</link>
      <description>&lt;P&gt;I ran into the same problem and this worked for me, thanks!&lt;/P&gt;</description>
      <pubDate>Sat, 14 Jan 2017 14:42:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-S3-write-failed/m-p/173620#M42179</guid>
      <dc:creator>imai</dc:creator>
      <dc:date>2017-01-14T14:42:23Z</dc:date>
    </item>
    <item>
      <title>Re: Spark S3 write failed</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-S3-write-failed/m-p/173621#M42180</link>
      <description>&lt;P&gt;If things aren't working with HDP 2.5 or HDCloud, I'd recommend starting with &lt;A href="https://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/s3-trouble/index.html" target="_blank"&gt;Troubleshooting S3a&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;If you are using ASF release binaries, those docs are mostly valid too, though since we pulled in many of the later S3a features coming in Hadoop 2.8 (after writing them!), the docs are a bit inconsistent. The closest ASF troubleshooting docs are those for &lt;A href="https://github.com/apache/hadoop/blob/branch-2.8/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md#troubleshooting-s3a" target="_blank"&gt;Hadoop 2.8&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;As Kasper pointed out, this is due to AWS JAR versioning. The Amazon SDK has been pretty brittle against change, and you *must* run with the same version of the AWS SDK that Hadoop was built with (which in turn needs a consistent version of Jackson, ...).&lt;/P&gt;&lt;P&gt;Hadoop 2.7.x: AWS SDK 1.7.4&lt;/P&gt;&lt;P&gt;Hadoop 2.8.x: AWS SDK 1.10.6&lt;/P&gt;&lt;P&gt;Hadoop 2.9+: probably 1.10.11 or later, with Jackson bumped up to 2.7.8 to match.&lt;/P&gt;</description>
      <pubDate>Sat, 14 Jan 2017 21:24:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-S3-write-failed/m-p/173621#M42180</guid>
      <dc:creator>stevel</dc:creator>
      <dc:date>2017-01-14T21:24:23Z</dc:date>
    </item>
  </channel>
</rss>