<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: FileAlreadyExistsException when calling saveAsTextFile in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/FileAlreadyExistsException-when-calling-saveAsTextFile/m-p/161078#M57267</link>
    <description>&lt;P&gt;The exception seems to happen when nFiles is larger, like 1000, not when it's 10.&lt;/P&gt;&lt;PRE&gt;spark-submit --master yarn-cluster --class com.cisco.dfsio.test.Runner hdfs:///user/$USER/mantl-apps/benchmarking-apps/spark-test-dfsio-with-dependencies.jar --file data/testdfsio-write --nFiles 1000 --fSize 200000 -m write --log data/testdfsio-write/testHdfsIO-WRITE.log&lt;/PRE&gt;&lt;P&gt;btw: not my code.&lt;/P&gt;</description>
    <pubDate>Fri, 17 Mar 2017 03:18:44 GMT</pubDate>
    <dc:creator>wbekker</dc:creator>
    <dc:date>2017-03-17T03:18:44Z</dc:date>
    <item>
      <title>FileAlreadyExistsException when calling saveAsTextFile</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/FileAlreadyExistsException-when-calling-saveAsTextFile/m-p/161076#M57265</link>
      <description>&lt;P&gt;
	When running this &lt;A href="https://github.com/elisska/mantl-apps/blob/master/benchmarking-apps/spark-benchmarking-apps/spark-test-dfsio/src/main/scala/com/cisco/dfsio/test/Runner.scala"&gt;small piece of Scala code&lt;/A&gt; I get an "org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://xxx.eu-west-1.compute.internal:8020/user/cloudbreak/data/testdfsio-write".&lt;/P&gt;&lt;P&gt;Below is the piece of code where `saveAsTextFile` is executed. The directory does not exist before running this script. Why is this FileAlreadyExistsException being raised?&lt;/P&gt;&lt;PRE&gt;            // Create a Range and parallelize it, on nFiles partitions
            // The idea is to have a small RDD partitioned on a given number of workers
            // then each worker will generate data to write
            val a = sc.parallelize(1 until config.nFiles + 1, config.nFiles)


            val b = a.map(i =&amp;gt; {
              // generate an array of Byte (8 bit), with dimension fSize
              // fill it up with "0" chars, and make it a string for it to be saved as text
              // TODO: this approach can still cause memory problems in the executor if the array is too big.
              val x = Array.ofDim[Byte](fSizeBV.value).map(x =&amp;gt; "0").mkString("")
              x
            })


            // Force computation on the RDD
            sc.runJob(b, (iter: Iterator[_]) =&amp;gt; {})


            // Write output file
            val (junk, timeW) = profile {
              b.saveAsTextFile(config.file)
            }
&lt;/PRE&gt;</description>
      <pubDate>Fri, 17 Mar 2017 01:56:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/FileAlreadyExistsException-when-calling-saveAsTextFile/m-p/161076#M57265</guid>
      <dc:creator>wbekker</dc:creator>
      <dc:date>2017-03-17T01:56:04Z</dc:date>
    </item>
    <item>
      <title>Re: FileAlreadyExistsException when calling saveAsTextFile</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/FileAlreadyExistsException-when-calling-saveAsTextFile/m-p/161077#M57266</link>
      <description>&lt;P&gt;I could run 'Runner' without errors in local mode, so the code itself is probably not the issue.&lt;/P&gt;&lt;P&gt;Can you paste the exception stack trace (and possibly the options) that causes this to surface?&lt;/P&gt;&lt;P&gt;Also, I am not sure why you are calling runJob; it will essentially be a no-op in this case since the data is not cached.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Mridul&lt;/P&gt;</description>
      <pubDate>Fri, 17 Mar 2017 03:13:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/FileAlreadyExistsException-when-calling-saveAsTextFile/m-p/161077#M57266</guid>
      <dc:creator>mmuralidharan</dc:creator>
      <dc:date>2017-03-17T03:13:27Z</dc:date>
    </item>
    <item>
      <title>Re: FileAlreadyExistsException when calling saveAsTextFile</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/FileAlreadyExistsException-when-calling-saveAsTextFile/m-p/161078#M57267</link>
      <description>&lt;P&gt;The exception seems to happen when nFiles is larger, like 1000, not when it's 10.&lt;/P&gt;&lt;PRE&gt;spark-submit --master yarn-cluster --class com.cisco.dfsio.test.Runner hdfs:///user/$USER/mantl-apps/benchmarking-apps/spark-test-dfsio-with-dependencies.jar --file data/testdfsio-write --nFiles 1000 --fSize 200000 -m write --log data/testdfsio-write/testHdfsIO-WRITE.log&lt;/PRE&gt;&lt;P&gt;btw: not my code.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Mar 2017 03:18:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/FileAlreadyExistsException-when-calling-saveAsTextFile/m-p/161078#M57267</guid>
      <dc:creator>wbekker</dc:creator>
      <dc:date>2017-03-17T03:18:44Z</dc:date>
    </item>
    <item>
      <title>Re: FileAlreadyExistsException when calling saveAsTextFile</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/FileAlreadyExistsException-when-calling-saveAsTextFile/m-p/161079#M57268</link>
      <description>&lt;P&gt;Solved by not having too many partitions for parallelize.&lt;/P&gt;</description>
      <pubDate>Sat, 18 Mar 2017 04:02:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/FileAlreadyExistsException-when-calling-saveAsTextFile/m-p/161079#M57268</guid>
      <dc:creator>wbekker</dc:creator>
      <dc:date>2017-03-18T04:02:18Z</dc:date>
    </item>
  </channel>
</rss>

