
FileAlreadyExistsException when calling saveAsTextFile


When running this small piece of Scala code I get a "org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://xxx.eu-west-1.compute.internal:8020/user/cloudbreak/data/testdfsio-write".

Below is the piece of code where `saveAsTextFile` is executed. The directory does not exist before running this script, so why is this FileAlreadyExistsException being raised?

    // Create a Range and parallelize it on nFiles partitions.
    // The idea is to have a small RDD partitioned across a given number of workers;
    // each worker will then generate the data to write.
    val a = sc.parallelize(1 until config.nFiles + 1, config.nFiles)

    val b = a.map(i => {
      // Generate an array of Byte (8 bit) with dimension fSize,
      // fill it with "0" chars, and turn it into a string so it can be saved as text.
      // TODO: this approach can still cause memory problems in the executor if the array is too big.
      val x = Array.ofDim[Byte](fSizeBV.value).map(x => "0").mkString("")
      x
    })

    // Force computation on the RDD
    sc.runJob(b, (iter: Iterator[_]) => {})

    // Write the output file
    val (junk, timeW) = profile {
      b.saveAsTextFile(config.file)
    }
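
For completeness: I know one common workaround (just a sketch, using the Hadoop FileSystem API with the same `sc`, `b`, and `config.file` as above) is to delete any leftover output directory before writing, but I would like to understand why the directory appears to exist in the first place.

    import org.apache.hadoop.fs.{FileSystem, Path}

    // Workaround sketch: remove a leftover output directory before writing.
    // Uses the same `sc`, `b`, and `config.file` as in the snippet above.
    val outputPath = new Path(config.file)
    val fs = FileSystem.get(sc.hadoopConfiguration)
    if (fs.exists(outputPath)) {
      fs.delete(outputPath, true) // recursive delete
    }
    b.saveAsTextFile(config.file)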
1 ACCEPTED SOLUTION


Re: FileAlreadyExistsException when calling saveAsTextFile


I could run 'Runner' without errors in local mode, so the code itself is probably not the issue.

Can you paste the exception stack (and possibly the options) which causes this to surface?

Also, I am not sure why you are doing the runJob; it will essentially be a no-op in this case since the data is not cached.
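
If you want the data materialized once and then reused by the write, something like this should do it (just a sketch, using the same names from your snippet):

    // Cache the RDD so that forcing the computation keeps the generated
    // partitions around, instead of regenerating them inside saveAsTextFile.
    val cached = b.cache()

    // Force materialization of all partitions.
    sc.runJob(cached, (iter: Iterator[_]) => {})

    // The save now reuses the cached partitions rather than recomputing them.
    cached.saveAsTextFile(config.file)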

Regards,

Mridul


3 REPLIES


Re: FileAlreadyExistsException when calling saveAsTextFile

The exception seems to happen when nFiles is larger, like 1000, not when it's 10.

spark-submit --master yarn-cluster --class com.cisco.dfsio.test.Runner hdfs:///user/$USER/mantl-apps/benchmarking-apps/spark-test-dfsio-with-dependencies.jar --file data/testdfsio-write --nFiles 1000 --fSize 200000 -m write --log data/testdfsio-write/testHdfsIO-WRITE.log

btw: not my code.


Re: FileAlreadyExistsException when calling saveAsTextFile

Solved by not having too many partitions for parallelize.
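
For reference, a sketch of the change (the exact cap below is only an illustration; `config.nFiles` and `sc` are the same names as in the original snippet):

    // Cap the partition count instead of creating one partition per nFiles value,
    // so parallelize does not spawn thousands of tiny tasks and output files.
    val numPartitions = math.min(config.nFiles, sc.defaultParallelism * 4)
    val a = sc.parallelize(1 until config.nFiles + 1, numPartitions)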
