Apache SPARK - Overwrite data file


Hi experts, how can I overwrite an existing file with a new one (a data update)? Imagine that I have this:

  result.map(pair => pair.swap).sortByKey(true).saveAsTextFile("FILE/results")

And imagine that I then want to do this:

  test.map(pair => pair.swap).sortByKey(false).saveAsTextFile("FILE/results")

How can I overwrite the output written from result with the output from test, in the same directory?

1 ACCEPTED SOLUTION


RDD's saveAsTextFile has no overwrite option; by default it fails if the output directory already exists (DataFrames have "save modes" for things like append/overwrite/ignore). You'll have to handle this yourself, either beforehand (delete or rename the existing data) or afterwards (write the RDD to a different directory and then swap it in).
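Both approaches can be sketched with the Hadoop FileSystem API that Spark already ships with. This is only a minimal sketch, not a built-in Spark feature: the object name OverwriteOutput, the two helper names, and the "_tmp" suffix are made up for illustration, and it assumes nothing else is reading or writing the target directory at the same time.

```scala
import java.io.IOException

import org.apache.hadoop.fs.Path
import org.apache.spark.rdd.RDD

// Hypothetical helpers; the names are illustrative, not part of Spark.
object OverwriteOutput {

  // "Beforehand": remove the existing output directory, then save.
  def deleteThenSave(rdd: RDD[_], target: String): Unit = {
    val path = new Path(target)
    // Resolve the file system (HDFS, local, ...) that owns this path.
    val fs = path.getFileSystem(rdd.sparkContext.hadoopConfiguration)
    if (fs.exists(path)) fs.delete(path, true) // true = recursive
    rdd.saveAsTextFile(target)
  }

  // "Afterwards": save to a temporary sibling directory, then swap it in.
  def saveThenSwap(rdd: RDD[_], target: String): Unit = {
    val dst = new Path(target)
    val tmp = new Path(target + "_tmp") // assumed temporary name
    val fs = dst.getFileSystem(rdd.sparkContext.hadoopConfiguration)
    if (fs.exists(tmp)) fs.delete(tmp, true) // clear leftovers from a failed run
    rdd.saveAsTextFile(tmp.toString)
    // The old data is removed only after the new data was written successfully.
    if (fs.exists(dst)) fs.delete(dst, true)
    if (!fs.rename(tmp, dst))
      throw new IOException(s"Could not rename $tmp to $dst")
  }
}
```

With the RDDs from the question, the second write would then look like OverwriteOutput.saveThenSwap(test.map(pair => pair.swap).sortByKey(false), "FILE/results"). The swap variant is probably the safer choice if test is itself computed from files under FILE/results: RDDs are evaluated lazily, so deleting the directory first would remove the input before it has been read.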
