Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Apache SPARK - Overwrite data file


Hi experts, how can I overwrite an existing output file with a new one (a data update)? Suppose I have this:

  result.map(pair => pair.swap).sortByKey(true).saveAsTextFile("FILE/results")

And suppose I then want to do this:

  test.map(pair => pair.swap).sortByKey(false).saveAsTextFile("FILE/results")

How can I replace the output written from result with the output of test, in the same directory?

1 ACCEPTED SOLUTION


An RDD's saveAsTextFile does not give us an overwrite option; by default it fails if the output directory already exists (DataFrames, by contrast, have "save modes" such as append/overwrite/ignore). You'll have to handle this yourself, either beforehand (delete or rename the existing data) or afterwards (write the RDD to a different directory and then swap it in).
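Both workarounds can be sketched with the Hadoop FileSystem API that Spark already ships with. This is only a rough sketch under some assumptions: the object and helper names (OverwriteTextOutput, deleteThenSave, saveThenSwap), the "_tmp" suffix, the sample data, and the local[*] master are all made up for illustration, and the output path is simply the one from the question.

```scala
import java.io.IOException

import org.apache.hadoop.fs.Path
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object; none of these names come from Spark itself.
object OverwriteTextOutput {

  // "Beforehand": remove any existing output directory, then write.
  def deleteThenSave(sc: SparkContext, rdd: RDD[_], output: String): Unit = {
    val path = new Path(output)
    val fs = path.getFileSystem(sc.hadoopConfiguration)
    if (fs.exists(path)) fs.delete(path, true) // true = delete recursively
    rdd.saveAsTextFile(output)
  }

  // "Afterwards": write to a different directory, then swap it into place.
  def saveThenSwap(sc: SparkContext, rdd: RDD[_], output: String): Unit = {
    val finalPath = new Path(output)
    val tmpPath = new Path(output + "_tmp") // assumed naming convention
    val fs = finalPath.getFileSystem(sc.hadoopConfiguration)

    // Clear leftovers from an earlier failed run, then write the new data.
    if (fs.exists(tmpPath)) fs.delete(tmpPath, true)
    rdd.saveAsTextFile(tmpPath.toString)

    // The old data is only removed once the new data has been written.
    if (fs.exists(finalPath)) fs.delete(finalPath, true)
    if (!fs.rename(tmpPath, finalPath))
      throw new IOException(s"Could not rename $tmpPath to $finalPath")
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption so the sketch can run on one machine.
    val conf = new SparkConf().setAppName("overwrite-demo").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data standing in for the question's result and test.
      val result = sc.parallelize(Seq(("a", 3), ("b", 1), ("c", 2)))
      val test = sc.parallelize(Seq(("d", 9), ("e", 7)))

      deleteThenSave(sc, result.map(pair => pair.swap).sortByKey(true), "FILE/results")
      saveThenSwap(sc, test.map(pair => pair.swap).sortByKey(false), "FILE/results")
    } finally {
      sc.stop()
    }
  }
}
```

Neither variant is atomic: there is a short window between the delete and the write (or the rename) in which readers will not find the directory. The swap variant at least keeps the old data until the new data has been written successfully. If converting to a DataFrame is an option, the save mode route is df.write.mode("overwrite").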
