Apache SPARK - Overwrite data file

Hi experts, how can I overwrite an existing output file with a new one (a data update)? Imagine that I have this:

  result.map(pair => pair.swap).sortByKey(true).saveAsTextFile("FILE/results")

And imagine that I then want to do this:

  test.map(pair => pair.swap).sortByKey(false).saveAsTextFile("FILE/results")

How can I overwrite the output already written from result with the output of test, in the same directory?

1 ACCEPTED SOLUTION

Re: Apache SPARK - Overwrite data file

An RDD's saveAsTextFile does not give us the opportunity to do that; it fails if the output directory already exists. (DataFrames, by contrast, have "save modes" such as append/overwrite/ignore.) You'll have to handle this yourself, either beforehand (for example, delete or rename the existing data) or afterwards (write the RDD to a different directory and then swap it into place).
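For illustration, here is a minimal sketch of both workarounds using the Hadoop FileSystem API. The object name OverwriteOutput, the two helper names, and the "_tmp" staging suffix are hypothetical and not part of Spark. Neither helper is atomic, so treat this as a starting point rather than a definitive implementation.

```scala
import java.io.IOException

import org.apache.hadoop.fs.Path
import org.apache.spark.rdd.RDD

// Hypothetical helpers; the names are illustrative, not a Spark API.
object OverwriteOutput {

  // Option 1: delete the existing output directory first, then write.
  def saveAsTextFileOverwriting[T](rdd: RDD[T], dir: String): Unit = {
    val path = new Path(dir)
    // Resolve the file system from the path so hdfs://, file://, etc. all work.
    val fs = path.getFileSystem(rdd.sparkContext.hadoopConfiguration)
    if (fs.exists(path)) {
      fs.delete(path, true) // true = recursive
    }
    rdd.saveAsTextFile(dir)
  }

  // Option 2: write to a staging directory, then swap it into place.
  // The old data stays intact until the new data has been fully written.
  def saveAsTextFileThenSwap[T](rdd: RDD[T], dir: String): Unit = {
    val target = new Path(dir)
    val stagingDir = dir + "_tmp" // assumed staging name
    val staging = new Path(stagingDir)
    val fs = target.getFileSystem(rdd.sparkContext.hadoopConfiguration)

    // Clear leftovers from a previous failed run, if any.
    if (fs.exists(staging)) {
      fs.delete(staging, true)
    }
    rdd.saveAsTextFile(stagingDir)

    // Replace the old output with the freshly written one.
    if (fs.exists(target)) {
      fs.delete(target, true)
    }
    if (!fs.rename(staging, target)) {
      throw new IOException(s"Could not rename $stagingDir to $dir")
    }
  }
}
```

With the code from the question, the second write would become something like OverwriteOutput.saveAsTextFileOverwriting(test.map(pair => pair.swap).sortByKey(false), "FILE/results"). If the data can be expressed as a DataFrame with a single string column instead, df.write.mode("overwrite").text("FILE/results") uses the built-in save mode and avoids the manual delete.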
