Created 05-01-2018 12:08 AM
When Spark uses the Hadoop writer to write part files (using saveAsTextFile()), the files are named in the general format "part-NNNNN". How can I retrieve the suffix "NNNNN" in Spark at runtime?
P.S. I do not want to list the files and then extract the suffix.
Created 05-01-2018 08:18 PM
Any suggestions?
Created 05-01-2018 09:59 PM
What exactly is your need?
I ask because if you simply want to read the saved files, you only need to point to the folder and all of its content will be read:
sc.textFile("foldername/*")
So, if what you want is to write a single file from a previously processed DataFrame, you can use "df.rdd.repartition(1).saveAsTextFile('HDFSFolder/FileName')" (saveAsTextFile is an RDD method, hence the .rdd). Only one file, "part-00000", will be generated inside the HDFSFolder/FileName directory.
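Because one part file is written per partition, the expected names can be derived from the partition count (for example rdd.getNumPartitions()) without listing the output directory. Below is a minimal plain-Python sketch, assuming the classic saveAsTextFile naming ("part-" plus the partition index zero-padded to five digits, with no compression extension); DataFrame writers in Spark 2.x add further tokens such as a job UUID and the file extension, so it does not cover those. The function name expected_part_files is hypothetical.

```python
def expected_part_files(num_partitions):
    # Hypothetical helper. Assumption: classic saveAsTextFile naming,
    # i.e. "part-" + partition index padded to at least 5 digits,
    # one file per partition, no compression extension.
    return ["part-%05d" % i for i in range(num_partitions)]


# With df.rdd.repartition(1) there is one partition, hence one file.
print(expected_part_files(1))  # ['part-00000']
print(expected_part_files(3))  # ['part-00000', 'part-00001', 'part-00002']
```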
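On Spark 2.x, where CSV support (originally the Databricks spark-csv library) is built in, you can do this (the path is again created as a directory containing the part files):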
df.write.format("csv").save("/HDFSFolder/FileName.csv")
Is that what you need?
Created 05-01-2018 10:23 PM
I'm not trying to read it, I just want to know the complete name of the part-file at runtime in Spark, once a reducer saves it.
Created 05-02-2018 06:36 PM
Whenever you perform a save operation, one file is created per partition of your DF, and these files are named "part-xxxxx", where xxxxx is the partition number. So that is the complete file name.
The name always follows this pattern; what varies is how many files are generated.
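To get the suffix at runtime without listing files: inside a task the partition index is available, for example as the index argument that RDD.mapPartitionsWithIndex passes to your function (or via pyspark.TaskContext.get().partitionId() in recent PySpark versions), and for saveAsTextFile that index is what becomes the "xxxxx" suffix. The sketch below assumes the classic naming (no compression codec, which would append an extension such as .gz), and that the RDD you save has the same partitioning as the one you inspect (mapPartitionsWithIndex keeps the partitioning, but a later repartition would not). The helper names are hypothetical.

```python
def part_file_name(partition_index):
    # Hypothetical helper. Assumption: saveAsTextFile names the output of
    # partition i as "part-" + i zero-padded to at least 5 digits.
    return "part-%05d" % partition_index


def tag_with_part_file(partition_index, rows):
    # Shaped for rdd.mapPartitionsWithIndex(f): Spark calls f(index, iterator).
    # Pairs every record with the part file its partition is written to.
    name = part_file_name(partition_index)
    for row in rows:
        yield name, row


# Hypothetical PySpark usage (not executed here):
#   rdd.mapPartitionsWithIndex(tag_with_part_file).take(5)
# Outside Spark, the generator can be called directly:
print(list(tag_with_part_file(7, iter(["a", "b"]))))
# [('part-00007', 'a'), ('part-00007', 'b')]
```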
Sorry if I misunderstood what you need.