Spark - Get part-file suffix (part-NNNNN)

When Spark uses the Hadoop writer to write part-files (using saveAsTextFile()), it names them in the general format "part-NNNNN". How can I retrieve this "NNNNN" suffix in Spark at runtime?

P.S. I do not want to list the files and then extract the suffix.


Any suggestions?


Hi @Prudhvi Rao Shedimbi,

What exactly is your need?

I ask because if you simply want to read the saved file, you only need to point to the folder, and all of its content will be read.


So, if what you want is to write a single file from a previously processed DataFrame, you can do it with "df.rdd.repartition(1).saveAsTextFile('HDFSFolder/FileName')" (saveAsTextFile() is an RDD method, hence the ".rdd"). With only one partition, only one file, "part-00000", will be generated inside the 'HDFSFolder/FileName' directory.
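As a rough illustration of that rule, the hypothetical helper below (not a Spark API) lists the file names a save is expected to produce for a given partition count. It assumes the classic naming used by RDD saveAsTextFile(): one file per partition, "part-" followed by the partition index zero-padded to at least five digits. With one partition, it yields only "part-00000".

```python
from typing import List


def expected_part_files(num_partitions: int) -> List[str]:
    """Hypothetical helper: file names saveAsTextFile() is assumed to write.

    Assumption: one file per partition, named "part-" followed by the
    partition index zero-padded to at least five digits.
    """
    if num_partitions < 0:
        raise ValueError("num_partitions must be non-negative")
    return ["part-{:05d}".format(i) for i in range(num_partitions)]


# After df.rdd.repartition(1) the RDD has a single partition,
# so only one file is expected.
print(expected_part_files(1))  # ['part-00000']
print(expected_part_files(3))  # ['part-00000', 'part-00001', 'part-00002']
```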

If you are using a library such as the Databricks one, you can do it like this:


Is that what you need?

I'm not trying to read it; I just want to know the complete name of the part-file at runtime in Spark, once a reducer saves it.


Every time you perform a save operation, the files are created according to the number of partitions of your DF, and this process generates file names like "part-xxxxx", so that is the complete file name.

The file name will never differ from this pattern. The only variable is how many files are generated.
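If the goal is to know, inside a running task, which part-file that task's records will end up in, one option that avoids listing files is to derive the name from the partition index. This is a minimal PySpark sketch, assuming the naming rule above ("part-" plus the partition index zero-padded to five digits, as used by RDD saveAsTextFile()) and assuming the RDD is saved without being repartitioned afterwards. `part_file_name`, `tag_records` and `current_part_file_name` are hypothetical helpers; `RDD.mapPartitionsWithIndex` and `TaskContext.get().partitionId()` are existing PySpark APIs.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def part_file_name(partition_index: int) -> str:
    """Hypothetical helper: assumed name of the file a partition is saved to."""
    if partition_index < 0:
        raise ValueError("partition_index must be non-negative")
    # Zero-pad to at least five digits, e.g. 7 -> "part-00007".
    return "part-{:05d}".format(partition_index)


def tag_records(index: int, records: Iterable[T]) -> Iterator[Tuple[str, T]]:
    """Pair every record with the part-file its partition is expected to produce.

    Intended for rdd.mapPartitionsWithIndex(tag_records), which calls it
    once per partition with that partition's index.
    """
    name = part_file_name(index)
    for record in records:
        yield name, record


def current_part_file_name() -> str:
    """Assumed part-file name for the Spark task that is currently running.

    Must be called inside a task (e.g. in a map or foreachPartition
    function); TaskContext.get() returns None on the driver.
    """
    from pyspark import TaskContext  # imported lazily: only needed on executors

    ctx = TaskContext.get()
    if ctx is None:
        raise RuntimeError("not running inside a Spark task")
    return part_file_name(ctx.partitionId())
```

For example, `rdd.mapPartitionsWithIndex(tag_records).take(2)` on a single-partition RDD would pair each record with "part-00000". The name is computed rather than read from disk, so it is only as reliable as the naming assumption: the DataFrame writer (`df.write`) in Spark 2.x and later typically appends a job UUID and a "c000" counter (for example "part-00000-<uuid>-c000.txt"), so this sketch targets RDD saveAsTextFile().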

Sorry if I misunderstood what you want.

