Cloudera Community

New Member

When Spark uses the Hadoop writer to write part files (via saveAsTextFile()), it names them using the general format "part-NNNNN". How can I retrieve the "NNNNN" suffix in Spark at runtime?

P.S. I do not want to list the files and then extract the suffix from their names.

4 REPLIES

New Member

Any suggestions?

New Member

Hi @Prudhvi Rao Shedimbi,

What exactly is your need?

I ask because, if you simply want to read the saved files, you only need to point to the folder and all of its contents will be read.

sc.textFile("foldername/*")

If instead you want to write a single file from a previously processed DataFrame, you can use "df.repartition(1).rdd.saveAsTextFile('HDFSFolder/FileName')" (saveAsTextFile() is an RDD method, so the DataFrame is converted with .rdd first); only one file, "part-00000", will then be generated inside that folder.

If you are using a library such as Databricks spark-csv (CSV support is built into Spark 2.0 and later), you can do this:

df.write.format("csv").save("/HDFSFolder/FileName.csv")

Is that what you need?

New Member

I'm not trying to read it, I just want to know the complete name of the part-file at runtime in Spark, once a reducer saves it.

New Member

Whenever you perform a save operation, the files are created according to the number of partitions of your DF, and the process generates file names of the form "part-xxxxx", so that is the complete file name.

With saveAsTextFile(), the file name never differs from this pattern. What varies is how many files are generated.
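
As a rough illustration of that pattern, the sketch below lists the names one would expect for a given number of partitions. It assumes the default saveAsTextFile() naming ("part-" plus the zero-padded, five-digit partition index) and no compression codec; expected_part_files is a hypothetical helper, not a Spark API.

```python
def expected_part_files(num_partitions):
    """Return the file names expected from saveAsTextFile() for an RDD
    with num_partitions partitions (assumed default naming, no codec)."""
    return ["part-%05d" % i for i in range(num_partitions)]


# Three partitions give three files, numbered from zero.
print(expected_part_files(3))
# ['part-00000', 'part-00001', 'part-00002']
```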

Sorry if I misunderstood what you want.
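
To get the suffix at runtime without listing the output directory, one option is to derive it from the partition index inside the task. This is only a minimal sketch, not a solution confirmed in this thread: it assumes the RDD keeps the same partitioning between the tagging step and saveAsTextFile(), and part_file_name and tag_with_part_name are hypothetical helper names.

```python
def part_file_name(partition_index):
    # Assumed default naming: "part-" + zero-padded five-digit partition index.
    return "part-%05d" % partition_index


def tag_with_part_name(partition_index, records):
    """Pair every record with the part-file name its partition is expected
    to be written to. The (index, iterator) signature matches what
    RDD.mapPartitionsWithIndex passes to its function."""
    name = part_file_name(partition_index)
    for record in records:
        yield (name, record)


# Local check without Spark: pretend partition 2 holds two records.
print(list(tag_with_part_name(2, ["a", "b"])))
# [('part-00002', 'a'), ('part-00002', 'b')]
```

With a live RDD this could be applied as rdd.mapPartitionsWithIndex(tag_with_part_name). Inside a task, TaskContext.get().partitionId() from pyspark should return the same index.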
