
How to save each part file of single object into different/separate directories.


I have saved one RDD as four part files under a single directory.

However, my use case requires that each part file of that data set be saved in a different directory.


Re: How to save each part file of single object into different/separate directories.

@TIRUPATI CHELLARAPU

I think this should be possible with rdd.foreachPartition, which should let you store each partition in a separate file or directory.

A similar solution is described here:

https://stackoverflow.com/questions/30338213/writing-rdd-partitions-to-individual-parquet-files-in-i...

As a simpler alternative, they also suggest

df.write.partitionBy("year", "month", "day").parquet("/path/to/output")

which creates a directory structure based on the partition columns of the DataFrame, with one subdirectory per distinct value of each column.
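
To make that layout concrete, here is a minimal plain-Python sketch (no Spark required) of the `column=value` directory naming that partitionBy produces for simple values. The function name `partition_dir` and the sample row are hypothetical and only illustrate the naming scheme; Spark's own handling of special characters and nulls is not modelled here.

```python
from pathlib import PurePosixPath


def partition_dir(base, row, partition_cols):
    """Return the directory a row would fall into under a column=value layout.

    Hypothetical helper: it only mimics the naming scheme for simple values.
    """
    path = PurePosixPath(base)
    for col in partition_cols:
        # Each partition column adds one directory level named "col=value".
        path = path / f"{col}={row[col]}"
    return str(path)


# Illustrative row; the values are made up for the example.
row = {"year": 2018, "month": 5, "day": 21, "amount": 10.0}
print(partition_dir("/path/to/output", row, ["year", "month", "day"]))
# prints: /path/to/output/year=2018/month=5/day=21
```

Note that this splits data by column values, not by existing part file, so it only helps if the desired directories correspond to values in the data.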

HTH


Re: How to save each part file of single object into different/separate directories.


Thank you for the information, but what I need is the following.

I already have a directory that contains four part files.

For example:

merchant_table (main dir)

p00000

p00001

p00002

p00003

Now I need to save the above as four separate directories, each containing one of the part files, like below:

merchant_table1 ( dir1)

p00000

merchant_table1 ( dir2)

p00001

merchant_table1 ( dir3)

p00002

merchant_table1 ( dir4)

p00003
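
Since the part files already exist, one option is to move them rather than rewrite the data. The following is a minimal sketch using only the Python standard library, and it assumes the files sit on a local or mounted filesystem. On HDFS the same idea would need the Hadoop tooling instead (for example `hdfs dfs -mkdir` and `hdfs dfs -mv`). The function name `split_part_files`, the `p*` file pattern, and the numbered target names (`merchant_table1`, `merchant_table2`, ...) are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def split_part_files(src_dir, dest_parent, prefix="merchant_table", pattern="p*"):
    """Move each part file in src_dir into its own directory under dest_parent.

    Target directories are named prefix1, prefix2, ... in sorted file order
    (an assumed naming scheme). Returns the new paths of the moved files.
    """
    src = Path(src_dir)
    # Sort so that p00000 goes to the first directory, p00001 to the second, etc.
    parts = sorted(p for p in src.glob(pattern) if p.is_file())
    moved = []
    for i, part in enumerate(parts, start=1):
        target_dir = Path(dest_parent) / f"{prefix}{i}"
        target_dir.mkdir(parents=True, exist_ok=True)
        dest = target_dir / part.name
        shutil.move(str(part), str(dest))
        moved.append(dest)
    return moved
```

Calling `split_part_files("merchant_table", ".")` would leave `merchant_table` empty and create `merchant_table1/p00000` through `merchant_table4/p00003`, given the four files listed above. Because files are moved rather than copied, try it on a copy of the data first.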
