Write CSV in HDFS

Expert Contributor

I wrote the following function to write data in HDFS using R and am using rhdfs.

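library(rhdfs)  # provides hdfs.init, hdfs.file, hdfs.write, hdfs.close

writeToHDFS <- function(fileName) {
  hdfs.init()
  # Open an HDFS file for writing, named after the object
  modelfile <- hdfs.file(fileName, "w")
  # Look up the object called fileName and write it to the file
  hdfs.write(get(fileName), modelfile)
  hdfs.close(modelfile)
}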

How do I modify it to store this data in CSV format instead? I have tried using pipe, but since it is deprecated, I would like a way to write CSV files through the hdfs.write function.

I tried this:

   modelfile <- hdfs.file(paste(fileName,"csv", sep="."),"w")

but I do not think it creates a valid CSV; it only appends the extension to the file name.

1 ACCEPTED SOLUTION

Super Guru

@sameer lail

What you did is not stupid. CSV is a file format, not a data structure in R. What you could do is create a data frame with a single column in which all values are separated by commas, then use hdfs.write to output it as a file with the .csv extension. Another option is to write a MapReduce job in R with the streaming API and set the output to be CSV.
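
A minimal sketch of the first suggestion, under an assumption that is not confirmed in this thread: that hdfs.write passes a raw vector through unchanged and serializes any other object. The helper names dfToCsvBytes and writeCsvToHDFS are made up for illustration. Only the first helper runs without a Hadoop setup; the second needs rhdfs installed and configured.

```r
# Convert a data frame to the raw bytes of its CSV text (base R only).
dfToCsvBytes <- function(df) {
  # write.csv prints to the console when no file is given;
  # capture.output collects those lines as a character vector.
  lines <- capture.output(write.csv(df, row.names = FALSE))
  charToRaw(paste0(paste(lines, collapse = "\n"), "\n"))
}

# Hypothetical helper: write the CSV bytes to an HDFS path with rhdfs.
writeCsvToHDFS <- function(df, hdfsPath) {
  rhdfs::hdfs.init()
  con <- rhdfs::hdfs.file(hdfsPath, "w")
  on.exit(rhdfs::hdfs.close(con))
  # Assumption: a raw vector is written as-is, so the file holds plain CSV text.
  rhdfs::hdfs.write(dfToCsvBytes(df), con)
}
```

A call would then look something like writeCsvToHDFS(mydata, "/tmp/mydata.csv"), where both the object name and the path are placeholders.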

If any of my responses were helpful, please don't forget to vote for them.

6 REPLIES

Super Guru

@sameer lail

What is the format of the data that you write to modelfile? If it is not CSV, you would need to convert it to CSV before writing it to HDFS. If it is CSV, then check this Q/A: https://community.hortonworks.com/questions/36583/how-to-save-data-in-hdfs-using-r.html

Expert Contributor

What I ended up doing is pretty stupid. I used write.csv to write the file locally and then used hdfs.put to move it to HDFS. The data type of the data is list. How do I convert it to CSV before writing it to HDFS using hdfs.write? @Constantin Stanca, thank you so much for your response, though. I hope to hear back on this.
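
For reference, the workaround described above could be sketched roughly as follows. It assumes the list has named elements of equal length, so that as.data.frame can turn it into columns. The helper names listToLocalCsv and putCsvInHDFS are hypothetical, and the second one needs a working rhdfs setup.

```r
# Turn a list of equal-length vectors into a local CSV file (base R only).
listToLocalCsv <- function(x, localPath = tempfile(fileext = ".csv")) {
  df <- as.data.frame(x, stringsAsFactors = FALSE)
  write.csv(df, localPath, row.names = FALSE)
  localPath
}

# Hypothetical helper: write the CSV locally, then copy it into HDFS.
putCsvInHDFS <- function(x, hdfsPath) {
  localPath <- listToLocalCsv(x)
  on.exit(unlink(localPath))  # remove the temporary local copy afterwards
  rhdfs::hdfs.init()
  rhdfs::hdfs.put(localPath, hdfsPath)
}
```

The local step can be checked on its own by reading the file back with read.csv(listToLocalCsv(mylist)), where mylist stands in for the actual data.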

Expert Contributor

The file that hdfs.write writes to HDFS when no file type is specified has no extension at all. So what I actually need to know is: what is the default format that hdfs.write writes in? How do I specify the file type I would like to store the data in? @Constantin Stanca
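
On the default format: as far as I can tell from the rhdfs source (treat this as an assumption, not documented behavior), hdfs.write runs any non-raw object through R's serialize(), so the file would hold R's binary serialization rather than text, and an extension in the path is just part of the name. The base R sketch below only illustrates the difference between the two byte forms; it does not touch HDFS, and the variable names are placeholders.

```r
df <- data.frame(id = 1:2, name = c("a", "b"))

# Binary form: what serialize() produces for an R object.
serializedBytes <- serialize(df, NULL)

# Text form: the same data as CSV, converted to raw bytes.
csvBytes <- charToRaw(paste0(
  paste(capture.output(write.csv(df, row.names = FALSE)), collapse = "\n"),
  "\n"
))

# Both are raw vectors, but they decode differently.
restored <- unserialize(serializedBytes)  # back to a data frame
csvText <- rawToChar(csvBytes)            # back to CSV text
```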

@sameer lail

We are using rhive to move files to HDFS from R. Does rhdfs offer any additional advantages?

I am just asking so that I can check and implement it in my project as well.

Expert Contributor

@Divakar Annapureddy: I am using rhdfs, but I do not see any major added advantage in using rhive. It looks just like rhdfs, with all of its functions. It is a little more polished, though, and offers a bit more functionality than rhdfs.
