Created 06-06-2016 06:07 AM
I wrote the following function to write data to HDFS from R, using rhdfs.
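library(rhdfs)

writeToHDFS <- function(fileName) {
  hdfs.init()
  # Open an HDFS file, named after the object, for writing
  modelfile <- hdfs.file(fileName, "w")
  # get() looks up the R object whose name is stored in fileName
  hdfs.write(get(fileName), modelfile)
  hdfs.close(modelfile)
}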
How do I modify it to store this data in CSV format instead? I have tried using pipe,
but since it is deprecated, I would like a way to write CSV files through the hdfs.write function.
I tried this:
modelfile <- hdfs.file(paste(fileName, "csv", sep = "."), "w")
but I do not think it creates a valid CSV; it only appends the extension to the file name.
Created 06-09-2016 03:02 AM
What you did is not stupid. CSV is a file format, not a data structure in R. What you could do is create a data frame with a single column in which each row holds all of its values separated by commas, then use hdfs.write to output that as a file with the .csv extension. Another option is to write a MapReduce job in R with the streaming API and set the output to be CSV.
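One way to sketch the first option is to build the CSV text in memory and hand it to hdfs.write as raw bytes. This assumes that hdfs.write writes raw vectors byte for byte and serializes every other kind of object, which is how the rhdfs source appears to behave; please verify that against your installed version. The helper names to_csv_raw and write_csv_to_hdfs are hypothetical and not part of rhdfs.

```r
# Render a data frame as CSV text in memory and return it as raw bytes.
to_csv_raw <- function(df) {
  # write.csv() prints to the console by default; capture.output() collects the lines
  lines <- capture.output(write.csv(df, row.names = FALSE))
  # Join the lines with newlines, end with a final newline, convert to raw
  charToRaw(paste0(paste(lines, collapse = "\n"), "\n"))
}

# Write the CSV bytes to HDFS (needs rhdfs and a configured Hadoop client).
# Assumption: hdfs.write passes raw vectors through without serializing them.
write_csv_to_hdfs <- function(df, hdfs_path) {
  library(rhdfs)
  hdfs.init()
  con <- hdfs.file(hdfs_path, "w")
  on.exit(hdfs.close(con))
  hdfs.write(to_csv_raw(df), con)
}
```

With a cluster available, a call might look like write_csv_to_hdfs(my_df, "/tmp/model.csv"), where both my_df and the path are placeholders.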
If any of my responses were helpful, please don't forget to vote for them.
Created 06-06-2016 09:21 PM
What is the data format of the object that you write to modelfile? If it is not CSV, you need to convert it to CSV before writing it to HDFS. If it is CSV, check this Q/A: https://community.hortonworks.com/questions/36583/how-to-save-data-in-hdfs-using-r.html
Created 06-07-2016 04:49 AM
What I ended up doing is pretty stupid. I used write.csv to write the file locally and then used hdfs.put to move it to HDFS. The data is of type list. How do I convert it to CSV before writing it to HDFS using hdfs.write? @Constantin Stanca, thank you so much for your response. I hope to hear back on this.
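For reference, the workaround described above could be sketched roughly as follows. It assumes the list holds equal-length vectors, so that it maps cleanly onto a data frame; a nested or ragged list would need its own flattening step first. The helper names list_to_df, write_local_csv, and put_csv are hypothetical.

```r
# Convert a list of equal-length vectors to a data frame (assumed shape).
list_to_df <- function(x) {
  stopifnot(length(unique(lengths(x))) == 1)
  as.data.frame(x, stringsAsFactors = FALSE)
}

# Write the data frame to a temporary local CSV file and return its path.
write_local_csv <- function(df) {
  tmp <- tempfile(fileext = ".csv")
  write.csv(df, tmp, row.names = FALSE)
  tmp
}

# Copy the local CSV into HDFS (needs rhdfs and a configured Hadoop client).
put_csv <- function(x, hdfs_path) {
  library(rhdfs)
  hdfs.init()
  tmp <- write_local_csv(list_to_df(x))
  on.exit(unlink(tmp))  # remove the temporary file afterwards
  hdfs.put(tmp, hdfs_path)
}
```

The local temporary file costs an extra copy, but the result is an ordinary text CSV that other tools can read directly from HDFS.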
Created 06-07-2016 04:51 AM
The file that gets written to HDFS with hdfs.write, without specifying a file type, has no extension at all. So what I actually need to know is: what is the default format that hdfs.write writes in? How do I specify the file type I would like to store the data in? @Constantin Stanca
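A local sketch of what likely ends up in such a file, assuming hdfs.write passes any non-raw object through R's serialize() (as the rhdfs source appears to do; this is not confirmed in the thread). Under that assumption the bytes are R's binary serialization format, and the file extension has no effect on them. The sample values are made up for illustration.

```r
# A small list standing in for the poster's data (hypothetical values).
model_data <- list(a = 1:3, b = "x")

# Assumed equivalent of what hdfs.write stores for a non-raw object.
stored_bytes <- serialize(model_data, NULL)

# The bytes are binary and round-trip back to the same R object.
restored <- unserialize(stored_bytes)
stopifnot(identical(restored, model_data))

# CSV, by contrast, is plain text that has to be produced explicitly.
csv_text <- capture.output(
  write.csv(as.data.frame(model_data), row.names = FALSE)
)
```

If that assumption holds, a file written this way would be read back with hdfs.read followed by unserialize, not with a CSV parser.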
Created 06-07-2016 04:58 AM
We are using rhive to move files to HDFS from R. Does rhdfs offer any additional advantages?
I am just asking so that I can check and implement it in my project as well.
Created 06-07-2016 05:07 AM
@Divakar Annapureddy: I am using rhdfs, and I see no major added advantage in using rhive. It looks much like rhdfs, with all of the same functions. It is a little more polished, though, and offers a bit more functionality than rhdfs.