Write CSV in HDFS
Labels: Apache Hadoop
Created 06-06-2016 06:07 AM
I wrote the following function to write data to HDFS from R using the rhdfs package.

    writeToHDFS <- function(fileName) {
      hdfs.init()
      # Open an HDFS file handle for writing
      modelfile <- hdfs.file(fileName, "w")
      # Fetch the object named by fileName and write it out
      hdfs.write(get(fileName), modelfile)
      hdfs.close(modelfile)
    }
How do I modify it to store this data in CSV format instead? I have tried using pipe, but since it is deprecated, I would like a way to write CSV files through the hdfs.write function.
I tried this:

    modelfile <- hdfs.file(paste(fileName, "csv", sep = "."), "w")

but I do not think that creates a valid CSV; it only appends the extension to the file name.
Created 06-06-2016 09:21 PM
What data format is the file that you assign to modelfile? If it is not CSV, then you would need to convert it to CSV before writing it to HDFS. If it is already CSV, then check this Q&A: https://community.hortonworks.com/questions/36583/how-to-save-data-in-hdfs-using-r.html
Created 06-07-2016 04:49 AM
What I ended up doing is pretty stupid. I used write.csv to write it locally and then used hdfs.put to move it to HDFS. The data type of the data is list. How do I convert it to CSV before writing it to HDFS using hdfs.write? @Constantin Stanca. Thank you so much for your response though. I hope to hear back on this.
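For reference, this is roughly what my workaround looks like; a minimal sketch, assuming mydata stands in for the real list and the local and HDFS paths are illustrative:

    library(rhdfs)
    hdfs.init()
    # Hypothetical list standing in for the real data
    mydata <- list(a = 1:3, b = c("x", "y", "z"))
    localPath <- "/tmp/mydata.csv"
    # Coerce the list to a data frame and write CSV to the local filesystem
    write.csv(as.data.frame(mydata), localPath, row.names = FALSE)
    # Copy the local CSV into HDFS
    hdfs.put(localPath, "/user/me/mydata.csv")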
Created 06-07-2016 04:51 AM
The file that gets written to HDFS with hdfs.write, without specifying a file type, has no extension at all. So I actually need to know: what is the default format that hdfs.write writes in? And how do I specify the file type I would like the data stored in? @Constantin Stanca
Created 06-07-2016 04:58 AM
We are using rhive for moving files to HDFS in R. Does rhdfs offer any additional advantages? I'm just asking so I can evaluate it and possibly use it in my project as well.
Created 06-07-2016 05:07 AM
@Divakar Annapureddy: I am using rhdfs and see no major added advantage in rhive; it looks like rhdfs with all the same functions, although rhive is a little more polished and offers a bit more functionality than rhdfs.
Created 06-09-2016 03:02 AM
What you did is not stupid. CSV is a file format, not a data structure in R. What you could do is create a data frame with a single column in which each row's values are already separated by commas, then use hdfs.write to output that as a file with a .csv extension; a sketch of this idea follows below. Another option is to write a map-reduce job with R and the streaming API and set the output format to CSV.
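A minimal sketch along these lines, assuming hdfs.write passes raw vectors through byte-for-byte (it serializes ordinary R objects otherwise); the sample data and paths are illustrative:

    library(rhdfs)
    hdfs.init()
    # Hypothetical data frame standing in for the real data
    df <- data.frame(x = 1:3, y = c("a", "b", "c"))
    # Collapse the header and each row into comma-separated strings,
    # i.e. a single column of CSV lines
    lines <- c(paste(names(df), collapse = ","),
               apply(df, 1, paste, collapse = ","))
    csvText <- paste(lines, collapse = "\n")
    # hdfs.write serializes ordinary R objects, but a raw vector is
    # written as-is, so the file on HDFS contains literal CSV text
    out <- hdfs.file("/user/me/df.csv", "w")
    hdfs.write(charToRaw(csvText), out)
    hdfs.close(out)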
If any of my responses were helpful, please don't forget to vote for them.
