How do I save data in HDFS using R?
Let's say all the data I need to save is in a variable in R, and I would like to store it in HDFS in CSV format.
R has the function write.csv() to save data as CSV, but how do I save it to HDFS?
I guess rhdfs is the way to go? But how exactly?
Yes, you can use rhdfs to perform HDFS operations from R.
R objects can be serialized to HDFS with the function hdfs.write. An example is shown below:
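library(rhdfs); hdfs.init()                 # load rhdfs and connect to HDFS
model <- lm(...)
modelfilename <- "my_smart_unique_name"
modelfile <- hdfs.file(modelfilename, "w")  # open an HDFS file for writing
hdfs.write(model, modelfile)                # serialize the R object into that file
hdfs.close(modelfile)                       # close the file when done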
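Note that hdfs.write stores a serialized R object, not CSV text. Since the question asks for CSV, one approach that should work is to write the CSV to a local temporary file with write.csv() and then copy it into HDFS with hdfs.put. This is only a minimal sketch: it assumes rhdfs is installed and the HADOOP_CMD environment variable is set, and the helper name save_csv_to_hdfs and the example path are made up for illustration.

```r
# Hypothetical helper (not part of rhdfs): write a data frame as CSV to a
# local temp file, then copy that file into HDFS.
# Assumes rhdfs is installed and HADOOP_CMD points to the hadoop binary.
save_csv_to_hdfs <- function(df, hdfs_path) {
  library(rhdfs)                                 # loaded here so defining the helper needs no Hadoop
  local_path <- tempfile(fileext = ".csv")
  on.exit(unlink(local_path), add = TRUE)        # remove the temp file on exit
  write.csv(df, local_path, row.names = FALSE)   # plain CSV on local disk
  hdfs.init()                                    # initialize the HDFS connection
  hdfs.put(local_path, hdfs_path)                # copy local file to HDFS
  invisible(hdfs_path)
}

# Example call (variable name and target path are assumptions):
# save_csv_to_hdfs(mydata, "/user/me/mydata.csv")
```

Going through a local file keeps write.csv() in charge of quoting and formatting, so the file in HDFS looks exactly like what you would get on local disk.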
Hope this helps.
Another option, if your data is in Hive, is the RODBC package. It lets you query Hive tables and pull the results back into a data frame.
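A rough sketch of what that could look like, assuming the RODBC package is installed and a Hive ODBC driver is already configured under a data source name. The helper name read_hive_query, the DSN "HiveDSN", and the table name are placeholders, not anything from a real setup.

```r
# Hypothetical helper: run a query against Hive over ODBC and return a data frame.
# Assumes RODBC is installed and an ODBC DSN for Hive has been configured.
read_hive_query <- function(dsn, query) {
  ch <- RODBC::odbcConnect(dsn)
  if (!inherits(ch, "RODBC")) {                  # odbcConnect returns -1 on failure
    stop("could not connect to DSN: ", dsn)
  }
  on.exit(RODBC::odbcClose(ch), add = TRUE)      # always close the channel
  RODBC::sqlQuery(ch, query, stringsAsFactors = FALSE)
}

# Example call (DSN and table name are assumptions):
# df <- read_hive_query("HiveDSN", "SELECT * FROM my_table LIMIT 10")
```

Keep in mind this reads data out of Hive into R. It does not by itself write anything to HDFS.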
Another approach is Hadoop Streaming: write your mapper in R and, for example, your reducer in Java. The job's output is written to HDFS. Look at the "Streaming" section of this example: http://hortonworks.com/blog/using-r-and-other-non-java-languages-in-mapreduce-and-hive/
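To give an idea of what an R mapper might look like: a streaming mapper simply reads lines from stdin and writes tab-separated key/value lines to stdout. The sketch below is a word-count style mapper chosen purely for illustration. The function names map_line and run_mapper are made up, and your real mapper logic would depend on your data.

```r
# Hypothetical word-count mapper for Hadoop Streaming.
# Turns one input line into "word<TAB>1" output lines.
map_line <- function(line) {
  words <- strsplit(tolower(trimws(line)), "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]                  # drop empty tokens
  if (length(words) == 0) return(character(0))   # nothing to emit for blank lines
  paste(words, 1L, sep = "\t")
}

# Reads lines one at a time from an open connection and prints the
# mapper output to stdout, which is where Hadoop Streaming collects it.
run_mapper <- function(con) {
  while (length(line <- readLines(con, n = 1L, warn = FALSE)) > 0) {
    out <- map_line(line)
    if (length(out) > 0) writeLines(out)
  }
}

# When used as the actual mapper script, the file would end with:
# con <- file("stdin", "r"); run_mapper(con); close(con)
```

Keeping the per-line logic in its own function makes it easy to try out in a normal R session before submitting a job, for example by passing a textConnection to run_mapper.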