How to save data in HDFS using R

Expert Contributor

How do I save data in HDFS using R?

Let's say all the data I need to save to HDFS is in a variable in R, and I would like to save it in CSV format.

R has the function write.csv() to save data in CSV format, but how do I save it to HDFS?

I guess rhdfs is the way to go? But how exactly?

4 replies

Re: How to save data in HDFS using R

@sameer lail

Yes, you can use rhdfs to perform HDFS operations from R.

R objects can be serialized to HDFS with the function hdfs.write(). An example is shown below:

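library(rhdfs)   # assumes the HADOOP_CMD environment variable points to the hadoop binary
hdfs.init()      # connect to HDFS before calling any other rhdfs function

model <- lm(...)   # fit your model here

modelfilename <- "my_smart_unique_name"      # HDFS path of the output file
modelfile <- hdfs.file(modelfilename, "w")   # open the file for writing
hdfs.write(model, modelfile)                 # serialize the R object into the file
hdfs.close(modelfile)                        # flush and close the file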
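Because hdfs.write() serializes ordinary R objects, the file above holds R's binary serialization format, not CSV. To get CSV text instead, a minimal sketch: render the data frame with write.csv(), convert the text to a raw vector, and write that. This assumes, as I understand rhdfs, that hdfs.write() writes raw vectors unchanged and only serializes non-raw objects; the helper names and the HDFS path in the usage line are hypothetical.

```r
# Sketch: write a data frame to HDFS as CSV text with rhdfs.
# Assumption: rhdfs is installed, HADOOP_CMD is set, rhdfs::hdfs.init()
# has already been called, and hdfs.write() writes raw vectors as-is.

# Hypothetical helper: render a data frame as CSV text, returned as raw bytes.
df_to_csv_raw <- function(df) {
  lines <- capture.output(write.csv(df, row.names = FALSE))
  charToRaw(paste0(paste(lines, collapse = "\n"), "\n"))
}

# Hypothetical helper: create an HDFS file and copy the CSV bytes into it.
write_csv_hdfs <- function(df, hdfs_path) {
  con <- rhdfs::hdfs.file(hdfs_path, "w")
  on.exit(rhdfs::hdfs.close(con))          # always close the HDFS file
  rhdfs::hdfs.write(df_to_csv_raw(df), con)
  invisible(hdfs_path)
}
```

Usage might look like write_csv_hdfs(mydata, "/user/sameer/mydata.csv"); the extension is simply part of the path you choose.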

For more details, please see the links below:

https://github.com/RevolutionAnalytics/RHadoop/wiki/user%3Erhdfs%3EHome

http://www.slideshare.net/Hadoop_Summit/enabling-r-on-hadoop

Hope this helps.

Thanks and Regards,

Sindhu

Re: How to save data in HDFS using R

Expert Contributor

@Sindhu: Thank you. What file type does the data get written in here? The file seems to have no extension. How do I specify the file extension/format I would like to store my data in?
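HDFS itself does not care about extensions: the name passed to hdfs.file() is the full path, and the content format is decided by the bytes you write. One way to get a real .csv file is a minimal sketch like this, assuming rhdfs is installed and HADOOP_CMD is set: write the file locally with write.csv(), then copy it with rhdfs's hdfs.put(). The helper names and the destination path are hypothetical.

```r
# Sketch: write a CSV locally with write.csv(), then copy it into HDFS.
# Assumption: rhdfs is installed and HADOOP_CMD is set.

# Hypothetical helper: write a data frame to a local temporary .csv file.
write_local_csv <- function(df) {
  local_path <- tempfile(fileext = ".csv")
  write.csv(df, local_path, row.names = FALSE)
  local_path
}

# Hypothetical helper: copy the local CSV to HDFS; the ".csv" extension
# is just part of dest_path, chosen by you.
put_csv_hdfs <- function(df, dest_path) {
  local_path <- write_local_csv(df)
  on.exit(unlink(local_path))              # remove the local temp file afterwards
  rhdfs::hdfs.init()
  rhdfs::hdfs.put(local_path, dest_path)
}
```

For example, put_csv_hdfs(mydata, "/user/sameer/mydata.csv") would leave a plain-text CSV at that (hypothetical) HDFS path.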

Re: How to save data in HDFS using R

Contributor

Another option, if your data is in Hive, is the RODBC package. It lets you query Hive tables and pull the results back into a data frame.
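A minimal sketch of the RODBC route, assuming a Hive ODBC driver is installed and a DSN named "HiveDSN" (a hypothetical name) is configured. odbcConnect(), sqlQuery() and odbcClose() are RODBC functions; query_hive() and the table name in the usage comment are illustrative only.

```r
# Sketch: run a HiveQL query over ODBC and return a data frame.
# Assumption: RODBC is installed and a Hive DSN named "HiveDSN" exists.
query_hive <- function(sql, dsn = "HiveDSN") {
  ch <- RODBC::odbcConnect(dsn)
  on.exit(RODBC::odbcClose(ch))            # always release the connection
  RODBC::sqlQuery(ch, sql, stringsAsFactors = FALSE)
}

# Usage (hypothetical table name):
# df <- query_hive("SELECT * FROM my_table LIMIT 100")
```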

Re: How to save data in HDFS using R

@sameer lail

Use Hadoop Streaming and write your mapper in R (and, for example, your reducer in Java). The job output is written to HDFS. Look at the "Streaming" section of this example: http://hortonworks.com/blog/using-r-and-other-non-java-languages-in-mapreduce-and-hive/
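For illustration, a streaming mapper in R might look like the sketch below. Hadoop Streaming feeds input lines to the mapper on stdin and reads tab-separated key/value lines from its stdout; the word-count logic and the function names are hypothetical, not taken from the linked post.

```r
#!/usr/bin/env Rscript
# Sketch of a Hadoop Streaming word-count mapper written in R.
# Input arrives on stdin; each output line is "key<TAB>value".

# Hypothetical helper: turn input lines into "word\t1" records.
map_lines <- function(lines) {
  words <- unlist(strsplit(tolower(lines), "[^a-z0-9]+"))
  words <- words[nzchar(words)]            # drop empty tokens
  if (length(words) == 0L) return(character(0))
  paste0(words, "\t1")
}

# Hypothetical driver: read stdin in chunks and emit records to stdout.
run_mapper <- function(con = file("stdin", open = "r")) {
  on.exit(close(con))
  repeat {
    lines <- readLines(con, n = 1000L, warn = FALSE)
    if (length(lines) == 0L) break
    out <- map_lines(lines)
    if (length(out) > 0L) writeLines(out)
  }
}

# run_mapper()  # make this the script's last line when used with Streaming
```

Processing stdin in chunks keeps memory use bounded on large input splits; the reducer then receives the records grouped and sorted by word.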