
How to write a file (text format) in HDFS using R


New Contributor

I was able to read a text file from HDFS.

[Screenshot: reading the text file from HDFS]

But when I try to write dummy data to HDFS, the data seems to be stored in sequence format instead of text format.

[Screenshot: writing dummy data to HDFS]

Am I doing something wrong, or is there no direct way to write a file in text format? A workaround could be to create a local file and then use hdfs.put to upload it to HDFS.

Thanks in advance.


Re: How to write a file (text format) in HDFS using R

Super Guru

@Javier Teixeira Quevedo

The rhdfs package has a put function, hdfs.put. You should be able to write the file with it. See the accepted answer at the following link:

https://community.hortonworks.com/questions/36583/how-to-save-data-in-hdfs-using-r.html

Here is how you would do it:

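library(rhdfs)
hdfs.init()  # connect R to HDFS before calling other hdfs.* functions

# Sample CSV file shipped with the rhdfs package
localData <- system.file(file.path("unitTestData", "AirlineDemo1kNoMissing.csv"), package = "rhdfs")

hdfs.mkdir("/test/airline")

hdfs.put(localData, "/test/airline/AirlineDemo1kNoMissing.csv")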
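If writing through a connection is not an option, the workaround from the question (stage the data in a local file, then upload it) can be wrapped in a small function. This is a sketch: write_text_via_local() is a hypothetical helper, and its put argument defaults to rhdfs::hdfs.put, called as put(src, dest) as in the example above. Passing a different put function lets the staging logic be checked without a cluster.

```r
# Hypothetical helper: write a character vector as a text file on HDFS by
# staging it in a local temporary file and uploading it with `put`.
# `put` defaults to rhdfs::hdfs.put (assumed to be called as put(src, dest));
# the default is only evaluated if no other put function is supplied.
write_text_via_local <- function(lines, hdfs_path, put = rhdfs::hdfs.put) {
  local_file <- tempfile(fileext = ".txt")
  on.exit(unlink(local_file), add = TRUE)  # delete the staging file afterwards
  writeLines(lines, local_file)            # plain text, one element per line
  put(local_file, hdfs_path)               # upload; returns whatever put returns
}
```

For example, write_text_via_local(c("id,value", "1,42"), "/test/airline/dummy.csv") would upload a two-line CSV file, assuming rhdfs is loaded and hdfs.init() has been called.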

Re: How to write a file (text format) in HDFS using R

New Contributor

Thanks for the quick response @mqureshi

hdfs.put uploads a local file to HDFS, but I need to write the data directly, without storing it in a local file first. If there is no other way, I will have to use that approach.


Re: How to write a file (text format) in HDFS using R

New Contributor

@Javier Teixeira Quevedo

Usage:

hdfs.write(object, con, hsync = FALSE)

Arguments:

object: The R object to be written to disk.

con: An open HDFS connection returned by ‘hdfs.file’.

hsync: If TRUE, the file will be synced after writing.

Details:

These functions can be used to read and write files both on the local filesystem and on HDFS. If the object is a raw vector, it is written directly to the ‘con’ object; otherwise it is serialized and the resulting bytes are written to ‘con’. No prefix (for example, the length in bytes) is written, so it is up to the user to handle this. ‘hdfs.seek’ seeks to position ‘n’, which must be positive. ‘hdfs.tell’ returns the current location of the file pointer.
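To see why the file from the question does not contain readable text, the two cases in the details above can be compared in base R without a cluster. This sketch assumes hdfs.write uses base R's serialize() for non-raw objects (the help text only says "serialized"); it compares byte vectors and does not touch HDFS.

```r
data <- "hello world"

# Non-raw object: assumed to go through serialize(), which produces a binary
# stream with an R serialization header rather than the text itself.
serialized <- serialize(data, connection = NULL)

# Raw vector: these bytes would be written as-is, i.e. plain text.
plain <- charToRaw(data)

print(length(serialized))   # many more bytes than the 11 characters
print(rawToChar(plain))     # [1] "hello world"
print(head(serialized, 2))  # start of the serialization header, not "he"
```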

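Code:

library(rhdfs)
hdfs.init()

data <- "hello world"

# Open test.txt on HDFS for writing
modelfile <- hdfs.file("test.txt", "w")

# hdfs.write serializes anything that is not a raw vector, so convert the
# text to raw bytes first. (Wrap data in jsonlite::toJSON() before this
# step only if you want JSON text rather than plain text.)
data2 <- charToRaw(data)

hdfs.write(data2, modelfile)

hdfs.close(modelfile)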

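Description:

You have to write the data to the modelfile connection as a raw vector. Otherwise hdfs.write serializes the R object, and the file does not contain plain text.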
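The same idea extends to the dummy data from the question. The sketch below uses a hypothetical helper, df_to_csv_raw(), that renders a data frame as CSV text in memory with base R's write.csv() and capture.output(), then converts it to a raw vector so hdfs.write() writes it unchanged. Only the conversion runs here; the rhdfs calls are shown as comments and assume hdfs.init() has been called.

```r
# Hypothetical helper: render a data frame as CSV text in memory and return
# the bytes as a raw vector, so hdfs.write() writes them without serializing.
df_to_csv_raw <- function(df) {
  lines <- capture.output(write.csv(df, row.names = FALSE))
  charToRaw(paste0(paste(lines, collapse = "\n"), "\n"))
}

dummy <- data.frame(id = 1:3, name = c("a", "b", "c"))
payload <- df_to_csv_raw(dummy)
cat(rawToChar(payload))

# With rhdfs (not run here):
# con <- hdfs.file("dummy.csv", "w")
# hdfs.write(payload, con)
# hdfs.close(con)
```

Building the whole payload in memory keeps the example to a single hdfs.write call; for very large data frames it may be preferable to convert and write in chunks.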