How to write a file (text format) in HDFS using R

I was able to read a text file from HDFS.

11128-read-hdfs-file.png

But when I try to write dummy data to HDFS, the data seems to be stored in sequence format instead of text format.

11127-write-hdfs-file.png

Am I doing something wrong, or is there no direct way to write a file in text format? A workaround could be to create a local file and then use hdfs.put to upload it to HDFS.

Thanks in advance.

4 REPLIES

Re: How to write a file (text format) in HDFS using R

@Javier Teixeira Quevedo

The rhdfs package has a put function, hdfs.put, which you should be able to use to write the file. See the accepted answer at the following link:

https://community.hortonworks.com/questions/36583/how-to-save-data-in-hdfs-using-r.html

Here is how you should do it:

localData <- system.file(file.path("unitTestData", "AirlineDemo1kNoMissing.csv"), package = "rhdfs")

hdfs.mkdir("/test/airline")

hdfs.put(localData, "/test/airline/AirlineDemo1kNoMissing.csv")
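
For dummy data that exists only in the R session, the local-file workaround mentioned in the question might look like the sketch below. It assumes rhdfs is installed and HADOOP_CMD is set; the helper name write_lines_to_temp and the HDFS paths are hypothetical, and the upload step is guarded so the rest runs without a cluster.

```r
# Write a character vector to a temporary local file and return its path.
# write_lines_to_temp is a hypothetical helper, not part of rhdfs.
write_lines_to_temp <- function(lines) {
  tmp <- tempfile(fileext = ".txt")
  writeLines(lines, tmp)
  tmp
}

dummy <- c("id,value", "1,alpha", "2,beta")
tmp <- write_lines_to_temp(dummy)

# The upload needs rhdfs and a configured Hadoop client, so it is guarded.
if (requireNamespace("rhdfs", quietly = TRUE) && nzchar(Sys.getenv("HADOOP_CMD"))) {
  library(rhdfs)
  hdfs.init()
  hdfs.mkdir("/test/dummy")                # hypothetical target directory
  hdfs.put(tmp, "/test/dummy/dummy.txt")   # stored as plain text, byte for byte
}
```

Because hdfs.put copies the file as-is, the result in HDFS is plain text rather than a serialized R object.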

Re: How to write a file (text format) in HDFS using R

Thanks for the quick response @mqureshi

The hdfs.put command uploads a local file to HDFS, but I need to store the data directly, without first writing it to a local file. If there is no other way, I will have to use that approach.


Re: How to write a file (text format) in HDFS using R

@Javier Teixeira Quevedo

usage:

hdfs.write(object,con,hsync=FALSE)

arguments:

object: The R object to be written to disk.

con: An open HDFS connection returned by ‘hdfs.file’

hsync: If TRUE, the file will be synched after writing

details:

The functions can be used to read and write files both on the
local filesystem and the HDFS. If the object is a raw vector, it
is written directly to the ‘con’ object, otherwise it is
serialized and the bytes written to the ‘con’. No prefix (for
example, length of bytes) are written and it is up to the user to
handle this. ‘hdfs.seek’ seeks to the position ‘n’. It must be
positive. ‘hdfs.tell’ returns the current location of the file
pointer.

code:

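library(rhdfs)   # assumes HADOOP_CMD is set for hdfs.init()
hdfs.init()

data <- "hello world"

# Open an HDFS file for writing
modelfile <- hdfs.file("test.txt", "w")

# toJSON comes from a JSON package (e.g. rjson or jsonlite), which must be loaded first
data1 <- toJSON(data)

# Convert to a raw vector so hdfs.write does not serialize the R object
data2 <- charToRaw(data1)

hdfs.write(data2, modelfile)

hdfs.close(modelfile)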

description:

You have to write the data to the modelfile connection as a raw vector; any other object is serialized first, which is why the output is not plain text.
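
For tabular data, the same raw-vector idea can be sketched as follows. This assumes plain delimited text is wanted, so it skips the JSON step and calls charToRaw on the text directly. The helper name df_to_raw_text and the HDFS path are hypothetical, and the write is guarded so it only runs when rhdfs and HADOOP_CMD are available.

```r
# Turn a data frame into the raw bytes of a delimited text file.
# df_to_raw_text is a hypothetical helper, not part of rhdfs.
df_to_raw_text <- function(df, sep = "\t") {
  lines <- capture.output(
    write.table(df, sep = sep, quote = FALSE,
                row.names = FALSE, col.names = FALSE)
  )
  # One row per line, with a trailing newline
  charToRaw(paste0(paste(lines, collapse = "\n"), "\n"))
}

dummy <- data.frame(id = 1:2, name = c("a", "b"))
payload <- df_to_raw_text(dummy)

# A raw vector is written to the connection as-is, without serialization.
if (requireNamespace("rhdfs", quietly = TRUE) && nzchar(Sys.getenv("HADOOP_CMD"))) {
  library(rhdfs)
  hdfs.init()
  con <- hdfs.file("/test/dummy/dummy.tsv", "w")   # hypothetical path
  hdfs.write(payload, con)
  hdfs.close(con)
}
```

This avoids the intermediate local file, at the cost of building the whole payload in memory first.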


Re: How to write a file (text format) in HDFS using R

@midhunxavier I used the code above for my requirement, but I am running into the issue below.

Output data format:

["TER0626974_achieved","TER0630327_achieved","TER0630520_achieved","TER0537124_achieved","TER0404705_achieved"]

Issue: the problem is writing this data and then reading it back from Hive.

We are able to insert this result into Hive, but when we try to read it, we get the error below.

> archive_data <- dbGetQuery(hivecon, "SELECT * from Table")
Error in .jcall(rp, "I", "fetch", stride, block) :
  org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected
 
I guess this is because the JSON should start with { and not with an array ([)?
But I am not sure how to change the square brackets to {.

I would appreciate your support in resolving this issue.
Thanks in advance,
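
One possible approach, sketched under assumptions: if the Hive table uses a JSON SerDe that expects one JSON object per row, each value can be wrapped in an object and written on its own line instead of emitting a single array. The sketch uses the jsonlite package (the thread does not say which JSON package was used), and the field name "id" and the helper to_json_lines are hypothetical; the field would need to match the column name in the table definition.

```r
library(jsonlite)

# Build one JSON object per value, e.g. {"id":"TER0626974_achieved"}.
# to_json_lines and the "id" field are hypothetical.
to_json_lines <- function(values, field = "id") {
  vapply(values, function(v) {
    as.character(toJSON(setNames(list(v), field), auto_unbox = TRUE))
  }, character(1), USE.NAMES = FALSE)
}

ids <- c("TER0626974_achieved", "TER0630327_achieved")
json_lines <- to_json_lines(ids)

# One object per line, ready for hdfs.write(payload, con)
payload <- charToRaw(paste0(paste(json_lines, collapse = "\n"), "\n"))
```

Each line then starts with {, which is the start token the error message appears to be looking for.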