Saving a list in HDFS with headers using R

Expert Contributor

I fetched this data from an API using R and stored it in a list. It looks like this:

                 productSku               productName productCategory       date
1                 (not set)                 (not set)       (not set) 2015-12-28
2                         1                         1               1 2015-12-28
3                         F                         F               F 2015-12-28
4                         I                         I               I 2015-12-28
5     IN1309MTODREBLA-112-M      Fantasy Garden Dress          sample 2016-02-09

I am using the rhdfs package, and I store the data in HDFS like this:

hdfs.write(get(fileName), modelfile)

But when I try to read it back with:

getLastDataImportDate = function(){
    hdfs.init()
    f = hdfs.file("/user/rstudio/gaDataEcommerce4","r")
    m = hdfs.read(f)
    m1 <- m
    mnull <- m == as.raw(0)
    m1[mnull] <- as.raw(20)
    c <- rawToChar(m1)
    li <- as.list(c)
    print(li)
}

What I get back on reading data looks like this:

[1] "X\n\024\024\024\002\024\003\002\003\024\002\003\024\024\024\003\023\024\024\024\t\024\024\024\020\024\024\002\xd9\024\004\024\t\024\024\024\t(not set)\024\004\024\t\024\024\024\0011\024\004\024\t\024\024\024\001F\024\004\024\t\024\024\024\001I\024\004\024\t\024\024\024\025IN1309MTODREBLA-112-M\024\004\024\t\024\024\024\025IN1309MTODREBLA-112-S\024\004\024\t\024\024\024\025IN1309MTODREBLA-112-S\024\004\024\t\024\024\024\025IN1309MTODREBLA-112-S\024\004\024\t\024\024\024\025IN1309MTODREBLA-112-S\024\004\024\t\024\024\024\025IN1309MTODREBLA-112-S\024\004\024\t\024\024\024\025IN1315MTODREPNK-114-S\024\004\024\t\024\024\024\025IN1317MTOJKTBLA-104-L\024\004\024\t\024\024\024\026IN1319MTODREGRN-110-XL\024\004\024\t\024\024\024\026IN1322MTPJKTRED-143-20\024\004\024\t\024\024\024\026IN1326MTODREPNK-117-XS\024\004\024\t\024\024\024\027IN1326MTODREPNK-117-XXL\024\004\024\t\024\024\024\023IN1329AVVBAGBLA-135\024\004\024\t\024\024\024\023IN1329AVVBAGBLA-135\024\004\024\t\024\024\024\023IN1329AVVBAGRED-138\024\004\024\t\024\024\024\023IN1329AVVBAGRED-138\024\004\024\t\024\024\024\023IN1329AVVBAGRED-138\024\004\024\t\024\024\024\025IN1329MTOTOPWHT-108-M\024\004\024\t\024\024\024\023IN1331AVVBAGRED-105\024\004\024\t\024\024\024\023IN1331AVVBAGRED-105\024\004\024\t\024\024\024\023IN1332AVVBAGBLU-118\024\004\024\t\024\024\024\023IN1332AVVBAGBLU-118\024\004\024\t\024\024\024\023IN1332AVVBAGPRL-168\024\004\024\t\024\024\024\023IN1332AVVBAGPRL-168\024\004\024\t\024\024\024\023IN1332AVVBAGRED-152\024\004\024\t\024\024\024\023IN1332AVVBAGSLR-130\024\004\024\t\024\024\024\023IN1332AVVBAGSLR-130\024\004\024\t\024\024\024\026IN1332MTPTOPBLA-137-20\024\004\024\t\024\024\024\023IN1335AVVBAGBLA-112\024\004\024\t\024\024\024\023IN1335AVVBAGBLA-137\024\004\024\t\024\024\024\023IN1336AVVBAGBLA-146\024\004\024\t\024\024\024\023IN1336AVVBAGBLA-146\024\004\024\t\024\024\024\023IN1336AVVBAGBLA-146\024\004\024\t\024\024\024\023IN1336AVVBAGBLA-146\024\004\024\t\024\024\024\023IN1336AVVBAGBLA-146\02
4\004\024\t\024\024\024\023IN1336AVVBAGSLR-143\024\004\024\t\024\024\024\023IN1336AVVBAGSLR-143\024\004\024\t\024\024\024\025IN1336MTODREBLA-179-S\024\004\024\t\024\024\024\025IN1336MTODREBLA-179-S\024\004\024\t\024\024\024\025IN1336MTODREBLA-179-S\024\004\024\t\024\024\024\025IN1336MTOTOPPNK-187-S\024\004\024\t\024\024\024\025IN1337MTODREBLA-101-S\024\004\024\t\024\024\024\025IN1337MTODREMLT-117-M\024\004\024\t\024\024\024\025IN1337MTODREMLT-117-M\024\004\024\t\024\024\024\026IN1337MTODREMLT-117-XS\024\004\024\t\024\024\024\027IN1337MTODREMLT-117-XXL\024\004\024\t\024\024\024\027IN1337MTODREMLT-117-XXL\024\004\024\t\024\024\024\027IN1337MTODREMLT-117-XXL\024\0

This does not look like a list to me.

Where are the column names in this data? I believe it contains only the data, without the header info.

I also need to get the maximum date from the date column that existed when I wrote the data. How do I access it?

1 ACCEPTED SOLUTION

Expert Contributor

@Simran Kaur

When you convert using as.list, your data will look like the output above. What is the problem you are seeing? How do you want the data to look, and what are you trying to do here?
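
To illustrate the point about as.list: rawToChar() returns a single string, so as.list() wraps it in a one-element list rather than splitting it into columns. The sketch below uses a made-up raw vector (an assumption, not the real HDFS bytes). It also shows that as.raw(20) is the byte 0x14, which prints as \024 and would explain the many \024 sequences in the output above.

```r
# Hypothetical bytes standing in for what hdfs.read() returns
m <- as.raw(c(0x41, 0x00, 0x42, 0x00, 0x43))   # "A", NUL, "B", NUL, "C"

# Same replacement as in the question: NUL bytes become as.raw(20)
m1 <- m
m1[m == as.raw(0)] <- as.raw(20)

s  <- rawToChar(m1)   # one character string
li <- as.list(s)      # a list holding that one string

length(li)            # 1
as.raw(20)            # 14, i.e. 0x14 (octal 024), not a space
as.raw(0x20)          # 20, the space character
```

If a space was intended as the replacement byte, as.raw(0x20) or as.raw(32) would be the value to use.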


3 REPLIES

Super Guru

@sameer lail

Do you have some sample dataset that I can use to reproduce this?

Expert Contributor

You can try the following:

li <- read.table(textConnection(c), sep = ",")
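
Another thing worth checking: the output in the question starts with X\n, which is the header that R's serialize() writes in its binary (XDR) format. If hdfs.write() serialized the data frame before writing it (I believe rhdfs does this for non-raw objects, but please verify against your version), then calling unserialize() on the untouched raw vector should return the original object, column names included. Here is a minimal sketch in base R, where serialize() stands in for the HDFS round trip and the data frame is a small made-up sample:

```r
# Made-up sample resembling the data in the question
df <- data.frame(
  productSku  = c("(not set)", "IN1309MTODREBLA-112-M"),
  productName = c("(not set)", "Fantasy Garden Dress"),
  date        = as.Date(c("2015-12-28", "2016-02-09")),
  stringsAsFactors = FALSE
)

# Stand-in for hdfs.write()/hdfs.read(): serialize to a raw vector
m <- serialize(df, connection = NULL)
rawToChar(m[1:2])            # "X\n", the same header seen in the question

# Do not replace the NUL bytes; unserialize the raw vector as-is
restored <- unserialize(m)

names(restored)              # "productSku" "productName" "date"
max(restored$date)           # "2016-02-09"
```

With the real file, the equivalent would be unserialize(m) where m is the result of hdfs.read(f), assuming the whole file was read in one call. If the date column was stored as character, wrap it in as.Date() before taking max().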