
Unable to read csv file in R

Re: Unable to read csv file in R

On the same machine and as the same user (zeppelin@hdfs-hadoop-p0z7), run the commands below and attach the output:

hdfs dfs -ls /tmp/updated.csv

and

$ R
> a<-read.csv("/tmp/updated.csv")

In Zeppelin, restart the spark2 interpreter and re-run:

%spark2.r
a<-read.csv("/tmp/test.csv")
print(a)

Then send me the application log for that job.

Re: Unable to read csv file in R

New Contributor

Please find the output below.

smittapally@hdfs-hadoop-p0z7:~$ sudo su - zeppelin
zeppelin@hdfs-hadoop-p0z7:~$ hdfs dfs -ls /tmp/updated.csv
-rw-r--r--   3 hdfs hdfs    1133735 2017-06-07 10:05 /tmp/updated.csv
zeppelin@hdfs-hadoop-p0z7:~$ R


R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)


R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.


  Natural language support but running in an English locale


R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.


Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.


> a<-read.csv("/tmp/updated.csv")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file '/tmp/updated.csv': No such file or directory
>


latestlog.txt

Re: Unable to read csv file in R

New Contributor

Any luck Daniel?

Re: Unable to read csv file in R

Hi @sysadmin CreditVidya

I was away yesterday. I have just done some further testing.

- On hdfs-hadoop-p0z7, put your updated.csv file in the /tmp folder of the local OS filesystem, not in HDFS.

- Make sure the file has read permissions.

- Then run:

$ su - zeppelin
$ R
> a<-read.csv("/tmp/updated.csv")
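
The checks in the steps above (the file is on the local filesystem, and the current user can read it) can be sketched as a small shell helper. This is only an illustration: the function name check_local_csv is hypothetical and not part of Zeppelin, R, or Hadoop.

```shell
# Hypothetical helper: report whether a path is usable by R's read.csv,
# i.e. it exists on the local (OS) filesystem and the current user can read it.
check_local_csv() {
    path="$1"
    if [ ! -e "$path" ]; then
        # Present in HDFS only is not enough: read.csv opens local paths.
        echo "missing: $path is not on the local filesystem"
        return 1
    fi
    if [ ! -r "$path" ]; then
        echo "unreadable: $path exists but the current user cannot read it"
        return 2
    fi
    echo "ok: $path"
    return 0
}
```

Run as the zeppelin user, `check_local_csv /tmp/updated.csv` should print an "ok" line before you try read.csv on the same path.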

Re: Unable to read csv file in R

New Contributor

I placed the CSV in /tmp on the OS and was able to read it.

hdfs@hdfs-hadoop-p0z7:~$ R
R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
  Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
[Previously saved workspace restored]
> a<-read.csv("/tmp/updated.csv")
> head(a)
   domain_name
1   0-mail.com
2      0sg.net
3   11mail.com
4      123.com
5 123india.com
6   123mail.cl
>

Re: Unable to read csv file in R

@sysadmin CreditVidya

Put the file on the Zeppelin node's local OS filesystem and try again through %spark2.r.

Re: Unable to read csv file in R

New Contributor

I did that and was able to get the data.

16193-data-at-local.png

Re: Unable to read csv file in R

@sysadmin CreditVidya

Good to hear that it finally works for you.

Re: Unable to read csv file in R

New Contributor

I don't understand: how is the data replicated across all the datanodes when I read CSV files? It is not HDFS replication, right?

Re: Unable to read csv file in R

To read the file with read.csv, it needs to be on the local OS filesystem rather than in HDFS, so HDFS replication is not involved.

I hope that clarifies the process.
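
If the file only exists in HDFS, one way to get a local copy is `hdfs dfs -get`. The sketch below is a minimal example under assumptions: the helper name fetch_from_hdfs is hypothetical, it is meant to be run on the Zeppelin node, and making the copy world-readable is just one simple way to satisfy the read-permission requirement.

```shell
# Hypothetical helper: copy a file out of HDFS onto the local filesystem
# so that R's read.csv can open it.
fetch_from_hdfs() {
    src="$1"   # path in HDFS
    dst="$2"   # path on the local (OS) filesystem
    if [ -e "$dst" ]; then
        echo "local copy already present: $dst"
        return 0
    fi
    # 'hdfs dfs -get' copies from HDFS to the local filesystem.
    hdfs dfs -get "$src" "$dst" || return 1
    # Assumption: world-readable is acceptable for this file.
    chmod a+r "$dst"
    echo "copied $src to $dst"
}
```

For the file in this thread, that would be `fetch_from_hdfs /tmp/updated.csv /tmp/updated.csv`, followed by read.csv("/tmp/updated.csv") in R.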
