Unable to read csv file in R

Re: Unable to read csv file in R

New Contributor
hdfs@hdfs-hadoop-tswv:~$ cat /etc/issue
Ubuntu 14.04.5 LTS \n \l 
hdfs@hdfs-hadoop-tswv:~$

updated.zip

zeppelin-interpreter-spark2-spark-zeppelin-hdfs-ha.zip

Re: Unable to read csv file in R

Hi @sysadmin CreditVidya

I tried your file and it works fine for me as well.

16019-updated.png

Does the same command work when you run it from the R CLI?

Also attach the application log for application_1496678448974_0010.

Re: Unable to read csv file in R

New Contributor

I tried it like this, and it still reports the same error.

17/06/06 09:12:23 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
 Welcome to
    ____              __
   / __/__  ___ _____/ /__
  _\ \/ _ \/ _ `/ __/  '_/
 /___/ .__/\_,_/_/ /_/\_\   version  1.6.3
    /_/
 Spark context is available as sc, SQL context is available as sqlContext
> sc.version
Error: object 'sc.version' not found
> sc.version()
Error: could not find function "sc.version"
> a<-read.csv("/tmp/updated.csv")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file '/tmp/updated.csv': No such file or directory
> a<-read.csv("hdfs:///tmp/updated.csv")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'hdfs:///tmp/updated.csv': No such file or directory
>

eventlogs-application-1496038265291-0008-1.zip

eventlogs-application-1496038265291-0012-1.zip

Re: Unable to read csv file in R

New Contributor

I also tried it in Spark 2.

17/06/06 09:44:48 WARN Utils: Service 'SparkUI' could not bind on port 4050. Attempting port 4051.
 Welcome to
    ____              __
   / __/__  ___ _____/ /__
  _\ \/ _ \/ _ `/ __/  '_/
 /___/ .__/\_,_/_/ /_/\_\   version  2.1.0.2.6.0.3-8
    /_/
 Spark context is available as sc, SQL context is available as sqlContext
During startup - Warning messages:
1: 'SparkR::sparkR.init' is deprecated.
Use 'sparkR.session' instead.
See help("Deprecated")
2: 'SparkR::sparkRSQL.init' is deprecated.
Use 'sparkR.session' instead.
See help("Deprecated")
> sc
Java ref type org.apache.spark.api.java.JavaSparkContext id 0
> a<-read.csv(
+
> a<-read.csv("/tmp/updated.csv")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file '/tmp/updated.csv': No such file or directory
>

Re: Unable to read csv file in R

Hi @sysadmin CreditVidya

Can you run the following:

$ hdfs dfs -ls hdfs://<Active_NN>:8020/tmp/updated.csv

Send me the output.

Also run:

$ R
> a<-read.csv("/tmp/updated.csv")

How did you extract the application log? Asking as it does not seem to be in a standard format.

Re: Unable to read csv file in R

New Contributor
root@hdfs-hadoop-tswv:/var/log/hadoop/hdfs# R
R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
  Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> a<-read.csv("/tmp/updated.csv")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file '/tmp/updated.csv': No such file or directory
>
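
The error above comes from base R's `read.csv`, which opens paths on the local filesystem of the machine running R and does not resolve `hdfs://` URIs. One possible workaround is to copy the file out of HDFS first. This is a minimal sketch, assuming the `hdfs` command-line client is installed and configured on that machine; `fetch_and_read` is a hypothetical helper name, not part of R or Hadoop.

```r
# Hypothetical helper: copy a CSV from HDFS to a local temp file, then read it
# with base R. Assumes the `hdfs` client is on the PATH and can reach the cluster.
fetch_and_read <- function(hdfs_path, local_path = tempfile(fileext = ".csv")) {
  if (!nzchar(Sys.which("hdfs"))) {
    stop("hdfs client not found on PATH")
  }
  # `hdfs dfs -get <src> <dst>` copies a file from HDFS to the local filesystem.
  status <- system2("hdfs", c("dfs", "-get", hdfs_path, local_path))
  if (status != 0 || !file.exists(local_path)) {
    stop("could not copy ", hdfs_path, " from HDFS")
  }
  read.csv(local_path)
}
```

With that in place, `a <- fetch_and_read("/tmp/updated.csv")` would read the HDFS copy rather than looking for a local file of the same name.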

Re: Unable to read csv file in R

New Contributor

Where can I find this?

How did you extract the application log? Asking as it does not seem to be in a standard format.

Re: Unable to read csv file in R

@sysadmin CreditVidya

You can get the application log doing:

$ yarn logs -applicationId <your_application_id_here>

As for the test I asked for:

$ hdfs dfs -ls hdfs://<Active_NN>:8020/tmp/updated.csv

Run it from the Zeppelin machine.

I noticed you ran it as

smittapally@hdfs-hadoop-mr2w:~$ hdfs dfs -ls hdfs://hdfs-hadoop-88hh.c.creditvidya-152512.internal:8020/tmp/updated.csv

Try the same from

root@hdfs-hadoop-tswv

Re: Unable to read csv file in R

New Contributor
hdfs@hdfs-hadoop-p0z7:~$ hdfs dfs -ls hdfs://hdfs-hadoop-88hh.c.creditvidya-152512.internal:8020/tmp/updated.csv
-rw-r--r--   3 hdfs hdfs    1133735 2017-06-07 10:05 hdfs://hdfs-hadoop-88hh.c.creditvidya-152512.internal:8020/tmp/updated.csv
hdfs@hdfs-hadoop-p0z7:~$

Re: Unable to read csv file in R

New Contributor
zeppelin@hdfs-hadoop-p0z7:~$ hdfs dfs -ls hdfs://hdfs-hadoop-88hh.c.creditvidya-152512.internal:8020/tmp/updated.csv
-rw-r--r--   3 hdfs hdfs    1133735 2017-06-07 10:05 hdfs://hdfs-hadoop-88hh.c.creditvidya-152512.internal:8020/tmp/updated.csv
zeppelin@hdfs-hadoop-p0z7:~$

The above output is from my Zeppelin notebook server.

yarn-log.txt
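
The listing shows that `/tmp/updated.csv` exists in HDFS, whereas the failing `read.csv` calls earlier in the thread look for it on the local disk. In the Spark 2 session, the file could instead be read through SparkR. This is a minimal sketch, assuming SparkR 2.x, where `read.df` has a built-in `csv` source, and a cluster whose default filesystem is the HDFS instance above; `read_hdfs_csv` is a hypothetical wrapper name.

```r
# Hypothetical wrapper: read a CSV from HDFS through SparkR (2.x API) and
# collect it into an ordinary R data.frame. Assumes SparkR is installed and
# a Spark session can be created or reused.
read_hdfs_csv <- function(path) {
  if (!requireNamespace("SparkR", quietly = TRUE)) {
    stop("SparkR package is not available")
  }
  # Reuses the running session if there is one, otherwise starts one.
  SparkR::sparkR.session()
  # header/inferSchema are passed through as options of the csv data source.
  sdf <- SparkR::read.df(path, source = "csv",
                         header = "true", inferSchema = "true")
  # collect() pulls all rows to the driver; fine for a ~1 MB file.
  SparkR::collect(sdf)
}
```

A call such as `a <- read_hdfs_csv("hdfs:///tmp/updated.csv")` would then return a local data.frame. On Spark 1.6 the `csv` source is not built in, so this sketch does not apply there as written.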