
sparkR - Error in socketConnection(port = monitorPort)

Expert Contributor

I have CentOS 7.1.

On my multinode Hadoop cluster (HDP 2.3.4) I have installed Spark 1.5.2 through Ambari. I am trying to connect to sparkR from the CLI, and after I run sparkR I get the following error:

Error in value[[3L]](cond) : Failed to connect JVM
In addition: Warning message:
In socketConnection(host = hostname, port = port, server = FALSE, :
  localhost:9001 cannot be opened

The port (9001) is open on the namenode (where I'm running sparkR); a quick check is sketched below. Do you have any ideas what I'm doing wrong? I've seen this link: http://hortonworks.com/hadoop-tutorial/apache-spark-1-5-1-technical-preview-with-hdp-2-3/

and I also followed this link:

http://www.jason-french.com/blog/2013/03/11/installing-r-in-linux/

to install R on all datanodes. I appreciate your contribution.
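As a quick sanity check (a sketch; localhost and 9001 are just the values from the warning above), one can verify whether anything is actually listening on that port:

nc -z -v localhost 9001    # exits 0 only if something accepts the connection
ss -tlnp | grep 9001       # or: netstat -tlnp | grep 9001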

1 ACCEPTED SOLUTION

Expert Contributor

@Neeraj Sabharwal, @Artem Ervits

I made it work now!

I used Ubuntu 14.04 Trusty, manually installed Spark 1.4.1, and set up sparkR. Now, I don't know if the problem was in CentOS 7.2, but the installation of R was different from what I had done earlier and from what it says here:

http://www.jason-french.com/blog/2013/03/11/installing-r-in-linux/

If you guys want, I can try the same on CentOS 7.2 and report back. If you want, I can describe the process of preparing the environment for using sparkR. I will also try other Spark versions. We depend on R because of the research.

Let me know if there is interest.


33 REPLIES

Expert Contributor

@Neeraj Sabharwal

hmmm..

So this is the part where the show ends for me:

Launching java with spark-submit command /usr/hdp/2.3.4.0-3485/spark/bin/spark-submit "sparkr-shell" /tmp/Rtmp69Q264/backend_portae4c24444ac20

So now I checked whether spark-submit works by running the following example:

cd $SPARK_HOME

sudo -u spark ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10

And the result is lots of these:

INFO Client: Application report for application_1455610402042_0021 (state: ACCEPTED)

Then:

SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.

The whole error can be found in the attached file:

spark-submit-error.txt

Am I missing something in the Spark setup?


Expert Contributor

@Neeraj Sabharwal

I ran the same spark-submit command with ONE difference:

--master was yarn-cluster
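That is, the same SparkPi command as before with only the master changed (reconstructed here for clarity; all other flags as in the earlier run):

cd $SPARK_HOME
sudo -u spark ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10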

It reached the FINISHED status:

INFO Client: Application report for application_1455610402042_0022 (state: FINISHED)

and it ended up with this:

Exception in thread "main" org.apache.spark.SparkException: Application application_1455610402042_0022 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:974)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1020)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)

...

16/02/16 18:15:58 INFO ShutdownHookManager: Shutdown hook called
16/02/16 18:15:58 INFO ShutdownHookManager: Deleting directory /tmp/spark-3ba9f87c-18c2-4d0d-b360-49fa10408631

Master Mentor

@marko yarn log -applicationid application_1455610402042_0021

Output of the above command? I hope it's not failing because of memory.
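(For the memory theory, one thing worth eyeballing is the YARN container limits; a sketch, assuming the stock HDP config path /etc/hadoop/conf:)

grep -A1 'yarn.scheduler.maximum-allocation-mb' /etc/hadoop/conf/yarn-site.xml
grep -A1 'yarn.nodemanager.resource.memory-mb' /etc/hadoop/conf/yarn-site.xml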

Expert Contributor

@Neeraj Sabharwal

I've run the command you recommended and I'm getting an error saying:

Error: Could not find or load main class log
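That error is what the yarn launcher prints when it falls through to treating "log" as a class name; the actual subcommand is logs, with the -applicationId flag:

yarn logs -applicationId application_1455610402042_0021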

I have set up another cluster (namenode + 3 datanodes) using Ambari. I've followed this link:

http://hortonworks.com/hadoop-tutorial/apache-spark-1-6-technical-preview-with-hdp-2-3/

I installed R on all the nodes.

All examples worked until I came to sparkR:

Launching java with spark-submit command /usr/hdp/2.3.4.0-3485/spark/bin/spark-submit "sparkr-shell" /tmp/Rtmphs2DlM/backend_port3b7b4c9a912b
16/02/17 09:37:36 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.

Error in socketConnection(port = monitorPort) :

cannot open the connection

In addition: Warning message:

In socketConnection(port = monitorPort) : localhost:40949 cannot be opened

>

I've opened all the ports (1-65535), and port 0, for the namenode and the datanodes.
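(Worth noting, as a reading of the warning rather than a confirmed diagnosis: 40949 looks like an ephemeral localhost port picked by the SparkR backend at runtime, so cluster firewall rules are unlikely to matter here; one can check whether anything is listening at the moment of failure:)

ss -tlnp | grep 40949    # or: netstat -tlnp | grep 40949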

@Artem Ervits - do you have any idea what I am missing?

Expert Contributor

I'm looking at the code:

https://github.com/amplab-extras/SparkR-pkg/blob/master/pkg/src/src/main/scala/edu/berkeley/cs/ampla...

The env variable EXISTING_SPARKR_BACKEND_PORT can be defined through .bashrc.
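For example (a sketch; the port value is a placeholder, not a required setting):

export EXISTING_SPARKR_BACKEND_PORT=9001    # placeholder port, in ~/.bashrc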

The tryCatch that returns my error is the following:

tryCatch({
  connectBackend("localhost", backendPort)
}, error = function(err) {
  stop("Failed to connect JVM\n")
})

Isn't it interesting that localhost is hardcoded this way? Or is there an explanation for it?

Expert Contributor

Now I installed Spark 1.6.0, just to test whether Ambari makes any changes during Spark installation. Same result:

Error in socketConnection(port = monitorPort) : cannot open the connection
In addition: Warning message:
In socketConnection(port = monitorPort) : localhost:51604 cannot be opened

Could it be YARN?

Master Mentor

@marko please check whether the firewall is blocking.

Expert Contributor

@Artem Ervits

I ran:

sudo systemctl status firewalld

And the result is this:

firewalld.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

Master Mentor

Is this CentOS 7? If not, try the command below; also make sure to do that on all nodes:

sudo service iptables stop
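For completeness, the CentOS 7 equivalent uses firewalld instead of iptables (though the status output above already shows firewalld is absent there):

sudo systemctl stop firewalld
sudo systemctl disable firewalld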