
Failing to connect to Spark driver when submitting job to Spark cluster

Explorer

When I submit a Spark job to the cluster, it fails and gives me the following error in the log file:

Caused by: java.io.IOException: Failed to connect to /0.0.0.0:35994
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:232)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Which I guess means it failed to connect to the driver. I tried increasing the "spark.yarn.executor.memoryOverhead" parameter, but it didn't help.

This is the submit command I use:

/bin/spark-submit --class example.Hello --jars ... --master yarn --deploy-mode cluster --supervise --conf spark.yarn.driver.memoryOverhead=1024 ...(jar file path)

I am using HDP 2.6.1.0 and Spark 2.1.1.

10 REPLIES

Explorer

Hi Tariq, can you confirm that the cluster is running? Also, what happens when you attempt to run in local mode (by changing the --master and --deploy-mode parameters)?

avatar
Explorer

Thank you for responding, Mark. Ambari does not show any alerts regarding Spark. Is there any other way to make sure the cluster is running? Also, I ran it in local mode like this:

/bin/spark-submit --class example.Hello --jars ... --master local --supervise --conf spark.yarn.driver.memoryOverhead=1024

and it ran without any problems.
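
As a next intermediate step, I guess I could also try yarn-client mode, where the driver runs on the submitting machine but the executors run in the cluster (a sketch of the same command):

/bin/spark-submit --class example.Hello --jars ... --master yarn --deploy-mode client ...(jar file path)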

Explorer

In the Ambari dashboard, Spark or Spark2 should appear in the list of installed services on the left. Click those links to see the status of the services. If there are no links to Spark or Spark2, they may not be installed. Click the "Add Services" link on the bottom left to see whether Spark and/or Spark2 is selectable for install.
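
From the command line, you can also confirm that YARN sees live workers (a sketch):

# lists the NodeManagers registered with YARN and their state
yarn node -list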

Explorer

Yes, I checked that in Ambari and Spark is installed. Here is a screenshot:

[Screenshot: Ambari dashboard showing Spark among the installed services]

Contributor

Hi Tariq, please check whether any hardware or software firewall (iptables) is present between the client node and the worker node. You can test the connectivity as below.

On the server end (where the driver runs):

nc -l 35994

On the client end (where the worker runs):

nc -vz <server-ip> 35994
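
If nc happens not to be installed on a node, bash's built-in /dev/tcp pseudo-device is a rough substitute (a sketch; exit status 0 means the TCP connection succeeded):

# attempt a TCP connect with a 3-second timeout
timeout 3 bash -c 'echo > /dev/tcp/<server-ip>/35994' && echo open || echo blocked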

Explorer

I am not able to do that since the driver port changes randomly on each job submission. Can I fix the port value?

Also, since the client is connecting to the driver on a local IP, I don't think a firewall is the problem, right?

Contributor

Set SPARK_MASTER_PORT=35994 in spark-env.sh and restart Spark. If you are not able to pass the port test with a sample port, then it is a firewall issue. What is the output of the test?

On the server end (where the driver runs):

nc -l 35994

On the client end (where the worker runs):

nc -vz <server-ip> 35994
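
In spark-env.sh that would look like this (note this setting applies to a standalone master):

# spark-env.sh: pin the standalone master port
export SPARK_MASTER_PORT=35994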

Explorer

Thanks for your reply, Kalai.

In my submit command I am using Spark in YARN mode (--master yarn), not standalone mode, so I do not think it will use this configuration.

Also, as far as I understand, that setting configures the master in standalone mode and has nothing to do with the driver port.

Anyway, to confirm, I tried making the changes you mentioned and the driver still ran on random ports.
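
Looking at the Spark configuration docs, the YARN-mode knob seems to be spark.driver.port (the properties below are standard Spark 2.x network settings; I have not verified them on this cluster):

# pin the driver RPC port (default 0 = pick a random ephemeral port) and the
# block manager port; spark.port.maxRetries caps how many fallback ports are tried
/bin/spark-submit --class example.Hello --jars ... --master yarn --deploy-mode cluster --conf spark.driver.port=35994 --conf spark.blockManager.port=45000 --conf spark.port.maxRetries=16 ...(jar file path)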


@tariq abughofa

Can you check if the firewall is blocking the ports?
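
For example (a sketch; adapt to your distribution's firewall tooling):

# On RHEL/CentOS 7 with firewalld, list the active zone configuration:
sudo firewall-cmd --list-all
# Or inspect raw iptables rules for REJECT/DROP entries covering Spark's ports:
sudo iptables -L -n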