
INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)


A lot of jobs are failing with the error "unable to connect to driver".

(attached screenshot: 41427-capture.png)

5 REPLIES

Contributor
Try submitting the job with the master specified explicitly: spark-submit --master spark://<master-ip>:<spark-port>.
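
For reference, a minimal sketch of such a submit (the host, port, class name, and jar are placeholders, not values from this thread):

    spark-submit --master spark://master-host.example.com:7077 --class com.example.MyApp my-app.jar

Since the error above comes from a YARN ApplicationMaster, the equivalent on a YARN cluster would be --master yarn, in which case no master host and port are given on the command line.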


Thanks Thangarajan, but giving the IP and port will make the job specific to that particular host, so if those ports are not available it could still be a problem. Is there any other workaround?

Thanks for your reply.

Master Mentor

@deepak rathod

Is this happening with all the jobs? For example, even with some long-running jobs?

Sometimes this happens when the Spark job itself finishes fine but exits early, while the executors are still trying to contact the driver; YARN then ultimately declares the job failed because the executors could not connect.
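
One way to check which side gave up first is to pull the aggregated logs of a failed run (a sketch; the application ID below is a placeholder):

    yarn logs -applicationId application_1511000000000_0001 | grep -iE "driver|ApplicationMaster"

If the ApplicationMaster unregisters before the executors have registered with the driver, that ordering is usually visible in these logs.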


It can also happen if a firewall (network issue) is blocking some ports, so you might want to check whether the ports are properly accessible. By default Spark chooses its ports at random, but you may try setting spark.driver.port to a fixed value and verifying that it is reachable remotely (see the sketch after the links below). For the other ports, please refer to:


http://spark.apache.org/docs/latest/security.html#configuring-ports-for-network-security

http://spark.apache.org/docs/latest/configuration.html#networking
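
As a rough illustration of pinning the ports (a sketch only; the port numbers are arbitrary placeholders that must be open in your firewall, and the class and jar are hypothetical):

    spark-submit --master yarn \
      --conf spark.driver.port=40000 \
      --conf spark.blockManager.port=40001 \
      --conf spark.port.maxRetries=32 \
      --class com.example.MyApp my-app.jar

spark.port.maxRetries (default 16) controls how many successive ports Spark tries if the configured one is already in use, so fixed base ports still leave some room for concurrent jobs.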



Yeah, it is happening with all the jobs, and after some time they pick some other port and start working fine.

What is it that those jobs are trying to find on the port? Are they trying to find the NodeManager?

For the successful jobs I went to the port and ran ps -wwf with the port number, and I could see the NodeManager running there.
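
For what it is worth, a more direct way to see which process owns a given port (a sketch; 40000 is a placeholder port number):

    netstat -tlnp | grep 40000
    # or
    lsof -i :40000

Either one prints the PID of the listener, which can then be passed to ps -wwfp <pid> to get the full command line.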

New Contributor

@deepak rathod

I have encountered the same issue.

Were you able to fix it?

If yes, could you please share the solution here?

Thanks in advance.