
INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)


New Contributor

A lot of jobs are failing with the error "Failed to connect to driver".

[Attached screenshot: 41427-capture.png]

5 REPLIES

Re: INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)

New Contributor
Try spark-submit --master spark://<master-ip>:<spark-port> to submit the job.

Re: INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)

New Contributor

Thanks, Thangarajan, but giving an IP and port ties the job to that particular host and port, so if those ports are not available it could again be a problem. Is there any other workaround?

Thanks for your reply.

Re: INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)

Super Mentor

@deepak rathod

Is this happening with all the jobs? For example, does it happen even with long-running jobs?

Sometimes it can happen when the Spark job finishes successfully but very quickly, while the executors are still trying to contact the driver. YARN then declares the job failed because the executors could not connect.

It can also happen if a firewall (or another network issue) is blocking some ports, so you might want to check whether the ports are accessible. Spark chooses most of its ports at random, but you can try setting spark.driver.port to a fixed value to see whether that port is reachable remotely and whether it helps. For other ports, please refer to:


http://spark.apache.org/docs/latest/security.html#configuring-ports-for-network-security

http://spark.apache.org/docs/latest/configuration.html#networking
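
As a rough illustration of the port check suggested above, here is a minimal Python sketch, not an official Spark tool. It builds `--conf` flags that pin `spark.driver.port` and `spark.blockManager.port` (both documented Spark properties, as is `spark.port.maxRetries`) and tests whether a TCP port answers from another machine. The function names, the host name, and the port numbers are assumptions made up for this example.

```python
import socket
from typing import List


def pinned_port_conf(driver_port: int, block_manager_port: int,
                     max_retries: int = 16) -> List[str]:
    """Build spark-submit --conf flags that fix the normally random ports.

    The port values are chosen by the caller; nothing here is a Spark default
    except max_retries=16, which mirrors spark.port.maxRetries.
    """
    settings = {
        "spark.driver.port": driver_port,
        "spark.blockManager.port": block_manager_port,
        "spark.port.maxRetries": max_retries,
    }
    args: List[str] = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, the list returned by `pinned_port_conf(40000, 40010)` could be appended to the spark-submit arguments, and `port_is_reachable("driver-host.example.com", 40000)` run from a NodeManager host while the job is running would show whether the firewall lets the connection through. Because Spark tries the next port up when one is already taken, the firewall would need to allow the range from the chosen port up to that port plus `spark.port.maxRetries`.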


Re: INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)

New Contributor

Yes, it is happening with all the jobs, and after some time they pick some other port and start working fine.

What are those jobs trying to find on the port? Are they trying to find the NodeManager?

For the successful jobs, I went to the port and ran ps -wwf port number, where I could see the NodeManager running.

Re: INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)

New Contributor

@deepak rathod

I have encountered the same issue.

Were you able to fix it?

If yes, can you please share the solution here.

Thanks in advance.