<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Akka Error while running Spark Jobs in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18620#M2885</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The worker logs show the following connection errors. Any idea how to resolve them?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;AssociationError [akka.tcp://sparkWorker@host1:7078] -&amp;gt; [akka.tcp://sparkExecutor@worker1:33912]: 
Error [Association failed with [akka.tcp://sparkExecutor@worker1:33912]] 
[akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@worker1:33912]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: worker1/10.11.11.11:33912]&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 11 Sep 2014 15:22:56 GMT</pubDate>
    <dc:creator>ArunShell</dc:creator>
    <dc:date>2014-09-11T15:22:56Z</dc:date>
    <item>
      <title>Akka Error while running Spark Jobs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18602#M2883</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am submitting a Spark Streaming job using spark-submit.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;"spark-submit --class "test.Main" --master yarn-client testjob.jar"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But I am facing the errors below. Please assist in resolving them.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;14/09/11 05:55:45 ERROR YarnClientClusterScheduler: Lost executor 4 on host1.com: remote Akka client disassociated



14/09/11 05:56:06 ERROR JobScheduler: Error running job streaming job 1410429330000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1.0:0 failed 4 times, most recent failure: TID 3 on host host2.com failed for unknown reason
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)




Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1.0:0 failed 4 times, most recent failure: TID 3 on host host3.com failed for unknown reason
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:07:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18602#M2883</guid>
      <dc:creator>ArunShell</dc:creator>
      <dc:date>2022-09-16T09:07:21Z</dc:date>
    </item>
    <item>
      <title>Re: Akka Error while running Spark Jobs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18604#M2884</link>
      <description>&lt;P&gt;It's nothing to do with Akka per se. This says your jobs are failing. You would have to look at the logs on the workers to understand why.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Sep 2014 10:13:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18604#M2884</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-09-11T10:13:53Z</dc:date>
    </item>
    <item>
      <title>Re: Akka Error while running Spark Jobs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18620#M2885</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The worker logs show the following connection errors. Any idea how to resolve them?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;AssociationError [akka.tcp://sparkWorker@host1:7078] -&amp;gt; [akka.tcp://sparkExecutor@worker1:33912]: 
Error [Association failed with [akka.tcp://sparkExecutor@worker1:33912]] 
[akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@worker1:33912]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: worker1/10.11.11.11:33912]&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 11 Sep 2014 15:22:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18620#M2885</guid>
      <dc:creator>ArunShell</dc:creator>
      <dc:date>2014-09-11T15:22:56Z</dc:date>
    </item>
    <item>
      <title>Re: Akka Error while running Spark Jobs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18624#M2886</link>
      <description>&lt;P&gt;On its face it means what it says --&amp;nbsp;the master is unable to talk to the worker. I would check your firewall rules and make sure these machines can talk to each other, and on these ports. Spark picks ephemeral ports so you may have to open ranges.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Sep 2014 15:54:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18624#M2886</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-09-11T15:54:34Z</dc:date>
    </item>
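The advice above boils down to verifying that these hosts can actually reach each other on the executor's port. A minimal sketch of such a connectivity probe (the host "worker1" and port 33912 are taken from the log excerpt above and are examples only, not a prescribed tool):

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and DNS failures
        return False

# Hypothetical probe, mirroring the failed association in the log:
# can_connect("worker1", 33912)
```

Running this from the worker host against the executor's advertised port distinguishes a firewall problem (probe fails) from an executor that simply died (probe fails only after the process exits).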
    <item>
      <title>Re: Akka Error while running Spark Jobs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18676#M2887</link>
      <description>&lt;P&gt;Thanks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please clarify the following:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What is the port range that I need to ask the admin team to open on each worker node?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And what are these ports used for? Spark workers already use port 7078, right? Are these random ports opened for each Spark job?&lt;/P&gt;</description>
      <pubDate>Fri, 12 Sep 2014 06:38:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18676#M2887</guid>
      <dc:creator>ArunShell</dc:creator>
      <dc:date>2014-09-12T06:38:04Z</dc:date>
    </item>
    <item>
      <title>Re: Akka Error while running Spark Jobs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18680#M2888</link>
      <description>&lt;P&gt;Have a look at:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://spark.apache.org/docs/latest/configuration.html#networking" target="_blank"&gt;https://spark.apache.org/docs/latest/configuration.html#networking&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I think you are interested in pinning the driver and executor ports to fixed values, rather than letting them be chosen randomly.&lt;/P&gt;&lt;P&gt;Same with the UI ports, if you're interested in those.&lt;/P&gt;</description>
      <pubDate>Fri, 12 Sep 2014 07:45:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18680#M2888</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-09-12T07:45:28Z</dc:date>
    </item>
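For reference, pinning those ports might look like the following spark-defaults.conf fragment. This is only a sketch for the Spark 1.x era of this thread: the property names come from the networking section of the configuration docs linked above, the port numbers are arbitrary examples, and which keys exist depends on your Spark version.

```
# spark-defaults.conf - pin ports so firewall rules can be scoped (example values)
spark.driver.port        40000
spark.executor.port      40001
spark.blockManager.port  40002
spark.fileserver.port    40003
spark.broadcast.port     40004
spark.ui.port            4040
```

With fixed values like these, the admin team only needs to open a handful of known ports between the cluster hosts instead of the whole ephemeral range.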
    <item>
      <title>Re: Akka Error while running Spark Jobs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18682#M2889</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This makes sense - thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 12 Sep 2014 07:54:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18682#M2889</guid>
      <dc:creator>ArunShell</dc:creator>
      <dc:date>2014-09-12T07:54:14Z</dc:date>
    </item>
    <item>
      <title>Re: Akka Error while running Spark Jobs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18700#M2890</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hi - Does it make a difference for this error if I use "--master yarn-client" or "--master yarn-cluster" with "spark-submit", since yarn-client runs the driver locally?&lt;/P&gt;</description>
      <pubDate>Fri, 12 Sep 2014 13:35:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18700#M2890</guid>
      <dc:creator>ArunShell</dc:creator>
      <dc:date>2014-09-12T13:35:17Z</dc:date>
    </item>
    <item>
      <title>Re: Akka Error while running Spark Jobs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18702#M2891</link>
      <description>&lt;P&gt;It will make a difference insofar as the driver program will run either out on the cluster (yarn-cluster) or locally (yarn-client). The same issue remains -- the processes need to talk to each other on certain ports. But it affects where the driver is and that affects what machine's ports need to be open. For example, if your ports are all open within your cluster, I expect that yarn-cluster works directly.&lt;/P&gt;</description>
      <pubDate>Fri, 12 Sep 2014 13:49:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Akka-Error-while-running-Spark-Jobs/m-p/18702#M2891</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-09-12T13:49:01Z</dc:date>
    </item>
  </channel>
</rss>

