Reply
Explorer
Posts: 62
Registered: ‎01-22-2014
Accepted Solution

Akka Error while running Spark Jobs

 

Hi,

 

I am submitting a Spark Streaming job using spark-submit.

 

"spark-submit  --class "test.Main" --master yarn-client testjob.jar"

 

But I am facing the below errors. Please assist to resolve.

 

14/09/11 05:55:45 ERROR YarnClientClusterScheduler: Lost executor 4 on host1.com: remote Akka client disassociated



14/09/11 05:56:06 ERROR JobScheduler: Error running job streaming job 1410429330000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1.0:0 failed 4 times, most recent failure: TID 3 on host host2.com failed for unknown reason
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)




Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1.0:0 failed 4 times, most recent failure: TID 3 on host host3.com failed for unknown reason
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

 

Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: Akka Error while running Spark Jobs

It's nothing to do with Akka per se. This says your jobs are failing. You would have to look at the logs on the workers to understand why.

Explorer
Posts: 62
Registered: ‎01-22-2014

Re: Akka Error while running Spark Jobs

Hi,

 

The worker logs show the following connection erros - Any idea how to resolve?

 

AssociationError [akka.tcp://sparkWorker@host1:7078] -> [akka.tcp://sparkExecutor@worker1:33912]: 
Error [Association failed with [akka.tcp://sparkExecutor@worker1:33912]] 
[akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@worker1:33912]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: worker1/10.11.11.11:33912]

 

Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: Akka Error while running Spark Jobs

On its face it means what it says -- the master is unable to talk to the worker. I would check your firewall rules and make sure these machines can talk to each other, and on these ports. Spark picks ephemeral ports so you may have to open ranges.

Explorer
Posts: 62
Registered: ‎01-22-2014

Re: Akka Error while running Spark Jobs

Thanks.

 

Please clarify the below - 

 

What is the port range that I need to ask the admin team to open on each worker node?

 

And what are these ports used for, Spark Workers already use the port 7078 right? Are these random ports opened for each spark job ?

Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: Akka Error while running Spark Jobs

Have a look at:

 

https://spark.apache.org/docs/latest/configuration.html#networking

 

I think you are interested in fixing the driver and executor ports to a fixed value, rather than let them be chosen randomly.

Same with the UI ports, if you're interested in those.

Explorer
Posts: 62
Registered: ‎01-22-2014

Re: Akka Error while running Spark Jobs

 

This makes sense - thanks!

Explorer
Posts: 62
Registered: ‎01-22-2014

Re: Akka Error while running Spark Jobs

 

Hi - Does it make a difference if I use a "--master yarn-client"  or  " --master  yarn-cluster"  for  this error in "spark-submit" since yarn-client uses a local driver?

Highlighted
Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: Akka Error while running Spark Jobs

It will make a difference insofar as the driver program will run either out on the cluster (yarn-cluster) or locally (yarn-client). The same issue remains -- the processes need to talk to each other on certain ports. But it affects where the driver is and that affects what machine's ports need to be open. For example, if your ports are all open within your cluster, I expect that yarn-cluster works directly.

Announcements