Support Questions

ArunShell · ‎09-11-2014

Hi,

I am submitting a Spark Streaming job using spark-submit.

"spark-submit --class "test.Main" --master yarn-client testjob.jar"

But I am facing the below errors. Please assist to resolve.

14/09/11 05:55:45 ERROR YarnClientClusterScheduler: Lost executor 4 on host1.com: remote Akka client disassociated



14/09/11 05:56:06 ERROR JobScheduler: Error running job streaming job 1410429330000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1.0:0 failed 4 times, most recent failure: TID 3 on host host2.com failed for unknown reason
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)




Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1.0:0 failed 4 times, most recent failure: TID 3 on host host3.com failed for unknown reason
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

srowen · ‎09-12-2014

Have a look at:

https://spark.apache.org/docs/latest/configuration.html#networking

I think you are interested in fixing the driver and executor ports to a fixed value, rather than let them be chosen randomly.

Same with the UI ports, if you're interested in those.

View solution in original post

srowen · ‎09-11-2014

It's nothing to do with Akka per se. This says your jobs are failing. You would have to look at the logs on the workers to understand why.

ArunShell · ‎09-11-2014

Hi,

The worker logs show the following connection erros - Any idea how to resolve?

AssociationError [akka.tcp://sparkWorker@host1:7078] -> [akka.tcp://sparkExecutor@worker1:33912]: 
Error [Association failed with [akka.tcp://sparkExecutor@worker1:33912]] 
[akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@worker1:33912]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: worker1/10.11.11.11:33912]

srowen · ‎09-11-2014

On its face it means what it says -- the master is unable to talk to the worker. I would check your firewall rules and make sure these machines can talk to each other, and on these ports. Spark picks ephemeral ports so you may have to open ranges.

ArunShell · ‎09-11-2014

Thanks.

Please clarify the below -

What is the port range that I need to ask the admin team to open on each worker node?

And what are these ports used for, Spark Workers already use the port 7078 right? Are these random ports opened for each spark job ?

srowen · ‎09-12-2014

Have a look at:

https://spark.apache.org/docs/latest/configuration.html#networking

I think you are interested in fixing the driver and executor ports to a fixed value, rather than let them be chosen randomly.

Same with the UI ports, if you're interested in those.

ArunShell · ‎09-12-2014

This makes sense - thanks!

ArunShell · ‎09-12-2014

Hi - Does it make a difference if I use a "--master yarn-client" or " --master yarn-cluster" for this error in "spark-submit" since yarn-client uses a local driver?

srowen · ‎09-12-2014

It will make a difference insofar as the driver program will run either out on the cluster (yarn-cluster) or locally (yarn-client). The same issue remains -- the processes need to talk to each other on certain ports. But it affects where the driver is and that affects what machine's ports need to be open. For example, if your ports are all open within your cluster, I expect that yarn-cluster works directly.

Cloudera Community

Support Questions

Akka Error while running Spark Jobs