Hiccup with spark-submit in yarn-client mode

New Contributor

Hi forum,

 

While playing around with the SparkPi example on our CDH 5.4.4 cluster, I observed some strange (?) behaviour when submitting jobs from my laptop in yarn-client mode: after the first two tasks have been started, there seems to be a hiccup of about 30 seconds (see the log below). In addition, the workload does not seem to be divided equally over the two nodes: 9 out of 10 tasks are processed on the same node. The job runs successfully, but takes significantly longer to complete than with other ways of submitting.

 

This is the command I issued:

 

spark-submit --master yarn-client --conf spark.yarn.jar=hdfs:///spark-assembly-1.6.1-hadoop2.6.0.jar --class org.apache.spark.examples.SparkPi spark-examples-1.6.1-hadoop2.6.0.jar 10

 

This delay is not present when submitting from my laptop in yarn-cluster mode, and the logs show that in that case the tasks are divided equally over the nodes. Likewise, when submitting in yarn-client mode from the cluster itself, everything seems to be fine.

 

I'm wondering whether this is expected behaviour or whether something is wrong with our configuration.

 

This is part of the log; notice the delay between the 3rd and 4th lines:

 

16/05/19 14:27:34 INFO cluster.YarnScheduler: Adding task set 0.0 with 10 tasks
16/05/19 14:27:34 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, p-web-hdp-d12, partition 0,PROCESS_LOCAL, 2156 bytes)
16/05/19 14:27:34 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, p-web-hdp-d14, partition 1,PROCESS_LOCAL, 2156 bytes)
16/05/19 14:28:00 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on p-web-hdp-d12:44416 (size: 1218.0 B, free: 511.5 MB)
16/05/19 14:28:00 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, p-web-hdp-d12, partition 2,PROCESS_LOCAL, 2156 bytes)
16/05/19 14:28:00 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 26443 ms on p-web-hdp-d12 (1/10)
16/05/19 14:28:01 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, p-web-hdp-d12, partition 3,PROCESS_LOCAL, 2156 bytes)
16/05/19 14:28:01 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 40 ms on p-web-hdp-d12 (2/10)
16/05/19 14:28:01 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, p-web-hdp-d12, partition 4,PROCESS_LOCAL, 2156 bytes)
16/05/19 14:28:01 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 39 ms on p-web-hdp-d12 (3/10)
16/05/19 14:28:01 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, p-web-hdp-d12, partition 5,PROCESS_LOCAL, 2156 bytes)
16/05/19 14:28:01 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 29 ms on p-web-hdp-d12 (4/10)
16/05/19 14:28:01 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, p-web-hdp-d12, partition 6,PROCESS_LOCAL, 2156 bytes)
16/05/19 14:28:01 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 30 ms on p-web-hdp-d12 (5/10)
16/05/19 14:28:01 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, p-web-hdp-d12, partition 7,PROCESS_LOCAL, 2156 bytes)
16/05/19 14:28:01 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 31 ms on p-web-hdp-d12 (6/10)
16/05/19 14:28:01 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on p-web-hdp-d14:47572 (size: 1218.0 B, free: 511.5 MB)
16/05/19 14:28:01 INFO scheduler.TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, p-web-hdp-d12, partition 8,PROCESS_LOCAL, 2156 bytes)
16/05/19 14:28:01 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 30 ms on p-web-hdp-d12 (7/10)
16/05/19 14:28:01 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, p-web-hdp-d12, partition 9,PROCESS_LOCAL, 2156 bytes)
16/05/19 14:28:01 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 29 ms on p-web-hdp-d12 (8/10)
16/05/19 14:28:01 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 29 ms on p-web-hdp-d12 (9/10)
16/05/19 14:28:01 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 26796 ms on p-web-hdp-d14 (10/10)
16/05/19 14:28:01 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:36) finished in 26.835 s
16/05/19 14:28:01 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/05/19 14:28:01 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:36, took 27.046519 s
Pi is roughly 3.140872
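The gap can be read straight off the log timestamps: tasks 0.0 and 1.0 start at 14:27:34, but the first broadcast block only lands on an executor at 14:28:00. A small sketch (plain Python, log lines abbreviated) that parses the yy/MM/dd HH:mm:ss prefix Spark prints and measures the stall:

```python
from datetime import datetime

def gap_seconds(line_a, line_b):
    """Return the gap in whole seconds between two Spark log lines,
    parsing the 17-character 'yy/MM/dd HH:mm:ss' timestamp prefix."""
    fmt = "%y/%m/%d %H:%M:%S"
    t_a = datetime.strptime(line_a[:17], fmt)
    t_b = datetime.strptime(line_b[:17], fmt)
    return int((t_b - t_a).total_seconds())

# Abbreviated versions of the 3rd and 4th log lines above
started = "16/05/19 14:27:34 INFO scheduler.TaskSetManager: Starting task 1.0 ..."
broadcast = "16/05/19 14:28:00 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 ..."
print(gap_seconds(started, broadcast))  # 26
```

That 26-second stall matches the "Finished task 0.0 ... in 26443 ms" line, i.e. the first two tasks spend almost all of their runtime waiting before the broadcast block arrives.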

2 REPLIES

Re: Hiccup with spark-submit in yarn-client mode

Rising Star

I don't know the details of how Spark works with YARN, but you can turn on DEBUG-level logging to show more information that will hopefully explain the 30 seconds.
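For reference, a sketch of one way to do that, assuming CDH's Spark 1.x picks up a log4j.properties from the driver-side conf directory (e.g. /etc/spark/conf) when you run spark-submit; the ConversionPattern below matches the timestamp format in the logs already posted:

```
# Hedged example log4j.properties: raise the root logger to DEBUG so
# spark-submit prints scheduler and IPC chatter on the driver console.
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

Alternatively, a copy of this file can be pointed at per-job with --driver-java-options "-Dlog4j.configuration=file:/path/to/log4j.properties" so the cluster-wide defaults stay untouched.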

Re: Hiccup with spark-submit in yarn-client mode

New Contributor

Thanks for your help. I see a lot of "IPC client sending #int / got value #int" iterations. Only after task 0.0 has finished, which takes about 30 seconds, does there seem to be real progress again.

 

16/05/20 12:09:00 INFO YarnScheduler: Adding task set 0.0 with 10 tasks
16/05/20 12:09:00 DEBUG TaskSetManager: Epoch for TaskSet 0.0: 0
16/05/20 12:09:00 DEBUG TaskSetManager: Valid locality levels for TaskSet 0.0: NO_PREF, ANY
16/05/20 12:09:00 DEBUG YarnScheduler: parentName: , name: TaskSet_0, runningTasks: 0
16/05/20 12:09:00 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, p-web-hdp-d12, partition 0,PROCESS_LOCAL, 2156 bytes)
16/05/20 12:09:00 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, p-web-hdp-d14, partition 1,PROCESS_LOCAL, 2156 bytes)
16/05/20 12:09:00 DEBUG YarnScheduler: parentName: , name: TaskSet_0, runningTasks: 2
16/05/20 12:09:00 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8032 from user.name sending #27
16/05/20 12:09:00 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8032 from user.name got value #27
16/05/20 12:09:00 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 38ms
16/05/20 12:09:01 DEBUG YarnScheduler: parentName: , name: TaskSet_0, runningTasks: 2
16/05/20 12:09:01 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8032 from user.name sending #28
16/05/20 12:09:02 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8032 from user.name got value #28
16/05/20 12:09:02 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 55ms
16/05/20 12:09:02 DEBUG YarnScheduler: parentName: , name: TaskSet_0, runningTasks: 2
16/05/20 12:09:03 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8032 from user.name sending #29
16/05/20 12:09:03 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8032 from user.name got value #29
16/05/20 12:09:03 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 40ms
16/05/20 12:09:03 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8020 from user.name: closed
16/05/20 12:09:03 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8020 from user.name: stopped, remaining connections 1
16/05/20 12:09:03 DEBUG YarnScheduler: parentName: , name: TaskSet_0, runningTasks: 2
16/05/20 12:09:04 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8032 from user.name sending #30
16/05/20 12:09:04 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8032 from user.name got value #30
16/05/20 12:09:04 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 43ms
16/05/20 12:09:04 DEBUG YarnScheduler: parentName: , name: TaskSet_0, runningTasks: 2
16/05/20 12:09:05 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8032 from user.name sending #31
16/05/20 12:09:05 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8032 from user.name got value #31
16/05/20 12:09:05 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 48ms
16/05/20 12:09:05 DEBUG YarnScheduler: parentName: , name: TaskSet_0, runningTasks: 2
16/05/20 12:09:06 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8032 from user.name sending #32
16/05/20 12:09:06 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8032 from user.name got value #32
16/05/20 12:09:06 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 41ms
16/05/20 12:09:06 DEBUG YarnScheduler: parentName: , name: TaskSet_0, runningTasks: 2
16/05/20 12:09:07 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8032 from user.name sending #33
16/05/20 12:09:07 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8032 from user.name got value #33
16/05/20 12:09:07 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 38ms
16/05/20 12:09:07 DEBUG YarnScheduler: parentName: , name: TaskSet_0, runningTasks: 2
16/05/20 12:09:08 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8032 from user.name sending #34
16/05/20 12:09:08 DEBUG Client: IPC Client (824900551) connection to p-web-hdp-d11/xx.xxx.xxx.xxx:8032 from user.name got value #34
16/05/20 12:09:08 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 45ms

etc, etc
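One note on reading this output: the getApplicationReport calls repeating once per second look like the yarn-client monitor thread polling the ResourceManager for application status, i.e. background chatter rather than the stall itself. If it drowns out the interesting lines, the polling interval can be raised; a hedged sketch, assuming the spark.yarn.report.interval setting (in milliseconds, documented for Spark's YARN mode) takes effect for yarn-client submissions like this one:

```
spark-submit --master yarn-client \
  --conf spark.yarn.report.interval=5000 \
  --conf spark.yarn.jar=hdfs:///spark-assembly-1.6.1-hadoop2.6.0.jar \
  --class org.apache.spark.examples.SparkPi \
  spark-examples-1.6.1-hadoop2.6.0.jar 10
```

With the polling quieted, whatever the driver and executors are actually blocked on between "Starting task 1.0" and "Added broadcast_0_piece0" should stand out more clearly in the DEBUG stream.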
