Created on 06-11-2019 12:06 AM - edited 09-16-2022 07:26 AM
Hello,
We are running a Spark Streaming application on YARN. Sometimes it runs fine with no delay, but at other times we observe a delay in the Spark processing job. Please find the logs below:
Failing this attempt.Diagnostics: [2019-06-10 15:38:53.090]Exception from container-launch.
Container id: container_1548676780185_0067_56_000001
Exit code: 15
[2019-06-10 15:38:53.091]Container exited with a non-zero exit code 15. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
0 (TID 58165, xptcxochapp104, executor 37): TaskKilled (Stage cancelled)
19/06/10 15:38:47 INFO storage.BlockManagerInfo: Added broadcast_59468_piece0 in memory on xptcxochapp104:46869 (size: 40.1 KB, free: 8.8 GB)
19/06/10 15:38:48 INFO cluster.YarnClusterScheduler: Removed TaskSet 93.0, whose tasks have all completed, from pool default
19/06/10 15:38:48 INFO scheduler.TaskSetManager: Starting task 12.0 in stage 94.0 (TID 58173, xptcxochapp104, executor 54, partition 12, NODE_LOCAL, 5019 bytes)
19/06/10 15:38:48 WARN scheduler.TaskSetManager: Lost task 61.0 in stage 93.0 (TID 58164, xptcxochapp104, executor 54): TaskKilled (Stage cancelled)
19/06/10 15:38:48 INFO cluster.YarnClusterScheduler: Removed TaskSet 93.0, whose tasks have all completed, from pool default
19/06/10 15:38:48 INFO storage.BlockManagerInfo: Added broadcast_59469_piece0 in memory on xptcxochapp104:38142 (size: 40.1 KB, free: 8.8 GB)
19/06/10 15:38:50 INFO scheduler.TaskSetManager: Starting task 13.0 in stage 94.0 (TID 58174, xptcxochapp104, executor 53, partition 13, NODE_LOCAL, 5019 bytes)
19/06/10 15:38:50 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 94.0 (TID 58166) in 5689 ms on xptcxochapp104 (executor 53) (1/728)
19/06/10 15:38:50 INFO storage.BlockManagerInfo: Added broadcast_59470_piece0 in memory on xptcxochapp104:38899 (size: 40.1 KB, free: 8.8 GB)
19/06/10 15:38:50 WARN util.ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException
java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:205)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:68)
19/06/10 15:38:50 ERROR util.Utils: Uncaught exception in thread pool-4-thread-1
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.spark.streaming.util.RecurringTimer.stop(RecurringTimer.scala:86)
at org.apache.spark.streaming.scheduler.JobGenerator.stop(JobGenerator.scala:137)
at org.apache.spark.streaming.scheduler.JobScheduler.stop(JobScheduler.scala:123)
at org.apache.spark.streaming.StreamingContext$$anonfun$stop$1.apply$mcV$sp(StreamingContext.scala:681)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1321)
at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:680)
at org.apache.spark.streaming.StreamingContext.org$apache$spark$streaming$StreamingContext$$stopOnShutdown(StreamingContext.scala:714)
at org.apache.spark.streaming.StreamingContext$$anonfun$start$1.apply$mcV$sp(StreamingContext.scala:599)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1952)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
For more detailed output, check the application tracking page: http://xptcxochapp102:8088/cluster/app/application_1548676780185_0067 Then click on links to logs of each attempt.
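For reference, the full aggregated container logs can also be pulled from the command line (assuming YARN log aggregation is enabled on the cluster; the output file name below is just an example):

yarn logs -applicationId application_1548676780185_0067 > app_0067_full.log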
spark-submit parameters:
spark-submit \
--master yarn \
--deploy-mode cluster \
--num-executors 24 \
--executor-cores 2 \
--driver-memory 10G \
--executor-memory 15G \
--conf "spark.cassandra.output.consistency.level=ANY" \
--conf "spark.cassandra.input.consistency.level=ONE" \
--conf "spark.yarn.executor.memoryOverhead=3G" \
--conf "spark.yarn.driver.memoryOverhead=3G" \
--conf "spark.scheduler.mode=FAIR" \
--conf "spark.cassandra.connection.host={IPs}" \
--conf "spark.streaming.fileStream.minRememberDuration=300s" \
--conf "spark.network.timeout=500s" \
--conf "spark.cassandra.connection.timeout_ms=600000" \
--conf "spark.executor.heartbeatInterval=20s" \
--conf "spark.yarn.maxAppAttempts=2" \
--conf "spark.yarn.am.attemptFailuresValidityInterval=1h" \
--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=20" \
--conf "spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=20" \
--conf "spark.streaming.stopGracefullyOnShutdown=true"
Created 01-06-2020 08:35 PM
Hi, I am running into the same issue. Did you find the cause?