Created on 06-11-2019 12:06 AM - edited 09-16-2022 07:26 AM
Hello,
We are running a Spark Streaming application on YARN. Sometimes it runs fine with no delay, but at other times we observe a delay in the Spark processing job. Please find the logs below:
Failing this attempt.Diagnostics: [2019-06-10 15:38:53.090]Exception from container-launch.
Container id: container_1548676780185_0067_56_000001
Exit code: 15
[2019-06-10 15:38:53.091]Container exited with a non-zero exit code 15. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
0 (TID 58165, xptcxochapp104, executor 37): TaskKilled (Stage cancelled)
19/06/10 15:38:47 INFO storage.BlockManagerInfo: Added broadcast_59468_piece0 in memory on xptcxochapp104:46869 (size: 40.1 KB, free: 8.8 GB)
19/06/10 15:38:48 INFO cluster.YarnClusterScheduler: Removed TaskSet 93.0, whose tasks have all completed, from pool default
19/06/10 15:38:48 INFO scheduler.TaskSetManager: Starting task 12.0 in stage 94.0 (TID 58173, xptcxochapp104, executor 54, partition 12, NODE_LOCAL, 5019 bytes)
19/06/10 15:38:48 WARN scheduler.TaskSetManager: Lost task 61.0 in stage 93.0 (TID 58164, xptcxochapp104, executor 54): TaskKilled (Stage cancelled)
19/06/10 15:38:48 INFO cluster.YarnClusterScheduler: Removed TaskSet 93.0, whose tasks have all completed, from pool default
19/06/10 15:38:48 INFO storage.BlockManagerInfo: Added broadcast_59469_piece0 in memory on xptcxochapp104:38142 (size: 40.1 KB, free: 8.8 GB)
19/06/10 15:38:50 INFO scheduler.TaskSetManager: Starting task 13.0 in stage 94.0 (TID 58174, xptcxochapp104, executor 53, partition 13, NODE_LOCAL, 5019 bytes)
19/06/10 15:38:50 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 94.0 (TID 58166) in 5689 ms on xptcxochapp104 (executor 53) (1/728)
19/06/10 15:38:50 INFO storage.BlockManagerInfo: Added broadcast_59470_piece0 in memory on xptcxochapp104:38899 (size: 40.1 KB, free: 8.8 GB)
19/06/10 15:38:50 WARN util.ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException
java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:205)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:68)
19/06/10 15:38:50 ERROR util.Utils: Uncaught exception in thread pool-4-thread-1
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.spark.streaming.util.RecurringTimer.stop(RecurringTimer.scala:86)
at org.apache.spark.streaming.scheduler.JobGenerator.stop(JobGenerator.scala:137)
at org.apache.spark.streaming.scheduler.JobScheduler.stop(JobScheduler.scala:123)
at org.apache.spark.streaming.StreamingContext$$anonfun$stop$1.apply$mcV$sp(StreamingContext.scala:681)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1321)
at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:680)
at org.apache.spark.streaming.StreamingContext.org$apache$spark$streaming$StreamingContext$$stopOnShutdown(StreamingContext.scala:714)
at org.apache.spark.streaming.StreamingContext$$anonfun$start$1.apply$mcV$sp(StreamingContext.scala:599)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1952)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
For more detailed output, check the application tracking page: http://xptcxochapp102:8088/cluster/app/application_1548676780185_0067 Then click on links to logs of each attempt.
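For reference, the full aggregated container logs can also be pulled from the command line (assuming YARN log aggregation is enabled on the cluster; the output file name below is just an example):

yarn logs -applicationId application_1548676780185_0067 > app_0067_full.log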
spark-submit parameters:
spark-submit \
--master yarn \
--deploy-mode cluster \
--num-executors 24 \
--executor-cores 2 \
--driver-memory 10G \
--executor-memory 15G \
--conf "spark.cassandra.output.consistency.level=ANY" \
--conf "spark.cassandra.input.consistency.level=ONE" \
--conf "spark.yarn.executor.memoryOverhead=3G" \
--conf "spark.yarn.driver.memoryOverhead=3G" \
--conf "spark.scheduler.mode=FAIR" \
--conf "spark.cassandra.connection.host={IPs}" \
--conf "spark.streaming.fileStream.minRememberDuration=300s" \
--conf "spark.network.timeout=500s" \
--conf "spark.cassandra.connection.timeout_ms=600000" \
--conf "spark.executor.heartbeatInterval=20s" \
--conf "spark.yarn.maxAppAttempts=2" \
--conf "spark.yarn.am.attemptFailuresValidityInterval=1h" \
--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=20" \
--conf "spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=20" \
--conf "spark.streaming.stopGracefullyOnShutdown=true"
Created 01-06-2020 08:35 PM
Hi, I am running into the same issue. Did you find the cause?