06-11-2019 12:06 AM
Hello, we are running a Spark application on YARN. Sometimes it runs fine with no delay, but at other times we observe delays in the Spark processing job. Please find the logs below.

Failing this attempt. Diagnostics: [2019-06-10 15:38:53.090] Exception from container-launch.
Container id: container_1548676780185_0067_56_000001
Exit code: 15

[2019-06-10 15:38:53.091] Container exited with a non-zero exit code 15. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
0 (TID 58165, xptcxochapp104, executor 37): TaskKilled (Stage cancelled)
19/06/10 15:38:47 INFO storage.BlockManagerInfo: Added broadcast_59468_piece0 in memory on xptcxochapp104:46869 (size: 40.1 KB, free: 8.8 GB)
19/06/10 15:38:48 INFO cluster.YarnClusterScheduler: Removed TaskSet 93.0, whose tasks have all completed, from pool default
19/06/10 15:38:48 INFO scheduler.TaskSetManager: Starting task 12.0 in stage 94.0 (TID 58173, xptcxochapp104, executor 54, partition 12, NODE_LOCAL, 5019 bytes)
19/06/10 15:38:48 WARN scheduler.TaskSetManager: Lost task 61.0 in stage 93.0 (TID 58164, xptcxochapp104, executor 54): TaskKilled (Stage cancelled)
19/06/10 15:38:48 INFO cluster.YarnClusterScheduler: Removed TaskSet 93.0, whose tasks have all completed, from pool default
19/06/10 15:38:48 INFO storage.BlockManagerInfo: Added broadcast_59469_piece0 in memory on xptcxochapp104:38142 (size: 40.1 KB, free: 8.8 GB)
19/06/10 15:38:50 INFO scheduler.TaskSetManager: Starting task 13.0 in stage 94.0 (TID 58174, xptcxochapp104, executor 53, partition 13, NODE_LOCAL, 5019 bytes)
19/06/10 15:38:50 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 94.0 (TID 58166) in 5689 ms on xptcxochapp104 (executor 53) (1/728)
19/06/10 15:38:50 INFO storage.BlockManagerInfo: Added broadcast_59470_piece0 in memory on xptcxochapp104:38899 (size: 40.1 KB, free: 8.8 GB)
19/06/10 15:38:50 WARN util.ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException
java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:68)
19/06/10 15:38:50 ERROR util.Utils: Uncaught exception in thread pool-4-thread-1
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.spark.streaming.util.RecurringTimer.stop(RecurringTimer.scala:86)
    at org.apache.spark.streaming.scheduler.JobGenerator.stop(JobGenerator.scala:137)
    at org.apache.spark.streaming.scheduler.JobScheduler.stop(JobScheduler.scala:123)
    at org.apache.spark.streaming.StreamingContext$$anonfun$stop$1.apply$mcV$sp(StreamingContext.scala:681)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1321)
    at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:680)
    at org.apache.spark.streaming.StreamingContext.org$apache$spark$streaming$StreamingContext$$stopOnShutdown(StreamingContext.scala:714)
    at org.apache.spark.streaming.StreamingContext$$anonfun$start$1.apply$mcV$sp(StreamingContext.scala:599)
    at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1952)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

[2019-06-10 15:38:53.092] Container exited with a non-zero exit code 15. Error file: prelaunch.err.
[... the diagnostics then repeat the same prelaunch.err/stderr tail shown above, trimmed here ...]

For more detailed output, check the application tracking page: http://xptcxochapp102:8088/cluster/app/application_1548676780185_0067 Then click on links to logs of each attempt.

Spark Submit parameters:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 24 \
  --executor-cores 2 \
  --driver-memory 10G \
  --executor-memory 15G \
  --conf "spark.cassandra.output.consistency.level=ANY" \
  --conf "spark.cassandra.input.consistency.level=ONE" \
  --conf "spark.yarn.executor.memoryOverhead=3G" \
  --conf "spark.yarn.driver.memoryOverhead=3G" \
  --conf "spark.scheduler.mode=FAIR" \
  --conf "spark.cassandra.connection.host={IPs}" \
  --conf "spark.streaming.fileStream.minRememberDuration=300s" \
  --conf "spark.network.timeout=500s" \
  --conf "spark.cassandra.connection.timeout_ms=600000" \
  --conf "spark.executor.heartbeatInterval=20s" \
  --conf "spark.yarn.maxAppAttempts=2" \
  --conf "spark.yarn.am.attemptFailuresValidityInterval=1h" \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=20" \
  --conf "spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=20" \
  --conf "spark.streaming.stopGracefullyOnShutdown=true"
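In case it helps, this is roughly how we pull the full container logs for the failed attempt (a sketch only; the application id is the one from the tracking URL above, and it assumes YARN log aggregation is enabled on the cluster — the output file name is arbitrary):

# Sketch: fetch the full aggregated logs for the failed application
yarn logs -applicationId application_1548676780185_0067 > app_0067_full.log

# List the individual application attempts to see which attempt failed
yarn applicationattempt -list application_1548676780185_0067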
Labels:
- Apache Spark
- Apache YARN