06-11-2019 12:06 AM
Hello, we are running a Spark application on YARN. Sometimes it runs fine with no delay, but at other times we observe delays in the Spark processing job. Please find the logs below.

Failing this attempt. Diagnostics: [2019-06-10 15:38:53.090] Exception from container-launch.
Container id: container_1548676780185_0067_56_000001
Exit code: 15

[2019-06-10 15:38:53.091] Container exited with a non-zero exit code 15. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
0 (TID 58165, xptcxochapp104, executor 37): TaskKilled (Stage cancelled)
19/06/10 15:38:47 INFO storage.BlockManagerInfo: Added broadcast_59468_piece0 in memory on xptcxochapp104:46869 (size: 40.1 KB, free: 8.8 GB)
19/06/10 15:38:48 INFO cluster.YarnClusterScheduler: Removed TaskSet 93.0, whose tasks have all completed, from pool default
19/06/10 15:38:48 INFO scheduler.TaskSetManager: Starting task 12.0 in stage 94.0 (TID 58173, xptcxochapp104, executor 54, partition 12, NODE_LOCAL, 5019 bytes)
19/06/10 15:38:48 WARN scheduler.TaskSetManager: Lost task 61.0 in stage 93.0 (TID 58164, xptcxochapp104, executor 54): TaskKilled (Stage cancelled)
19/06/10 15:38:48 INFO cluster.YarnClusterScheduler: Removed TaskSet 93.0, whose tasks have all completed, from pool default
19/06/10 15:38:48 INFO storage.BlockManagerInfo: Added broadcast_59469_piece0 in memory on xptcxochapp104:38142 (size: 40.1 KB, free: 8.8 GB)
19/06/10 15:38:50 INFO scheduler.TaskSetManager: Starting task 13.0 in stage 94.0 (TID 58174, xptcxochapp104, executor 53, partition 13, NODE_LOCAL, 5019 bytes)
19/06/10 15:38:50 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 94.0 (TID 58166) in 5689 ms on xptcxochapp104 (executor 53) (1/728)
19/06/10 15:38:50 INFO storage.BlockManagerInfo: Added broadcast_59470_piece0 in memory on xptcxochapp104:38899 (size: 40.1 KB, free: 8.8 GB)
19/06/10 15:38:50 WARN util.ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException
java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:68)
19/06/10 15:38:50 ERROR util.Utils: Uncaught exception in thread pool-4-thread-1
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.spark.streaming.util.RecurringTimer.stop(RecurringTimer.scala:86)
    at org.apache.spark.streaming.scheduler.JobGenerator.stop(JobGenerator.scala:137)
    at org.apache.spark.streaming.scheduler.JobScheduler.stop(JobScheduler.scala:123)
    at org.apache.spark.streaming.StreamingContext$$anonfun$stop$1.apply$mcV$sp(StreamingContext.scala:681)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1321)
    at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:680)
    at org.apache.spark.streaming.StreamingContext.org$apache$spark$streaming$StreamingContext$$stopOnShutdown(StreamingContext.scala:714)
    at org.apache.spark.streaming.StreamingContext$$anonfun$start$1.apply$mcV$sp(StreamingContext.scala:599)
    at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1952)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

[2019-06-10 15:38:53.092] Container exited with a non-zero exit code 15. Error file: prelaunch.err.
[... the diagnostics then repeat the same prelaunch.err/stderr tail shown above, trimmed here ...]

For more detailed output, check the application tracking page: http://xptcxochapp102:8088/cluster/app/application_1548676780185_0067 Then click on links to logs of each attempt.

Spark Submit parameters:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 24 \
  --executor-cores 2 \
  --driver-memory 10G \
  --executor-memory 15G \
  --conf "spark.cassandra.output.consistency.level=ANY" \
  --conf "spark.cassandra.input.consistency.level=ONE" \
  --conf "spark.yarn.executor.memoryOverhead=3G" \
  --conf "spark.yarn.driver.memoryOverhead=3G" \
  --conf "spark.scheduler.mode=FAIR" \
  --conf "spark.cassandra.connection.host={IPs}" \
  --conf "spark.streaming.fileStream.minRememberDuration=300s" \
  --conf "spark.network.timeout=500s" \
  --conf "spark.cassandra.connection.timeout_ms=600000" \
  --conf "spark.executor.heartbeatInterval=20s" \
  --conf "spark.yarn.maxAppAttempts=2" \
  --conf "spark.yarn.am.attemptFailuresValidityInterval=1h" \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=20" \
  --conf "spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=20" \
  --conf "spark.streaming.stopGracefullyOnShutdown=true"
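In case it helps, this is roughly how we pull the full container logs for the failed attempt (a sketch only; the application id is the one from the tracking URL above, and it assumes YARN log aggregation is enabled on the cluster — the output file name is arbitrary):

# Sketch: fetch the full aggregated logs for the failed application
yarn logs -applicationId application_1548676780185_0067 > app_0067_full.log

# List the individual application attempts to see which attempt failed
yarn applicationattempt -list application_1548676780185_0067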
Labels:
- Apache Spark
- Apache YARN