Spark job fails without much information
Labels: Apache Spark
Created on 08-20-2019 01:35 PM, last edited on 08-21-2019 02:42 AM by VidyaSargur
Hi,
Has anyone come across the error below? Can you share a common cause, since the error message looks very generic:
ERROR Lost executor 12 on host123: Container marked as failed: container_e218_1564435356568_349499_01_000013 on host: host123. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Created 08-20-2019 03:28 PM
Hi @Prav
You're right, it's pretty generic, but this usually occurs when containers are killed due to memory issues. Either the executor running in that container threw a java.lang.OutOfMemoryError, or the container's JVM process grew beyond its physical memory limits. That is, if your application was configured with 1 GB of executor memory (spark.executor.memory) and 1 GB of executor memory overhead (spark.executor.memoryOverhead), then the container size requested here would be 2 GB. If the process's memory goes beyond 2 GB, YARN kills that process.
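As a rough illustration, here's a minimal spark-submit sketch (the class name, jar, and memory values are hypothetical) showing how the two settings add up to the container size YARN enforces:
spark-submit \
  --master yarn \
  --conf spark.executor.memory=1g \
  --conf spark.executor.memoryOverhead=1g \
  --class com.example.MyApp myapp.jar
# YARN allocates 1g + 1g = 2g per executor container; if the executor's
# JVM process grows past that 2g, YARN kills it with exit code 143.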
Really, the best way to identify the issue is to collect the YARN logs for your application and go through them:
yarn logs -applicationId application_1564435356568_349499
You would just run that from your edge node or NodeManager machines (assuming you're running Spark on YARN).
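If the log dump is large, one way to narrow it down (assuming a standard Linux edge node; the output file name here is just an example) is to save it and search for memory-related messages:
yarn logs -applicationId application_1564435356568_349499 > app_logs.txt
grep -iE "OutOfMemoryError|killed" app_logs.txt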
Created 08-21-2019 11:28 AM
Thanks, that does show more information.
What I find weird, though, is that the same query ran with a large load earlier (with the same config params) and has now failed (from the logs: java.lang.OutOfMemoryError: Java heap space).
Regards
