
Spark job fails without much information

Contributor

Hi,

 

Has anyone come across the error below and can share a common cause? The error message looks very generic:

 

ERROR Lost executor 12 on host123: Container marked as failed: container_e218_1564435356568_349499_01_000013 on host: host123. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143


Rising Star

Hi @Prav 

You're right, it's pretty generic, but this usually occurs when containers are killed due to memory issues. It can either be a java.lang.OutOfMemoryError thrown by the executor running in that container, or the container's JVM process growing beyond its physical memory limit. In other words, if your application was configured with 1 GB of executor memory (spark.executor.memory) and 1 GB of executor memory overhead (spark.executor.memoryOverhead), then the container requested from YARN would be 2 GB. If the process' memory goes beyond 2 GB, YARN kills that process, which is what exit code 143 indicates.
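As a rough illustration, a submission along these lines (the application file name is just a placeholder) would make Spark request roughly 2 GB containers from YARN. Note that if spark.executor.memoryOverhead is not set explicitly, it defaults to 10% of executor memory, with a 384 MB minimum:

spark-submit \
  --master yarn \
  --conf spark.executor.memory=1g \
  --conf spark.executor.memoryOverhead=1g \
  my_app.py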

Really, the best way to identify the issue is to collect the YARN logs for your application and go through them:

yarn logs -applicationId application_1564435356568_349499


You would just run that from your edge node or one of the NodeManager machines (assuming you're running Spark on YARN).
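If the output is large, one quick way to narrow it down is to pipe it through grep for the usual culprits (the pattern below is just an illustration, adjust it to whatever you're hunting for):

yarn logs -applicationId application_1564435356568_349499 | grep -iE 'outofmemoryerror|killed|exit code'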

Contributor

Thanks, that does show more information.

 

Though what I find weird is that the same query has run with a large load earlier (with the same config params) and has now failed (from the logs: java.lang.OutOfMemoryError: Java heap space).

 

Regards