Spark job fails without much information

Rising Star

Hi,

Has anyone come across the error below, and can you share a common cause? The error message looks very generic:


ERROR Lost executor 12 on host123: Container marked as failed: container_e218_1564435356568_349499_01_000013 on host: host123. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143


Cloudera Employee

Hi @Prav 

You're right, it's pretty generic, but this usually occurs when your containers are killed due to memory issues. Exit code 143 is 128 + 15, meaning the container's process was terminated with SIGTERM. The cause can either be a java.lang.OutOfMemoryError thrown by the executor running in that container, or the container's JVM process growing its physical memory beyond the container's limit. For example, if your application was configured with 1 GB of executor memory (spark.executor.memory) and 1 GB of executor memory overhead (spark.executor.memoryOverhead), then the container size request would be 2 GB. If the process's memory goes beyond 2 GB, YARN is going to kill that process.
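To make the arithmetic above concrete, here is a minimal Python sketch. The function names are hypothetical, and it assumes Spark's documented default for the overhead (10% of executor memory, with a 384 MiB floor) when spark.executor.memoryOverhead is not set. It does not model any rounding that the YARN scheduler may apply to the request.

```python
# Sketch: estimate the memory a Spark executor container requests from YARN.
# Helper names are hypothetical; the defaults below are assumptions taken from
# Spark's documented behaviour for spark.executor.memoryOverhead.
MIN_OVERHEAD_MB = 384    # floor applied to the default overhead
OVERHEAD_FACTOR = 0.10   # default overhead as a fraction of executor memory


def container_request_mb(executor_memory_mb, overhead_mb=None):
    """Return executor memory plus overhead, in MiB."""
    if overhead_mb is None:
        # No explicit overhead configured: fall back to the assumed default.
        overhead_mb = max(MIN_OVERHEAD_MB, int(executor_memory_mb * OVERHEAD_FACTOR))
    return executor_memory_mb + overhead_mb


def signal_from_exit_code(exit_code):
    """Exit codes above 128 conventionally mean 'killed by signal (code - 128)'."""
    return exit_code - 128 if exit_code > 128 else None


if __name__ == "__main__":
    print(container_request_mb(1024, 1024))  # the 1 GB + 1 GB case from the post
    print(container_request_mb(1024))        # overhead left at the assumed default
    print(signal_from_exit_code(143))        # 15, i.e. SIGTERM
```

If the process in the container uses more physical memory than this total, YARN may kill it, which would match the exit status 143 in the original error.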

Really, the best way of identifying the issue is to collect the YARN logs for your application and go through them:

yarn logs -applicationId application_1564435356568_349499

You would just run that from your edge node or NodeManager machines (assuming you're running Spark on YARN).
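Once the output of yarn logs has been saved to a file, a short script can help separate the two failure modes described earlier. This is only a sketch: the function name, the file name app.log, and the marker patterns are assumptions. The first pattern matches the JVM error; the second matches the wording the NodeManager typically uses when it kills a container for exceeding its memory limit, and that wording may differ between versions.

```python
import re

# Hypothetical marker patterns; adjust them to what your logs actually contain.
MARKERS = {
    "jvm_out_of_memory": re.compile(r"java\.lang\.OutOfMemoryError"),
    "yarn_memory_limit": re.compile(r"running beyond (physical|virtual) memory limits"),
}


def classify_log(text):
    """Return a dict mapping marker name to the log lines that matched it."""
    hits = {}
    for line in text.splitlines():
        for name, pattern in MARKERS.items():
            if pattern.search(line):
                hits.setdefault(name, []).append(line.strip())
    return hits


if __name__ == "__main__":
    # Made-up input for illustration; in practice read the saved log, e.g.
    # text = open("app.log", errors="replace").read()
    sample = "INFO starting task\njava.lang.OutOfMemoryError: Java heap space\n"
    for name, lines in classify_log(sample).items():
        print(name, len(lines))
```

A hit on the first marker points at the executor heap (spark.executor.memory), while a hit on the second points at the total container size, which also includes the overhead.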

Rising Star

Thanks, that does show more information.

What I find weird, though, is that the same query ran earlier with a larger load (and the same config params) and has now failed (from the logs: java.lang.OutOfMemoryError: Java heap space).