Support Questions

VidyaSargur · 08-20-2019

Hi,

Has anyone come across the error below and can share a common cause? The error message looks very generic:

ERROR Lost executor 12 on host123: Container marked as failed: container_e218_1564435356568_349499_01_000013 on host: host123. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

w@leed · 08-20-2019

Hi @Prav

You're right, it's pretty generic, but this usually occurs when containers are killed due to memory issues. Either the executor running in that container threw a java.lang.OutOfMemoryError, or the physical memory of the container's JVM process grew beyond the container's limit. For example, if your application is configured with 1 GB of executor memory (spark.executor.memory) and 1 GB of executor memory overhead (spark.executor.memoryOverhead), the container size requested is 2 GB. If the process's memory goes beyond 2 GB, YARN will kill that process.
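To make the container-size arithmetic concrete, here is a minimal sketch. The helper name container_mb is hypothetical, and the fallback overhead (10% of executor memory, minimum 384 MB) is Spark's documented default as far as I know. YARN may also round the request up to its allocation increment, which this sketch ignores.

```shell
# Hypothetical helper: estimate the YARN container request for one executor, in MB.
# $1 = spark.executor.memory in MB
# $2 = spark.executor.memoryOverhead in MB (optional)
container_mb() {
  heap_mb=$1
  # Assumed default overhead: 10% of executor memory, but at least 384 MB.
  default_overhead=$(( heap_mb / 10 ))
  if [ "$default_overhead" -lt 384 ]; then
    default_overhead=384
  fi
  overhead_mb=${2:-$default_overhead}
  echo $(( heap_mb + overhead_mb ))
}

container_mb 1024 1024   # the 1 GB + 1 GB case: prints 2048
container_mb 1024        # overhead left at the default: prints 1408
```

If the process's physical memory passes the printed figure, that is the point at which YARN would be expected to kill the container.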

Really, the best way to identify the issue is to collect the YARN logs for your application and go through them:

yarn logs -applicationId application_1564435356568_349499

You would just run that from your edge node or NodeManager machines (assuming you're running Spark on YARN).
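Once you have the logs, a quick way to tell the two cases apart is to search for their usual signatures. This is only a rough sketch: the helper name classify_kill is hypothetical, and it assumes the NodeManager diagnostic contains the phrase "running beyond physical memory limits", which is what YARN normally prints when it enforces the container limit.

```shell
# Hypothetical helper: read aggregated YARN logs on stdin and report which
# memory-related signature appears first in the checks below.
classify_kill() {
  log=$(cat)
  if printf '%s\n' "$log" | grep -qF 'java.lang.OutOfMemoryError'; then
    echo "heap: OutOfMemoryError thrown inside the executor JVM"
  elif printf '%s\n' "$log" | grep -qF 'running beyond physical memory limits'; then
    echo "container: YARN killed the process for exceeding its memory limit"
  else
    echo "unknown: no memory-related signature found"
  fi
}

# Usage (assumes the yarn CLI is on PATH and log aggregation is enabled):
# yarn logs -applicationId <application ID> | classify_kill
```

A "heap" result points at spark.executor.memory, while a "container" result points at the overhead setting, since the JVM heap itself did not overflow.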

Prav · 08-21-2019

Thanks, that does show more information.

What I find weird, though, is that the same query ran with a large load earlier (with the same config params) and has now failed (from the logs: java.lang.OutOfMemoryError: Java heap space).

Regards

Cloudera Community

Spark job fails without much information