Support Questions

VidyaSargur · ‎08-20-2019

Hi,

Has anyone come across the error below, and can you share a common cause? The error message looks very generic:

ERROR Lost executor 12 on host123: Container marked as failed: container_e218_1564435356568_349499_01_000013 on host: host123. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

w@leed · ‎08-20-2019

Hi @Prav

You're right, it's pretty generic, but this usually occurs when your containers are killed due to memory issues. Either the executor running in that container threw a java.lang.OutOfMemoryError, or the physical memory used by the container's JVM process grew beyond its limit. For example, if your application was configured with 1 GB of executor memory (spark.executor.memory) and 1 GB of executor memory overhead (spark.executor.memoryOverhead), then the container size requested would be 2 GB. If the process's memory goes beyond 2 GB, YARN is going to kill that process.
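To make the arithmetic above concrete, here is a minimal sketch of how the per-executor container request adds up. The function name is hypothetical, and the fallback for an unset overhead (10% of executor memory, with a 384 MiB floor) is the default as I understand it from the Spark docs, so treat it as an assumption for your version. It also doesn't model any rounding YARN's scheduler may apply to the request.

```python
def container_request_mb(executor_memory_mb, memory_overhead_mb=None):
    """Approximate memory requested from YARN per executor container, in MiB.

    Hypothetical helper for illustration; not part of Spark or YARN.
    """
    if memory_overhead_mb is None:
        # Assumed default for spark.executor.memoryOverhead:
        # 10% of executor memory, but never less than 384 MiB.
        memory_overhead_mb = max(384, int(executor_memory_mb * 0.10))
    return executor_memory_mb + memory_overhead_mb


# The example from this post: 1 GB executor memory plus 1 GB overhead.
print(container_request_mb(1024, 1024))  # 2048

# Same executor memory with the overhead left unset.
print(container_request_mb(1024))  # 1408
```

If the JVM process in the container uses more physical memory than this total, that is the point where YARN steps in and kills it.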

Really, the best way of identifying the issue is by collecting the YARN logs for your application and going through that:

yarn logs -applicationId application_1564435356568_349499

You would just run that from your edge node or NodeManager machines (assuming you're running Spark on YARN).
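If you only have the container ID from the error message, the application ID can be worked out from it. Below is a rough sketch that does this and then scans the collected log text for the two memory-related causes mentioned above. The helper names are hypothetical, the container ID layout is assumed from the ID in this thread, and the "running beyond physical memory limits" phrase is what I'd expect in the NodeManager diagnostics, so check it against your own logs.

```python
import re

# Assumed layout: container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>
CONTAINER_ID = re.compile(r"container_(?:e\d+_)?(\d+)_(\d+)_\d+_\d+")


def application_id_from_container(container_id):
    """Derive the YARN application ID from a container ID (hypothetical helper)."""
    match = CONTAINER_ID.fullmatch(container_id)
    if match is None:
        raise ValueError(f"unrecognized container ID: {container_id}")
    cluster_timestamp, app_sequence = match.groups()
    return f"application_{cluster_timestamp}_{app_sequence}"


def memory_hints(log_text):
    """Return the memory-related causes that appear in the log text, if any."""
    hints = []
    if "java.lang.OutOfMemoryError" in log_text:
        hints.append("JVM OutOfMemoryError")
    if "running beyond physical memory limits" in log_text:
        hints.append("container exceeded its YARN memory limit")
    return hints


print(application_id_from_container("container_e218_1564435356568_349499_01_000013"))
# application_1564435356568_349499

print(memory_hints("java.lang.OutOfMemoryError: Java heap space"))
# ['JVM OutOfMemoryError']
```

You could save the output of the yarn logs command to a file and pass its contents to memory_hints to get a quick first idea of which of the two cases you're dealing with.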

Prav · ‎08-21-2019

Thanks, that does show more information.

What I find weird, though, is that the same query ran with a larger load earlier (with the same config params) and has now failed (from the logs: java.lang.OutOfMemoryError: Java heap space).

Regards

Cloudera Community
