ExecutorLostFailure Reason: Container killed by YARN for exceeding memory limits

Rising Star

Hi

I am using Cloudera 5.7.0 and running a Spark Streaming application with Kafka that performs some OpenCV operations.

Some of my containers are killed by YARN with the following reason:
ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 3.1 GB of 3 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead

I am using the following configuration:
spark-submit --num-executors 20 --executor-memory 2g --executor-cores 2 --conf spark.yarn.executor.memoryOverhead=1000

How can I solve this issue?

Regards
Prateek

7 replies

Master Collaborator

This means the JVM took more memory than YARN thought it should. Usually this means you need to allocate more overhead, so that more memory is requested from YARN for the same JVM heap size. See the spark.yarn.executor.memoryOverhead option, which defaults to 10% of the specified executor memory (with a minimum of 384 MB). Increase it.
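To make the arithmetic concrete, here is a minimal Python sketch of how the container size requested from YARN relates to the heap and the overhead, assuming the default rule described above (10% of executor memory, at least 384 MB). The function name `container_request_mb` is hypothetical, and the sketch ignores any rounding YARN applies to the request (for example, up to a multiple of `yarn.scheduler.minimum-allocation-mb`).

```python
# Sketch: memory (MB) a Spark-on-YARN executor container asks YARN for.
# Assumes the Spark 1.x default: if spark.yarn.executor.memoryOverhead is unset,
# overhead = 10% of executor memory (rounded down), with a floor of 384 MB.
# YARN's own rounding of the request is deliberately ignored here.
from typing import Optional

MIN_OVERHEAD_MB = 384


def container_request_mb(executor_memory_mb: int,
                         overhead_mb: Optional[int] = None) -> int:
    """Return heap plus overhead, in MB, for one executor container."""
    if overhead_mb is None:
        # Default overhead when the user does not set it explicitly.
        overhead_mb = max(MIN_OVERHEAD_MB, executor_memory_mb // 10)
    return executor_memory_mb + overhead_mb


if __name__ == "__main__":
    # Settings from the original question: 2g heap + 1000 MB overhead.
    print(container_request_mb(2048, 1000))  # 3048
    # Same heap with the default overhead: max(384, 204) = 384.
    print(container_request_mb(2048))        # 2432
    # Raising the overhead to 2048 MB asks YARN for a 4 GB container.
    print(container_request_mb(2048, 2048))  # 4096
```

With the settings from the original question, 2 GB of heap plus 1000 MB of overhead comes to roughly the 3 GB limit quoted in the error. Raising the overhead grows the container without changing the heap, which is what this reply suggests.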

New Contributor

Hey,

I am having the same issue.

Spark 1.6
Cloudera Express 5.7.1

ExecutorLostFailure (executor 60 exited caused by one of the running tasks)
Reason: Container killed by YARN for exceeding memory limits. 1.5 GB of 1.5 GB physical memory used.
Consider boosting spark.yarn.executor.memoryOverhead.

I see your solution but cannot find that setting in CM. Can you please point me to where that option is in the Cloudera Manager UI?

Thanks,
Marcin

Master Collaborator

This has nothing to do with CM. It has to do with your app's memory configuration. The relevant settings are right there in the error.

New Contributor

Okay,

So how can I increase the overhead in a Jupyter notebook? I am not using spark-submit for this job.

And how can I find out what the current overhead settings are?

Thanks!
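For reference, one way to do this from a PySpark notebook is to put the overhead into a `SparkConf` before the `SparkContext` is created, then read it back from the running context. This is a minimal sketch under stated assumptions: the notebook creates its own context (any existing one must be stopped first, because these settings only take effect at startup), it runs on YARN, and the helper names (`overhead_settings`, `start_context`, `current_overhead`), the app name and the example values are hypothetical.

```python
# Sketch: set and inspect the executor memory overhead from a PySpark notebook.
# Assumptions: the notebook creates its own SparkContext (stop any existing one
# first) and runs on YARN. Helper names and values are illustrative only.
from typing import Dict

OVERHEAD_KEY = "spark.yarn.executor.memoryOverhead"


def overhead_settings(executor_memory: str, overhead_mb: int) -> Dict[str, str]:
    """Settings that must be applied before the SparkContext starts."""
    return {
        "spark.executor.memory": executor_memory,  # JVM heap, e.g. "2g"
        OVERHEAD_KEY: str(overhead_mb),            # off-heap allowance, in MB
    }


def start_context(settings: Dict[str, str], master: str = "yarn-client"):
    """Create a SparkContext with the given settings (needs pyspark installed).

    "yarn-client" is the Spark 1.x master name; Spark 2+ uses "yarn".
    """
    from pyspark import SparkConf, SparkContext  # imported lazily

    conf = SparkConf().setMaster(master).setAppName("notebook-job")
    for key, value in settings.items():
        conf.set(key, value)
    return SparkContext(conf=conf)


def current_overhead(sc) -> str:
    """Report the overhead the running context was started with."""
    # If the key was never set, Spark computes its default at startup.
    return sc.getConf().get(OVERHEAD_KEY, "not set (Spark default applies)")
```

In a notebook cell this would be used as `sc = start_context(overhead_settings("2g", 2048))` followed by `current_overhead(sc)`; the same value should also appear on the Environment tab of the application's UI.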

Master Collaborator

I'm not sure how you would do that. We support spark-submit and the Workbench, not Jupyter. It's clear how to configure spark-submit, and you configure the Workbench with spark-defaults.conf. You can see your Spark job's configuration in its UI, on the Environment tab.
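As an illustration of the spark-defaults.conf route, here is a small Python sketch that writes and reads entries in that file's format, assuming the whitespace-separated `key value` form with one property per line and `#` marking comments. The helper names `render_defaults` and `lookup` and the property values are hypothetical; only the property names come from this thread.

```python
# Sketch: write and read spark-defaults.conf entries.
# Assumes the whitespace-separated "key value" form, one property per line,
# with '#' marking comment lines. Property values are examples only.
from typing import Dict, Optional


def render_defaults(settings: Dict[str, str]) -> str:
    """Render settings as spark-defaults.conf text."""
    return "".join(f"{key} {value}\n" for key, value in settings.items())


def lookup(conf_text: str, key: str) -> Optional[str]:
    """Return the value of key in spark-defaults.conf text, or None if absent."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == key:
            value = parts[1].strip()  # a later entry overrides an earlier one
    return value


if __name__ == "__main__":
    text = render_defaults({
        "spark.executor.memory": "2g",
        "spark.yarn.executor.memoryOverhead": "1024",
    })
    print(text, end="")
    print(lookup(text, "spark.yarn.executor.memoryOverhead"))  # 1024
```

The parser only handles the whitespace-separated form; the Environment tab remains the authoritative view of what a running job actually picked up.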

New Contributor

Thanks!

Running the job with a spark-submit script fixed the problem!

New Contributor

Hi @srowen

I am using CDH 5.15.1 and running spark-submit to train a model and save the model's prediction DataFrame to HDFS. I get the following errors when I try to save the DataFrame to HDFS:

2018-11-19 11:17:33 ERROR YarnClusterScheduler:70 - Lost executor 2 on gworker6.vcse.lab: Executor heartbeat timed out after 149836 ms
2018-11-19 11:18:07 ERROR YarnClusterScheduler:70 - Lost executor 2 on gworker6.vcse.lab: Container container_1542123439491_0080_01_000004 exited from explicit termination request.

I have also tried setting spark.yarn.executor.memoryOverhead to 10% of the executor memory given in my spark-submit command, and I still see these errors. Do you have any suggestions for this issue?

Spark-submit command:

spark-submit-with-zoo.sh --master yarn --deploy-mode cluster --num-executors 8 --executor-cores 16 --driver-memory 300g --executor-memory 400g Main_Final_auc.py 256
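For a sense of scale, here is a rough Python tally of what the command above asks YARN for. It assumes the default overhead rule (10% of the heap, at least 384 MB) for both the executors and the cluster-mode driver, which matches the 10% the poster set, and it ignores YARN's rounding. The variable and function names are illustrative.

```python
# Sketch: total YARN memory requested by the spark-submit command above.
# Assumes the default overhead rule (10% of the heap, rounded down, at least
# 384 MB) for executors and the cluster-mode driver; YARN rounding is ignored.

MIN_OVERHEAD_MB = 384


def container_mb(heap_gb: int) -> int:
    """Heap plus default overhead, in MB, for one container."""
    heap_mb = heap_gb * 1024
    return heap_mb + max(MIN_OVERHEAD_MB, heap_mb // 10)


num_executors = 8
executor_container_mb = container_mb(400)  # 409600 + 40960 = 450560 MB (440 GB)
driver_container_mb = container_mb(300)    # 307200 + 30720 = 337920 MB (330 GB)
total_mb = num_executors * executor_container_mb + driver_container_mb

print(f"per executor: {executor_container_mb // 1024} GB")  # per executor: 440 GB
print(f"driver: {driver_container_mb // 1024} GB")          # driver: 330 GB
print(f"total: {total_mb // 1024} GB")                      # total: 3850 GB
```

Each executor container therefore needs about 440 GB on a single NodeManager, so it has to fit within that node's `yarn.nodemanager.resource.memory-mb` and the cluster's `yarn.scheduler.maximum-allocation-mb`.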