How to fix ExecutorLostFailure (Reason: Container marked as failed)?

New Contributor

Hi team, I'm facing the issue below.
ExecutorLostFailure (executor 6 exited unrelated to the running tasks) Reason: Container marked as failed: container_1677979357691_0001 on host: xyz Exit status: -100. Diagnostics: Container released on a *lost* node.

Previously I was using 18 r5.4xlarge machines, and now I want to move to 5 r5.16xlarge machines, so I made a few config changes.
I'm using an m5.2xlarge as the master in both cases.
I didn't change the EBS volume size. It was 950G before and it is still 950G now.
spark.executor.cores: "5"
spark.driver.cores: "5"
spark.executor.memory: "35G"
spark.driver.memory: "30G"
spark.executor.instances: "60"
spark.dynamicAllocation.enabled: "false"
spark.executor.memoryOverhead: "6G"
spark.excludeOnFailure.enabled: "true"
spark.excludeOnFailure.killExcludedExecutors: "true"
spark.excludeOnFailure.application.fetchFailure.enabled: "true"
spark.excludeOnFailure.application.maxFailedTasksPerExecutor: "4"
spark.excludeOnFailure.application.maxFailedExecutorsPerNode: "5"
spark.excludeOnFailure.application.maxFailedExecutors: "6"
spark.sql.shuffle.partitions: "2000"
spark.executor.heartbeatInterval: "120s"
spark.network.timeout: "2400s"

But a few jobs are failing with the above-mentioned error. Please help, I'm completely lost.
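
As a rough sanity check on the settings above, here is a minimal Python sketch of the per-node footprint they imply if the 60 executors spread evenly over the 5 r5.16xlarge workers. The helper name `per_node_footprint` is made up for illustration and the even spread is an assumption. The instance size (64 vCPUs, 512 GiB) is the published spec for r5.16xlarge, not the amount YARN actually offers to containers, which the post does not state.

```python
# Hypothetical helper: estimates what one worker must host if executors
# are spread evenly (rounded up) across the worker nodes.
def per_node_footprint(nodes, executors, cores_per_executor,
                       executor_memory_gib, overhead_gib):
    executors_per_node = -(-executors // nodes)  # ceiling division
    cores = executors_per_node * cores_per_executor
    memory_gib = executors_per_node * (executor_memory_gib + overhead_gib)
    return executors_per_node, cores, memory_gib

# Values taken from the posted configuration.
execs, cores, memory_gib = per_node_footprint(
    nodes=5, executors=60, cores_per_executor=5,
    executor_memory_gib=35, overhead_gib=6)

# Published r5.16xlarge size; the share YARN exposes is lower and unknown here.
NODE_VCPUS, NODE_MEMORY_GIB = 64, 512

print(f"executors per node: {execs}")
print(f"cores per node: {cores} of {NODE_VCPUS} vCPUs")
print(f"memory per node: {memory_gib} GiB of {NODE_MEMORY_GIB} GiB")
```

Under these assumptions each worker would have to host 12 executors, 60 cores, and 492 GiB of executor memory plus overhead, before counting the driver or anything else running on the node. Comparing that figure with the memory the NodeManager actually advertises may be a useful first check.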


2 REPLIES

Community Manager

@zintan

Welcome to our community! To help you get the best possible answer, I have tagged our Spark experts @RangaReddy, @steven-matison, and @smdas, who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager



New Contributor

Hey Vidya, thanks for the response. Please let me know if any further information is needed to resolve the issue.