03-06-2023
07:54 AM
Hey Vidya, thanks for the response. Please let me know if any further information is required to resolve the issue.
03-05-2023
02:34 PM
Hi team, I'm facing the issue below:

ExecutorLostFailure (executor 6 exited unrelated to the running tasks) Reason: Container marked as failed: container_1677979357691_0001 on host: xyz Exit status: -100. Diagnostics: Container released on a *lost* node.

Previously I was using 18 r5.4xlarge machines, and now I want to move to 5 r5.16xlarge machines, so I made a few config changes. I'm using m5.2xlarge as the master in both cases, and I didn't change the EBS volume size; previously and now it is 950G.

spark.executor.cores: "5"
spark.driver.cores: "5"
spark.executor.memory: "35G"
spark.driver.memory: "30G"
spark.executor.instances: "60"
spark.dynamicAllocation.enabled: "false"
spark.executor.memoryOverhead: "6G"
spark.excludeOnFailure.enabled: "true"
spark.excludeOnFailure.killExcludedExecutors: "true"
spark.excludeOnFailure.application.fetchFailure.enabled: "true"
spark.excludeOnFailure.application.maxFailedTasksPerExecutor: "4"
spark.excludeOnFailure.application.maxFailedExecutorsPerNode: "5"
spark.excludeOnFailure.application.maxFailedExecutors: "6"
spark.sql.shuffle.partitions: "2000"
spark.executor.heartbeatInterval: "120s"
spark.network.timeout: "2400s"

But a few jobs are failing with the above-mentioned error. Please help, I'm completely lost.
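For reference, here is a rough per-node sanity check of this config. It assumes the usual r5.16xlarge specs of 64 vCPUs and 512 GiB of memory per node; those numbers are an assumption about the instance type, not something taken from the cluster itself:

```python
# Rough sanity check of the per-node footprint for the new cluster layout.
# Assumption (not from the post): an r5.16xlarge provides 64 vCPUs and 512 GiB RAM.
NODE_VCPUS = 64
NODE_MEM_GIB = 512

executors = 60            # spark.executor.instances
executor_cores = 5        # spark.executor.cores
executor_mem_gib = 35     # spark.executor.memory
overhead_gib = 6          # spark.executor.memoryOverhead
nodes = 5                 # r5.16xlarge worker nodes

per_node_executors = executors / nodes                                  # 12 executors per node
per_node_cores = per_node_executors * executor_cores                    # 60 of 64 vCPUs
per_node_mem = per_node_executors * (executor_mem_gib + overhead_gib)   # 492 of 512 GiB

print(f"cores requested per node:  {per_node_cores:.0f}/{NODE_VCPUS}")
print(f"memory requested per node: {per_node_mem:.0f}/{NODE_MEM_GIB} GiB")
```

Under these assumptions the executors alone request roughly 492 of 512 GiB on each node, leaving very little headroom for the OS, the YARN NodeManager, and any off-heap usage. That tight fit is one possible (unconfirmed) contributor to containers being released on a lost node with exit status -100.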
Labels:
- Apache Spark