
Symptoms:

 

In versions prior to CDH 6.3.1, NodeManagers can enter an unhealthy state, with the errors below observed in the NodeManager (NM) logs:

 

2022-10-20 15:31:32,487 ERROR logaggregation.AggregatedLogFormat (AggregatedLogFormat.java:logErrorMessage(299)) - Error aggregating log file. Log file : /hadoop/ssd01/yarn/log/application_1665989140069_135925/container_e93_1665989140069_135925_01_000002/history.txt.appattempt_1665989140069_135925_000001. /hadoop/ssd01/yarn/log/application_1665989140069_135925/container_e93_1665989140069_135925_01_000002/history.txt.appattempt_1665989140069_135925_000001 (Permission denied)

2022-10-20 15:28:19,556 INFO  nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(532)) - Exit code: 35
2022-10-20 15:28:19,556 INFO  nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(532)) - Exception message: Launch container failed
2022-10-20 15:28:19,556 INFO  nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(532)) - Shell error output: Could not create container dirsCould not create local files and directories
2022-10-20 15:28:19,557 ERROR launcher.ContainerLaunch (ContainerLaunch.java:call(327)) - Failed to launch container due to configuration error.
org.apache.hadoop.yarn.exceptions.ConfigurationException: Linux Container Executor reached unrecoverable exception

 

Check the permissions on the directories configured in yarn_nodemanager_local_dirs and correct them if they are wrong.

 

The actual issue is that most of these exit codes do not meet the criteria for marking the NM unhealthy. Based on the above, we might be hitting one of these known issues:

https://issues.apache.org/jira/browse/YARN-8751

https://issues.apache.org/jira/browse/YARN-9833
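
To see which container exit codes a NodeManager is actually reporting, the NM log can be tallied. This is a rough sketch that assumes the log lines look like the ContainerExecutor excerpt in the Symptoms section ("Exit code: 35"); the function name count_exit_codes is hypothetical, and other versions may format these lines differently.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
#   ... nodemanager.ContainerExecutor (...) - Exit code: 35
EXIT_CODE_RE = re.compile(r"ContainerExecutor.*Exit code: (\d+)")


def count_exit_codes(lines):
    """Count how often each container exit code appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = EXIT_CODE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_exit_codes.py <path to NM log> [more logs...]
    for log_path in sys.argv[1:]:
        with open(log_path, errors="replace") as handle:
            for code, seen in sorted(count_exit_codes(handle).items()):
                print(f"{log_path}: exit code {code} seen {seen} time(s)")
```

A large number of exit code 35 entries around the time the node was marked unhealthy would be consistent with the symptoms described here, but it does not by itself confirm the race condition.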

 

 

Resolution:

 

1. Clearing the cache and restarting the affected NodeManagers should resolve the issue in older versions.

2. This issue is permanently fixed in CDH 6.3.1 and later.

https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_631_fixed_issues.ht...

YARN-9833 - Race condition when DirectoryCollection.checkDirs() runs during container launch

 

 
