Created on 10-26-2022 07:36 PM - edited 10-26-2022 07:38 PM
Symptoms:
In versions prior to CDH 6.3.1, NodeManagers can enter an unhealthy state, with the following errors observed in the NM logs:
2022-10-20 15:31:32,487 ERROR logaggregation.AggregatedLogFormat (AggregatedLogFormat.java:logErrorMessage(299)) - Error aggregating log file. Log file : /hadoop/ssd01/yarn/log/application_1665989140069_135925/container_e93_1665989140069_135925_01_000002/history.txt.appattempt_1665989140069_135925_000001. /hadoop/ssd01/yarn/log/application_1665989140069_135925/container_e93_1665989140069_135925_01_000002/history.txt.appattempt_1665989140069_135925_000001 (Permission denied)
2022-10-20 15:28:19,556 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(532)) - Exit code: 35
2022-10-20 15:28:19,556 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(532)) - Exception message: Launch container failed
2022-10-20 15:28:19,556 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(532)) - Shell error output: Could not create container dirsCould not create local files and directories
2022-10-20 15:28:19,557 ERROR launcher.ContainerLaunch (ContainerLaunch.java:call(327)) - Failed to launch container due to configuration error.
org.apache.hadoop.yarn.exceptions.ConfigurationException: Linux Container Executor reached unrecoverable exception
Permissions on the directories listed in yarn_nodemanager_local_dirs need to be checked and corrected if they are wrong.
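As a rough aid for that check, the following Python sketch reports basic ownership and mode problems for a single directory. It is only an illustration: the function names `owner_name` and `check_local_dir` are hypothetical, and the expected owner must be supplied by you, since the correct owner and mode depend on your cluster configuration.

```python
import os
import pwd
import stat


def owner_name(uid):
    """Resolve a numeric uid to a user name, falling back to the uid as text."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def check_local_dir(path, expected_owner):
    """Return a list of problems found for one NodeManager local directory.

    expected_owner is an assumption supplied by the caller; this sketch does
    not know what your cluster requires.
    """
    try:
        st = os.stat(path)
    except OSError as exc:
        return [f"{path}: cannot stat ({exc.strerror})"]

    problems = []
    if not stat.S_ISDIR(st.st_mode):
        problems.append(f"{path}: not a directory")

    owner = owner_name(st.st_uid)
    if owner != expected_owner:
        problems.append(f"{path}: owner is {owner}, expected {expected_owner}")

    # The owning user needs write and search (execute) permission to create
    # container directories underneath this path.
    needed = stat.S_IWUSR | stat.S_IXUSR
    if st.st_mode & needed != needed:
        mode = stat.filemode(st.st_mode)
        problems.append(f"{path}: owner lacks write/execute permission ({mode})")

    return problems
```

Call `check_local_dir(path, expected_owner)` once for each directory configured in yarn_nodemanager_local_dirs; an empty list means none of these basic checks failed, not that the directory is fully correct.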
The underlying issue is that most of these exit codes do not meet the criteria under which an NM should be marked unhealthy. Based on the above, you may be hitting one of these known issues:
https://issues.apache.org/jira/browse/YARN-8751
https://issues.apache.org/jira/browse/YARN-9833
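To see which container-executor exit codes a node has been logging, the "Exit code: N" lines shown in the log excerpt above can be tallied. This is a minimal sketch that assumes the log line format matches that excerpt; the function name `count_exit_codes` is hypothetical.

```python
import re
from collections import Counter

# Matches the "Exit code: 35" fragment as it appears in the NM log excerpt.
EXIT_CODE_RE = re.compile(r"Exit code: (\d+)")


def count_exit_codes(lines):
    """Count how often each exit code appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = EXIT_CODE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Pass it an open NodeManager log file (any iterable of lines works) and inspect the returned counter to see which codes occur most often.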
Resolution:
1. In older versions, clearing the cache and restarting the affected NodeManagers should resolve the issue.
2. This issue is permanently fixed in CDH 6.3.1 and later.
YARN-9833 - Race condition when DirectoryCollection.checkDirs() runs during container launch