Created on 10-24-2017 07:40 AM - edited 09-16-2022 05:26 AM
Hello,
We encountered a rare error while debugging a critical YARN health issue: "Failed to run MapReduce job to aggregate YARN container usage metrics."
The job that gathers these metrics is supposed to run as the user cmjobuser, but this user is missing from the system (see the log below):
2017-10-24 12:34:18,147 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1508843544385_0003_02_000001 startLocalizer is : 255
ExitCodeException exitCode=255:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
    at org.apache.hadoop.util.Shell.run(Shell.java:504)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:260)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1138)
2017-10-24 12:34:18,148 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command provided 0
2017-10-24 12:34:18,148 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : run as user is cmjobuser
2017-10-24 12:34:18,148 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : requested yarn user is cmjobuser
2017-10-24 12:34:18,148 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: User cmjobuser not found
2017-10-24 12:34:18,148 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer failed
java.io.IOException: Application application_1508843544385_0003 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is cmjobuser
main : requested yarn user is cmjobuser
User cmjobuser not found
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:269)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1138)
Caused by: ExitCodeException exitCode=255:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
    at org.apache.hadoop.util.Shell.run(Shell.java:504)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:260)
    ... 1 more
I've checked, and this user exists neither in the OS nor in Kerberos.
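For anyone who wants to repeat the OS-side check on each NodeManager host, here is a minimal sketch, assuming Python 3 is available there. The helper name user_exists is hypothetical; it only asks the local account database (via the standard pwd module, which goes through whatever name services the host is configured with) and does not look at Kerberos at all.

```python
import pwd
import sys


def user_exists(username: str) -> bool:
    """Return True if the host can resolve `username` to an account.

    Hypothetical helper for illustration. pwd.getpwnam raises KeyError
    when the account cannot be resolved on this host.
    """
    try:
        pwd.getpwnam(username)
        return True
    except KeyError:
        return False


if __name__ == "__main__":
    # Default to the user named in the NodeManager log above.
    name = sys.argv[1] if len(sys.argv) > 1 else "cmjobuser"
    status = "found" if user_exists(name) else "NOT found"
    print(f"User {name}: {status} on this host")
```

Running it on every node that hosts a NodeManager should show whether the account is missing everywhere or only on some hosts.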
What can be the root cause of this? When is this user created?
Can we create it manually?
Best regards,
Dominik