Member since 08-19-2019
4 Posts
0 Kudos Received
0 Solutions
11-20-2019
12:48 PM
I have run the id command for the user on both nodes. Both show the user is valid.
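A single `id` check can pass even if the lookup only fails now and then, so it may help to repeat the lookup over a period of time on the affected node. As far as I can tell, the "User ... not found" message is printed when the container executor's passwd lookup for the user returns nothing, which is the same kind of lookup Python's `pwd.getpwnam` performs. Below is a minimal sketch, assuming Python 3 is available on the NodeManager hosts; `probe_user` and its parameters are hypothetical names used only for illustration.

```python
import pwd
import time
from typing import Callable, List, Tuple


def probe_user(
    username: str,
    attempts: int = 5,
    delay_s: float = 0.0,
    lookup: Callable[[str], object] = pwd.getpwnam,
) -> List[Tuple[int, bool]]:
    """Resolve `username` repeatedly and record whether each attempt succeeded.

    `lookup` defaults to pwd.getpwnam, which raises KeyError when the
    user cannot be resolved. It is injectable so the function can be
    exercised without touching the real passwd database.
    """
    results: List[Tuple[int, bool]] = []
    for i in range(attempts):
        try:
            lookup(username)
            results.append((i, True))
        except KeyError:
            # The name service returned no entry for this user.
            results.append((i, False))
        # Sleep between attempts, but not after the last one.
        if delay_s and i < attempts - 1:
            time.sleep(delay_s)
    return results
```

Running something like `probe_user("ptzs0srv0z50", attempts=600, delay_s=1.0)` on the failing node and looking for any `False` entries would show whether the lookup itself is intermittent.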
11-20-2019
12:27 PM
This is still happening for us; the issue occurred again today. See the error from our production YARN logs below.
AM Container for appattempt_1573918482316_0389_000001 exited with exitCode: -1000
For more detailed output, check the application tracking page: http://guerlpahdp001.fg.rbc.com:8088/cluster/app/application_1573918482316_0389 Then click on links to logs of each attempt.
Diagnostics: Application application_1573918482316_0389 initialization failed (exitCode=255) with output:
main : command provided 0
main : run as user is ptzs0srv0z50
main : requested yarn user is ptzs0srv0z50
User ptzs0srv0z50 not found
Failing this attempt
09-23-2019
01:56 PM
We are facing a problem where we randomly get a "User not found" error when trying to start a container. Checking the NodeManager logs, I can see the following:
2019-09-15 23:55:58,529 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(810)) - Start request for container_e65_1565485836435_8640_02_000003 by user pyqy0srv0z50
2019-09-15 23:55:58,529 INFO application.ApplicationImpl (ApplicationImpl.java:transition(304)) - Adding container_e65_1565485836435_8640_02_000003 to application application_1565485836435_8640
2019-09-15 23:55:58,533 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e65_1565485836435_8640_02_000003 transitioned from NEW to LOCALIZING
2019-09-15 23:55:58,533 INFO containermanager.AuxServices (AuxServices.java:handle(215)) - Got event CONTAINER_INIT for appId application_1565485836435_8640
2019-09-15 23:55:58,533 INFO yarn.YarnShuffleService (YarnShuffleService.java:initializeContainer(192)) - Initializing container container_e65_1565485836435_8640_02_000003
2019-09-15 23:55:58,533 INFO yarn.YarnShuffleService (YarnShuffleService.java:initializeContainer(284)) - Initializing container container_e65_1565485836435_8640_02_000003
2019-09-15 23:55:58,533 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource hdfs://prod/user/pyqy0srv0z50/.sparkStaging/application_1565485836435_8640/__spark_conf__.zip transitioned from INIT to DOWNLOADING
2019-09-15 23:55:58,533 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:handle(712)) - Created localizer for container_e65_1565485836435_8640_02_000003
2019-09-15 23:55:58,535 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:writeCredentials(1194)) - Writing credentials to the nmPrivate file /app/data/hadoop/disk14/yarn/local/nmPrivate/container_e65_1565485836435_8640_02_000003.tokens. Credentials list:
2019-09-15 23:55:58,693 WARN privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(171)) - Shell execution returned exit code: 255.
Privileged Execution Operation Output:
main : command provided 0
main : run as user is pyqy0srv0z50
main : requested yarn user is pyqy0srv0z50
User pyqy0srv0z50 not found
Full command array for failed execution:
2019-09-15 23:55:58,694 WARN nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:startLocalizer(269)) - Exit code from container container_e65_1565485836435_8640_02_000003 startLocalizer is : 255
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=255:
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:177)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:264)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114)
Caused by: ExitCodeException exitCode=255:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
    at org.apache.hadoop.util.Shell.run(Shell.java:848)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:151)
    ... 2 more
2019-09-15 23:55:58,694 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(1134)) - Localizer failed
java.io.IOException: Application application_1565485836435_8640 initialization failed (exitCode=255) with output:
main : command provided 0
main : run as user is pyqy0srv0z50
main : requested yarn user is pyqy0srv0z50
User pyqy0srv0z50 not found
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:273)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114)
Caused by: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=255:
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:177)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:264)
    ... 1 more
Caused by: ExitCodeException exitCode=255:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
    at org.apache.hadoop.util.Shell.run(Shell.java:848)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:151)
    ... 2 more
2019-09-15 23:55:58,694 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e65_1565485836435_8640_02_000003 transitioned from LOCALIZING to LOCALIZATION_FAILED
2019-09-15 23:55:58,697 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e65_1565485836435_8640_02_000003 transitioned from LOCALIZATION_FAILED to DONE
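Since the failure is random, it may be useful to pull every "User ... not found" occurrence out of the log along with the nearest preceding timestamp, so the times can be compared against whatever directory or name service the hosts use. The following is a rough sketch only; `find_user_lookup_failures` is a hypothetical helper, and the regular expressions assume the log layout shown above.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Timestamp at the start of a log entry, e.g. "2019-09-15 23:55:58,693 ".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")
# The message seen in the privileged operation output.
NOT_FOUND = re.compile(r"User (\S+) not found")


def find_user_lookup_failures(
    lines: Iterable[str],
) -> List[Tuple[Optional[str], str]]:
    """Return (timestamp, username) for each "User X not found" message.

    The message often sits on a continuation line without its own
    timestamp, so the most recent timestamp seen so far is reported.
    """
    last_ts: Optional[str] = None
    hits: List[Tuple[Optional[str], str]] = []
    for line in lines:
        ts = TIMESTAMP.match(line)
        if ts:
            last_ts = ts.group(1)
        for match in NOT_FOUND.finditer(line):
            hits.append((last_ts, match.group(1)))
    return hits
```

Passing an open log file (for example `find_user_lookup_failures(open(path))`, where `path` is wherever the NodeManager log lives on your hosts) yields one tuple per failure.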
Labels:
Apache Spark
08-21-2019
05:57 AM
Hello, I am regularly seeing this error in the state-change.log on my production cluster and was wondering if this is something to worry about.
prod.cmla.metrics-15 in response to UpdateMetadata request sent by controller 1005 epoch 84 with correlation id 1083 (state.change.logger)
[2019-08-21 04:18:18,862] TRACE Controller 1005 epoch 84 received response {error_code=0} for a request sent to broker (id: 1005 rack: null) (state.change.logger)
[2019-08-21 04:18:18,863] TRACE Controller 1005 epoch 84 received response {error_code=0} for a request sent to broker (id: 1002 rack: null) (state.change.logger)
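In the Kafka protocol an error code of 0 means no error, so the TRACE lines above record responses that succeeded. One way to check whether the file contains anything else is to filter for responses with a non-zero code. This is a small sketch, assuming the `{error_code=N}` layout shown above; `nonzero_error_lines` is a hypothetical helper name.

```python
import re
from typing import Iterable, List

# Matches the response body as it appears in the log, e.g. "{error_code=0}".
ERROR_CODE = re.compile(r"\{error_code=(-?\d+)\}")


def nonzero_error_lines(lines: Iterable[str]) -> List[str]:
    """Return only the lines that carry a non-zero error_code."""
    flagged: List[str] = []
    for line in lines:
        codes = [int(code) for code in ERROR_CODE.findall(line)]
        if any(code != 0 for code in codes):
            flagged.append(line)
    return flagged
```

If this returns an empty list for the whole state-change.log, the entries in question are routine controller traffic logged at TRACE level.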
Labels:
Apache Kafka