AKB
Contributor
Posts: 55
Registered: ‎04-11-2018

CDH 5.15 shows RM down right after install using CM

The log is attached. I just installed the cluster and hit this issue. It did not happen in pre-5.15 releases.

Thanks.

EDIT: I can't seem to attach the logs, so the ResourceManager log excerpt is pasted below.

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hadoop-yarn/fail. Name node is in safe mode.
The reported blocks 745 has reached the threshold 0.9990 of total blocks 745. The number of live datanodes 1 has reached the minimum number 1. In safe mode extension. Safe mode will be turned off automatically in 1 seconds.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1529)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4527)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4502)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:884)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:328)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:641)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)

at org.apache.hadoop.ipc.Client.call(Client.java:1504)
at org.apache.hadoop.ipc.Client.call(Client.java:1441)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy90.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:575)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy91.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3155)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:3122)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1005)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1001)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1001)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:993)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1970)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.writeFlagFileForFailedAM(RMAppImpl.java:1352)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$3500(RMAppImpl.java:111)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$AttemptFailedFinalStateSavedTransition.transition(RMAppImpl.java:1036)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$AttemptFailedFinalStateSavedTransition.transition(RMAppImpl.java:1028)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$FinalStateSavedTransition.transition(RMAppImpl.java:1017)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$FinalStateSavedTransition.transition(RMAppImpl.java:1011)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:766)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:110)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:868)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:852)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
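
A note on the exception above: the SafeModeException message itself carries the numbers needed to judge whether the NameNode was simply still starting up (reported vs. total blocks, the threshold, live DataNodes, and the seconds left in the safe mode extension). Below is a minimal Python sketch that pulls those fields out of such a message. The helper name `parse_safemode` and the regular expressions are illustrative assumptions based on the text pasted here, not part of any Hadoop API.

```python
import re

# Illustrative helper (not a Hadoop API): extract the safe mode figures
# from a NameNode SafeModeException message like the one pasted above.
PATTERNS = {
    "reported_blocks": r"reported blocks (\d+)",
    "threshold": r"threshold ([\d.]+)",
    "total_blocks": r"total blocks (\d+)",
    "live_datanodes": r"live datanodes (\d+)",
    "seconds_left": r"turned off automatically in (\d+) seconds",
}


def parse_safemode(message):
    """Return a dict of the fields found in the message; missing fields are omitted."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match:
            text = match.group(1)
            # The threshold is a fraction; every other field is a whole number.
            found[name] = float(text) if name == "threshold" else int(text)
    return found


msg = (
    "The reported blocks 745 has reached the threshold 0.9990 of total blocks 745. "
    "The number of live datanodes 1 has reached the minimum number 1. "
    "In safe mode extension. Safe mode will be turned off automatically in 1 seconds."
)
info = parse_safemode(msg)
print(info)
```

In this message all 745 blocks were already reported and one second of the extension remained, which suggests the NameNode was about to leave safe mode on its own when the mkdirs call arrived. The current state can be checked on the cluster with `hdfs dfsadmin -safemode get`.
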
2018-08-12 22:16:04,534 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1534112156808_0001 failed 2 times due to AM Container for appattempt_1534112156808_0001_000002 exited with exitCode: 143
For more detailed output, check application tracking page:http://ip-172-31-28-114.ec2.internal:8088/proxy/application_1534112156808_0001/Then, click on links to logs of each attempt.
Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
Failing this attempt. Failing the application.
2018-08-12 22:16:04,536 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1534112156808_0001 State change from FINAL_SAVING to FAILED on event = APP_UPDATE_SAVED
2018-08-12 22:16:04,537 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1534112156808_0001 failed 2 times due to AM Container for appattempt_1534112156808_0001_000002 exited with exitCode: 143
For more detailed output, check application tracking page:http://ip-172-31-28-114.ec2.internal:8088/proxy/application_1534112156808_0001/Then, click on links to logs of each attempt.
Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
Failing this attempt. Failing the application. APPID=application_1534112156808_0001
2018-08-12 22:16:04,539 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1534112156808_0001,name=hadoop,user=dr.who,queue=root.users.dr_dot_who,state=FAILED,trackingUrl=http://ip-172-31-28-114.ec2.internal:8088/cluster/app/application_1534112156808_0001,appMasterHost=N..., vCores:0>
2018-08-12 22:16:52,666 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 2
2018-08-12 22:16:53,015 WARN org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The specific max attempts: 0 for application: 2 is invalid, because it is out of the range [1, 2]. Use the global max attempts instead.
2018-08-12 22:16:53,015 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 2 submitted by user dr.who
2018-08-12 22:16:53,016 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1534112156808_0002
2018-08-12 22:16:53,016 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing application with id application_1534112156808_0002
2018-08-12 22:16:53,016 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1534112156808_0002
2018-08-12 22:16:53,016 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1534112156808_0002 State change from NEW to NEW_SAVING on event = START
2018-08-12 22:16:53,017 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1534112156808_0002 State change from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
2018-08-12 22:16:53,017 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule: Name dr.who is converted to dr_dot_who when it is used as a queue name.
2018-08-12 22:16:53,017 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Accepted application application_1534112156808_0002 from user: dr.who, in queue: root.users.dr_dot_who, currently num of applications: 1
2018-08-12 22:16:53,019 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1534112156808_0002 State change from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
2018-08-12 22:16:53,019 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1534112156808_0002_000001
2018-08-12 22:16:53,019 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1534112156808_0002_000001 State change from NEW to SUBMITTED on event = START
2018-08-12 22:16:53,020 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1534112156808_0002_000001 to scheduler from user: dr.who
2018-08-12 22:16:53,020 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1534112156808_0002_000001 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
2018-08-12 22:16:53,287 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1534112156808_0002_01_000001 Container Transitioned from NEW to ALLOCATED
2018-08-12 22:16:53,287 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1534112156808_0002 CONTAINERID=container_1534112156808_0002_01_000001
2018-08-12 22:16:53,288 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_1534112156808_0002_01_000001 of capacity <memory:1024, vCores:1> on host ip-172-31-28-114.ec2.internal:8041, which has 1 containers, <memory:1024, vCores:1> used and <memory:1846, vCores:7> available after allocation
2018-08-12 22:16:53,288 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Sending NMToken for nodeId : ip-172-31-28-114.ec2.internal:8041 for container : container_1534112156808_0002_01_000001
2018-08-12 22:16:53,291 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1534112156808_0002_01_000001 Container Transitioned from ALLOCATED to ACQUIRED
2018-08-12 22:16:53,292 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Clear node set for appattempt_1534112156808_0002_000001
2018-08-12 22:16:53,292 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Storing attempt: AppId: application_1534112156808_0002 AttemptId: appattempt_1534112156808_0002_000001 MasterContainer: Container: [ContainerId: container_1534112156808_0002_01_000001, NodeId: ip-172-31-28-114.ec2.internal:8041, NodeHttpAddress: ip-172-31-28-114.ec2.internal:8042, Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 172.31.28.114:8041 }, ]
2018-08-12 22:16:53,292 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1534112156808_0002_000001 State change from SCHEDULED to ALLOCATED_SAVING on event = CONTAINER_ALLOCATED
2018-08-12 22:16:53,298 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1534112156808_0002_000001 State change from ALLOCATED_SAVING to ALLOCATED on event = ATTEMPT_NEW_SAVED
2018-08-12 22:16:53,299 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1534112156808_0002_000001
2018-08-12 22:16:53,302 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1534112156808_0002_01_000001, NodeId: ip-172-31-28-114.ec2.internal:8041, NodeHttpAddress: ip-172-31-28-114.ec2.internal:8042, Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 172.31.28.114:8041 }, ] for AM appattempt_1534112156808_0002_000001
2018-08-12 22:16:53,302 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Create AMRMToken for ApplicationAttempt: appattempt_1534112156808_0002_000001
2018-08-12 22:16:53,302 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Creating password for appattempt_1534112156808_0002_000001
2018-08-12 22:16:53,315 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done launching container Container: [ContainerId: container_1534112156808_0002_01_000001, NodeId: ip-172-31-28-114.ec2.internal:8041, NodeHttpAddress: ip-172-31-28-114.ec2.internal:8042, Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 172.31.28.114:8041 }, ] for AM appattempt_1534112156808_0002_000001
2018-08-12 22:16:53,315 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1534112156808_0002_000001 State change from ALLOCATED to LAUNCHED on event = LAUNCHED
2018-08-12 22:16:53,621 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 3
2018-08-12 22:16:53,978 WARN org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The specific max attempts: 0 for application: 3 is invalid, because it is out of the range [1, 2]. Use the global max attempts instead.
2018-08-12 22:16:53,978 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 3 submitted by user dr.who
2018-08-12 22:16:53,978 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1534112156808_0003
2018-08-12 22:16:53,978 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing application with id application_1534112156808_0003
2018-08-12 22:16:53,979 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1534112156808_0003
2018-08-12 22:16:53,979 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1534112156808_0003 State change from NEW to NEW_SAVING on event = START
2018-08-12 22:16:53,979 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1534112156808_0003 State change from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
2018-08-12 22:16:53,979 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule: Name dr.who is converted to dr_dot_who when it is used as a queue name.
2018-08-12 22:16:53,980 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Accepted application application_1534112156808_0003 from user: dr.who, in queue: root.users.dr_dot_who, currently num of applications: 2
2018-08-12 22:16:53,981 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1534112156808_0003 State change from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
2018-08-12 22:16:53,981 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1534112156808_0003_000001
2018-08-12 22:16:53,981 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1534112156808_0003_000001 State change from NEW to SUBMITTED on event = START
2018-08-12 22:16:53,981 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1534112156808_0003_000001 to scheduler from user: dr.who
2018-08-12 22:16:53,982 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1534112156808_0003_000001 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
2018-08-12 22:16:54,289 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1534112156808_0002_01_000001 Container Transitioned from ACQUIRED to RUNNING
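
To make a long ResourceManager excerpt like this easier to scan, the `State change from X to Y on event = Z` lines can be folded into one line per application or attempt. Here is a rough Python sketch. `summarize_transitions` is a hypothetical helper name, and the regular expression assumes the log format shown in this post rather than any documented log contract.

```python
import re

# Matches lines such as:
#   "... RMAppImpl: application_1534112156808_0002 State change from NEW to NEW_SAVING on event = START"
# The pattern is an assumption derived from the excerpt above.
TRANSITION = re.compile(
    r"(?P<id>(?:application|appattempt)_\d+_\d+(?:_\d+)?) "
    r"State change from (?P<old>\w+) to (?P<new>\w+) on event = (?P<event>\w+)"
)


def summarize_transitions(lines):
    """Map each application/attempt id to the ordered list of states it passed through."""
    states = {}
    for line in lines:
        match = TRANSITION.search(line)
        if not match:
            continue  # not a state-change line
        # Seed the path with the first observed "from" state, then append each "to" state.
        path = states.setdefault(match.group("id"), [match.group("old")])
        path.append(match.group("new"))
    return states


sample = [
    "2018-08-12 22:16:04,536 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: "
    "application_1534112156808_0001 State change from FINAL_SAVING to FAILED on event = APP_UPDATE_SAVED",
    "2018-08-12 22:16:53,016 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: "
    "application_1534112156808_0002 State change from NEW to NEW_SAVING on event = START",
    "2018-08-12 22:16:53,017 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: "
    "application_1534112156808_0002 State change from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED",
    "2018-08-12 22:16:53,019 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: "
    "application_1534112156808_0002 State change from SUBMITTED to ACCEPTED on event = APP_ACCEPTED",
]
summary = summarize_transitions(sample)
for ident, path in summary.items():
    print(ident, " -> ".join(path))
```

Feeding the full log file to the same function (one line per list element) gives a compact view of which applications submitted by `dr.who` reached FAILED and which are still running.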
