Created on 05-05-2016 12:33 AM - edited 09-16-2022 03:17 AM
Today the RM crashed and threw the following error message. There are a bunch of JIRA tickets related to that error. One of my jobs was killed, but the application is still running in an orphaned state, and its app ID is still displayed in the RM UI.
I am unable to kill that app ID using yarn application -kill <app_id>. I restarted the RM and ZK, but the application still appears in the RM UI. It is not consuming any resources. How do I remove it from the display?
t: maxCompletedAppsInMemory = 10000, removing app application_1452798563961_0971 from memory:
2016-05-04 19:00:30,449 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1193)) - Null container completed...
2016-05-04 19:00:30,568 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1193)) - Null container completed...
2016-05-04 19:00:31,251 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1193)) - Null container completed...
2016-05-04 19:00:32,252 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1193)) - Null container completed...
2016-05-04 19:00:45,325 FATAL resourcemanager.ResourceManager (ResourceManager.java:handle(753)) - Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
java.io.IOException: Wait for ZKClient creation timed out
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1073)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1097)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:934)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeRMDelegationTokenAndSequenceNumberState(ZKRMStateStore.java:734)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.storeRMDelegationTokenAndSequenceNumber(RMStateStore.java:650)
    at org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager.storeNewToken(RMDelegationTokenSecretManager.java:112)
    at org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager.storeNewToken(RMDelegationTokenSecretManager.java:49)
    at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.storeToken(AbstractDelegationTokenSecretManager.java:272)
    at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:391)
    at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:47)
    at org.apache.hadoop.security.token.Token.<init>(Token.java:59)
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getDelegationToken(ClientRMService.java:907)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getDelegationToken(ApplicationClientProtocolPBServiceImpl.java:291)
    at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
Created 11-20-2016 05:28 AM
Can you try killing it through the REST API? A sample command is below:
curl -v -X PUT -H 'Content-Type: application/json' -d '{"state": "KILLED"}' 'http://localhost:8088/ws/v1/cluster/apps/application_xxxxxxxx_xxxx/state'
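If it helps, the same call can be wrapped in a small script that first checks the state the RM reports and then requests the kill. This is only a rough sketch: RM_URL and the function names app_state_url, get_app_state and kill_app are made up for illustration, it assumes the RM web service is reachable at localhost:8088 without authentication, and a secured cluster may need extra curl options.

```shell
#!/bin/sh
# Hypothetical helper; RM_URL and the function names are illustrative only.
# Assumption: the RM web service listens here and needs no authentication.
RM_URL="${RM_URL:-http://localhost:8088}"

# Build the REST endpoint for one application's state resource.
app_state_url() {
    printf '%s/ws/v1/cluster/apps/%s/state\n' "$RM_URL" "$1"
}

# Ask the RM which state it currently reports for the application (GET).
get_app_state() {
    curl -s "$(app_state_url "$1")"
}

# Ask the RM to move the application to KILLED (PUT with a JSON body).
kill_app() {
    curl -s -X PUT -H 'Content-Type: application/json' \
        -d '{"state": "KILLED"}' "$(app_state_url "$1")"
}

# Example usage (substitute the real application id before uncommenting):
# get_app_state application_xxxxxxxx_xxxx
# kill_app application_xxxxxxxx_xxxx
```

Running get_app_state again after kill_app should show whether the RM accepted the change.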
Created 11-20-2016 09:00 AM
I would recommend contacting Hortonworks support for such cases.