Created 10-26-2015 07:56 PM
When restarting all services after setting up Kerberos, we ran into the following exception:
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'yarn resourcemanager -format-state-store' returned 255.
15/10/26 16:11:16 INFO resourcemanager.ResourceManager: STARTUP_MSG:
15/10/26 16:11:17 INFO recovery.ZKRMStateStore: org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$VerifyActiveStatusThread thread interrupted! Exiting!
15/10/26 16:11:17 INFO zookeeper.ZooKeeper: Session: 0x150a4b3429b0002 closed
15/10/26 16:11:17 FATAL resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /rmstore/ZKRMStateRoot/RMAppRoot
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.recursiveDeleteWithRetriesHelper(ZKRMStateStore.java:1049)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.recursiveDeleteWithRetriesHelper(ZKRMStateStore.java:1045)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.access$500(ZKRMStateStore.java:89)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$10.run(ZKRMStateStore.java:1032)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$10.run(ZKRMStateStore.java:1029)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1104)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1125)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.deleteWithRetries(ZKRMStateStore.java:1029)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.deleteStore(ZKRMStateStore.java:825)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.deleteRMStateStore(ResourceManager.java:1267)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1190)
15/10/26 16:11:17 INFO zookeeper.ClientCnxn: EventThread shut down
15/10/26 16:11:17 INFO resourcemanager.ResourceManager: SHUTDOWN_MSG:
What is the root cause of this, and how can we best resolve it or avoid it in the future?
Created 10-30-2015 10:08 PM
@hkropp@hortonworks.com
FATAL resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /rmstore/ZKRMStateRoot/RMAppRoot
Please see the following. In my case, all of the application data was still sitting under that znode:
[zk: localhost:2181(CONNECTED) 2] ls /rmstore/ZKRMStateRoot/RMAppRoot
[application_1445593412630_0002, application_1445593412630_0001, application_1445366030467_0002, application_1445366030467_0001, application_1445366030467_0004, application_1445366030467_0003, application_1445593412630_0006, application_1445366030467_0005, application_1445593412630_0005, application_1445593412630_0004, application_1445593412630_0003, application_1445173693339_0006, application_1445173693339_0005, application_1445173693339_0004, application_1445173693339_0003, application_1445173693339_0002, application_1445173693339_0001, application_1445394313024_0004, application_1445394313024_0003, application_1445394313024_0002, application_1445394313024_0001, application_1445394313024_0008, application_1445394313024_0007, application_1445394313024_0006, application_1445394313024_0005]
[zk: localhost:2181(CONNECTED) 3] quit
Quitting...
[zk: localhost:2181(CONNECTED) 3] rmr /rmstore/ZKRMStateRoot/RMAppRoot
[zk: localhost:2181(CONNECTED) 4] ls /rmstore/ZKRMStateRoot/RMAppRoot
Node does not exist: /rmstore/ZKRMStateRoot/RMAppRoot
After I restarted YARN, the znode was recreated:
[zk: localhost:2181(CONNECTED) 6] ls /rmstore/ZKRMStateRoot/RMAppRoot
[]
[zk: localhost:2181(CONNECTED) 7]
[zk: localhost:2181(CONNECTED) 7] ls /rmstore/ZKRMStateRoot
[AMRMTokenSecretManagerRoot, RMAppRoot, EpochNode, RMDTSecretManagerRoot, RMVersionNode]
[zk: localhost:2181(CONNECTED) 8]
You can try this, but if you are not sure, or if this is a production cluster, open a support ticket.
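For anyone wondering what the exception itself means: a plain ZooKeeper delete only removes a znode that has no children, while rmr in the CLI removes the children first and then the node. Below is a minimal Python sketch of that difference using an in-memory model. The names ZNodeTree and NotEmptyError are hypothetical and made up for illustration; this is not a ZooKeeper client and it does not explain why the ResourceManager's own recursive delete failed in this particular case.

```python
class NotEmptyError(Exception):
    """Stands in for ZooKeeper's KeeperException.NotEmptyException (illustration only)."""


class ZNodeTree:
    """Tiny in-memory model of a znode tree; hypothetical, not a real ZooKeeper client."""

    def __init__(self):
        # Maps each node path to the set of its child names.
        self.children = {"/": set()}

    def create(self, path):
        # Create the node along with any missing parents.
        parent = "/"
        for part in [p for p in path.split("/") if p]:
            node = parent.rstrip("/") + "/" + part
            if node not in self.children:
                self.children[node] = set()
                self.children[parent].add(part)
            parent = node

    def ls(self, path):
        return sorted(self.children[path])

    def delete(self, path):
        # Like a plain delete: refuse to remove a node that still has children.
        if self.children[path]:
            raise NotEmptyError("Directory not empty for " + path)
        parent, _, name = path.rpartition("/")
        self.children[parent or "/"].discard(name)
        del self.children[path]

    def rmr(self, path):
        # Like rmr in the CLI: delete all children first, then the node itself.
        for name in list(self.children[path]):
            self.rmr(path.rstrip("/") + "/" + name)
        self.delete(path)


tree = ZNodeTree()
root = "/rmstore/ZKRMStateRoot/RMAppRoot"
tree.create(root + "/application_1445593412630_0001")
tree.create(root + "/application_1445593412630_0002")

try:
    tree.delete(root)   # fails while application znodes are still present
except NotEmptyError as err:
    print(err)

tree.rmr(root)          # children first, then RMAppRoot itself
print(tree.ls("/rmstore/ZKRMStateRoot"))
```

In the real cluster the ResourceManager recreates RMAppRoot on startup, which matches the empty listing shown above after the restart.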
Created 11-10-2015 01:08 AM
@hkropp Please see this
Created 11-05-2015 08:41 PM
Can we test the following answer?
Created 02-03-2016 01:56 AM
@hkropp has this been resolved? Can you post your solution or accept the best answer?