How To Best Resolve - RMStateStore FENCED?

Super Collaborator

When doing a restart of all services after Kerberos setup, we ran into the following exception:

File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'yarn resourcemanager -format-state-store' returned 255. 15/10/26 16:11:16 INFO resourcemanager.ResourceManager: STARTUP_MSG:

15/10/26 16:11:17 INFO recovery.ZKRMStateStore: org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$VerifyActiveStatusThread thread interrupted! Exiting!
15/10/26 16:11:17 INFO zookeeper.ZooKeeper: Session: 0x150a4b3429b0002 closed
15/10/26 16:11:17 FATAL resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /rmstore/ZKRMStateRoot/RMAppRoot
                at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
                at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
                at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.recursiveDeleteWithRetriesHelper(ZKRMStateStore.java:1049)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.recursiveDeleteWithRetriesHelper(ZKRMStateStore.java:1045)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.access$500(ZKRMStateStore.java:89)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$10.run(ZKRMStateStore.java:1032)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$10.run(ZKRMStateStore.java:1029)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1104)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1125)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.deleteWithRetries(ZKRMStateStore.java:1029)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.deleteStore(ZKRMStateStore.java:825)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.deleteRMStateStore(ResourceManager.java:1267)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1190)
15/10/26 16:11:17 INFO zookeeper.ClientCnxn: EventThread shut down
15/10/26 16:11:17 INFO resourcemanager.ResourceManager: SHUTDOWN_MSG:

What is the root cause of this, and how can we best resolve it or avoid it in the future?
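When the Ambari output is long, it can help to pull the offending znode path out of the log before touching ZooKeeper. The following is only a minimal sketch: the function name `find_non_empty_znodes` and the sample string are illustrative assumptions, and the regular expression is written against the message format shown in the log above, not against any documented guarantee.

```python
import re

# Pattern based on the NotEmptyException line in the log above.
# Whitespace around ':' and '=' is optional so that squashed copies of the line also match.
NOT_EMPTY_RE = re.compile(
    r"KeeperException\$NotEmptyException:\s*KeeperErrorCode\s*=\s*"
    r"Directory not empty for\s+(\S+)"
)


def find_non_empty_znodes(log_text):
    """Return znode paths named in NotEmptyException lines, in order, without duplicates."""
    seen = []
    for match in NOT_EMPTY_RE.finditer(log_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen


# Sample text copied from the log in this thread.
sample = (
    "15/10/26 16:11:17 FATAL resourcemanager.ResourceManager: Error starting ResourceManager\n"
    "org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = "
    "Directory not empty for /rmstore/ZKRMStateRoot/RMAppRoot\n"
)

print(find_non_empty_znodes(sample))  # ['/rmstore/ZKRMStateRoot/RMAppRoot']
```

The returned path is the znode to inspect with the ZooKeeper client.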

1 ACCEPTED SOLUTION

Master Mentor

@hkropp@hortonworks.com

FATAL resourcemanager.ResourceManager: Error starting ResourceManager

org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /rmstore/ZKRMStateRoot/RMAppRoot

Please look at the error above. In my case, all of the application data was still sitting under that znode:

[zk: localhost:2181(CONNECTED) 2] ls /rmstore/ZKRMStateRoot/RMAppRoot

[application_1445593412630_0002, application_1445593412630_0001, application_1445366030467_0002, application_1445366030467_0001, application_1445366030467_0004, application_1445366030467_0003, application_1445593412630_0006, application_1445366030467_0005, application_1445593412630_0005, application_1445593412630_0004, application_1445593412630_0003, application_1445173693339_0006, application_1445173693339_0005, application_1445173693339_0004, application_1445173693339_0003, application_1445173693339_0002, application_1445173693339_0001, application_1445394313024_0004, application_1445394313024_0003, application_1445394313024_0002, application_1445394313024_0001, application_1445394313024_0008, application_1445394313024_0007, application_1445394313024_0006, application_1445394313024_0005]

[zk: localhost:2181(CONNECTED) 3] quit

Quitting...
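Before removing anything, it may be worth summarizing what is stored under the znode. Here is a rough sketch that tallies the entries from a pasted `ls` result. It assumes the usual YARN ID layout `application_<clusterTimestamp>_<sequence>`, where the middle field is the ResourceManager start time; the function name `summarize_app_znodes` and the shortened sample listing are illustrative only.

```python
from collections import Counter


def summarize_app_znodes(ls_output):
    """Count application znodes per cluster timestamp from a zkCli `ls` line like "[a, b, c]"."""
    names = [n.strip() for n in ls_output.strip().strip("[]").split(",") if n.strip()]
    counts = Counter()
    for name in names:
        # Assumed shape: application_<clusterTimestamp>_<sequence>; other entries are ignored.
        parts = name.split("_")
        if len(parts) == 3 and parts[0] == "application":
            counts[parts[1]] += 1
    return dict(counts)


# A shortened copy of the listing shown above.
listing = (
    "[application_1445593412630_0002, application_1445593412630_0001, "
    "application_1445366030467_0002]"
)

print(summarize_app_znodes(listing))  # {'1445593412630': 2, '1445366030467': 1}
```

Each key corresponds to one ResourceManager start, so several keys suggest that state from earlier runs has accumulated.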

[zk: localhost:2181(CONNECTED) 3] rmr /rmstore/ZKRMStateRoot/RMAppRoot

[zk: localhost:2181(CONNECTED) 4] ls /rmstore/ZKRMStateRoot/RMAppRoot

Node does not exist: /rmstore/ZKRMStateRoot/RMAppRoot

After restarting YARN, the znode was recreated:

[zk: localhost:2181(CONNECTED) 6] ls /rmstore/ZKRMStateRoot/RMAppRoot

[]

[zk: localhost:2181(CONNECTED) 7]

[zk: localhost:2181(CONNECTED) 7] ls /rmstore/ZKRMStateRoot

[AMRMTokenSecretManagerRoot, RMAppRoot, EpochNode, RMDTSecretManagerRoot, RMVersionNode]

[zk: localhost:2181(CONNECTED) 8]

You can try this, but if you are not sure, or if it is a production cluster, open a support ticket.
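The same cleanup could be scripted instead of typed into the ZooKeeper shell. The sketch below is not what was run in this thread. It assumes a client object with kazoo-style `exists`, `get_children` and `delete(path, recursive=True)` methods; `clear_znode` and `FakeZk` are hypothetical names, and `FakeZk` is only an in-memory stand-in so the logic can be tried without a live ensemble. As with `rmr`, this throws away the stored application state, so treat it with the same caution.

```python
def clear_znode(zk, path):
    """Recursively delete `path` if it exists; return how many direct children it had."""
    if not zk.exists(path):
        return 0
    children = zk.get_children(path)
    # A plain delete fails on a znode that still has children; recursive=True removes them first.
    zk.delete(path, recursive=True)
    return len(children)


class FakeZk:
    """In-memory stand-in (hypothetical) that mimics the three client calls used above."""

    def __init__(self, paths):
        self.paths = set(paths)

    def exists(self, path):
        return True if path in self.paths else None

    def get_children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted({p[len(prefix):].split("/")[0] for p in self.paths if p.startswith(prefix)})

    def delete(self, path, recursive=False):
        prefix = path.rstrip("/") + "/"
        descendants = {p for p in self.paths if p.startswith(prefix)}
        if descendants and not recursive:
            raise RuntimeError("Directory not empty for " + path)
        self.paths -= descendants | {path}


fake = FakeZk([
    "/rmstore/ZKRMStateRoot",
    "/rmstore/ZKRMStateRoot/RMAppRoot",
    "/rmstore/ZKRMStateRoot/RMAppRoot/application_1445593412630_0001",
    "/rmstore/ZKRMStateRoot/RMAppRoot/application_1445593412630_0002",
])
removed = clear_znode(fake, "/rmstore/ZKRMStateRoot/RMAppRoot")
print(removed, fake.exists("/rmstore/ZKRMStateRoot/RMAppRoot"))  # 2 None

# Against a real ensemble (assumes the kazoo package is installed and the
# connecting user is allowed to delete the znode):
#   from kazoo.client import KazooClient
#   zk = KazooClient(hosts="localhost:2181")
#   zk.start()
#   clear_znode(zk, "/rmstore/ZKRMStateRoot/RMAppRoot")
#   zk.stop()
```

The fake client raises on a non-recursive delete of a populated znode, which mirrors the NotEmptyException seen in the log.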

4 REPLIES 4

Master Mentor

@hkropp Please see this

Master Mentor

@hkropp

Can we test the following answer?

Master Mentor

@hkropp Has this been resolved? Can you post your solution or accept the best answer?