Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

How To Best Resolve - RMStateStore FENCED?

Super Collaborator

When doing a restart of all services after Kerberos setup, we ran into the following exception:

File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'yarn resourcemanager -format-state-store' returned 255. 15/10/26 16:11:16 INFO resourcemanager.ResourceManager: STARTUP_MSG:

15/10/26 16:11:17 INFO recovery.ZKRMStateStore: org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$VerifyActiveStatusThread thread interrupted! Exiting!
15/10/26 16:11:17 INFO zookeeper.ZooKeeper: Session: 0x150a4b3429b0002 closed
15/10/26 16:11:17 FATAL resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /rmstore/ZKRMStateRoot/RMAppRoot
                at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
                at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
                at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.recursiveDeleteWithRetriesHelper(ZKRMStateStore.java:1049)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.recursiveDeleteWithRetriesHelper(ZKRMStateStore.java:1045)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.access$500(ZKRMStateStore.java:89)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$10.run(ZKRMStateStore.java:1032)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$10.run(ZKRMStateStore.java:1029)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1104)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1125)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.deleteWithRetries(ZKRMStateStore.java:1029)
                at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.deleteStore(ZKRMStateStore.java:825)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.deleteRMStateStore(ResourceManager.java:1267)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1190)
15/10/26 16:11:17 INFO zookeeper.ClientCnxn: EventThread shut down
15/10/26 16:11:17 INFO resourcemanager.ResourceManager: SHUTDOWN_MSG:

What is the root cause of this, and how can it best be resolved or avoided?

1 ACCEPTED SOLUTION

Master Mentor

@hkropp@hortonworks.com

FATAL resourcemanager.ResourceManager: Error starting ResourceManager

org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /rmstore/ZKRMStateRoot/RMAppRoot

Please see the following. In my case, all the application data was sitting under that znode:

[zk: localhost:2181(CONNECTED) 2] ls /rmstore/ZKRMStateRoot/RMAppRoot

[application_1445593412630_0002, application_1445593412630_0001, application_1445366030467_0002, application_1445366030467_0001, application_1445366030467_0004, application_1445366030467_0003, application_1445593412630_0006, application_1445366030467_0005, application_1445593412630_0005, application_1445593412630_0004, application_1445593412630_0003, application_1445173693339_0006, application_1445173693339_0005, application_1445173693339_0004, application_1445173693339_0003, application_1445173693339_0002, application_1445173693339_0001, application_1445394313024_0004, application_1445394313024_0003, application_1445394313024_0002, application_1445394313024_0001, application_1445394313024_0008, application_1445394313024_0007, application_1445394313024_0006, application_1445394313024_0005]

[zk: localhost:2181(CONNECTED) 3] quit

Quitting...

[zk: localhost:2181(CONNECTED) 3] rmr /rmstore/ZKRMStateRoot/RMAppRoot

[zk: localhost:2181(CONNECTED) 4] ls /rmstore/ZKRMStateRoot/RMAppRoot

Node does not exist: /rmstore/ZKRMStateRoot/RMAppRoot

After I restarted YARN, the znode was recreated:

[zk: localhost:2181(CONNECTED) 6] ls /rmstore/ZKRMStateRoot/RMAppRoot

[]

[zk: localhost:2181(CONNECTED) 7]

[zk: localhost:2181(CONNECTED) 7] ls /rmstore/ZKRMStateRoot

[AMRMTokenSecretManagerRoot, RMAppRoot, EpochNode, RMDTSecretManagerRoot, RMVersionNode]

[zk: localhost:2181(CONNECTED) 8]

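You can try this, but if you are not sure, or if it's a production cluster, open a support ticket first. Note that removing RMAppRoot deletes the stored application state, so the ResourceManager will not recover those applications after the restart.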
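To illustrate the error itself, here is a minimal sketch in Python of why a plain delete is refused on a znode that still has children, and why a recursive removal like `rmr` succeeds. This is a toy in-memory model, not the ZooKeeper client API: the class `ToyZnodeTree`, its methods, and the `demo` function are all hypothetical names made up for this example, and it does not try to explain why the ResourceManager's own recursive delete failed in the trace above.

```python
class NotEmptyError(Exception):
    """Raised when deleting a node that still has children."""


class NoNodeError(Exception):
    """Raised when the requested node does not exist."""


class ToyZnodeTree:
    """Hypothetical in-memory stand-in for a znode hierarchy."""

    def __init__(self):
        # Maps each node path to the set of its child names.
        self._children = {"/": set()}

    def create(self, path):
        """Create the node at `path`, plus any missing parents."""
        current = "/"
        for part in [p for p in path.split("/") if p]:
            child = current.rstrip("/") + "/" + part
            if child not in self._children:
                self._children[child] = set()
                self._children[current].add(part)
            current = child

    def ls(self, path):
        """Return the sorted child names of `path`."""
        if path not in self._children:
            raise NoNodeError(path)
        return sorted(self._children[path])

    def delete(self, path):
        """Plain delete: refuses to remove a node that has children."""
        if path == "/":
            raise ValueError("refusing to delete the root node")
        if path not in self._children:
            raise NoNodeError(path)
        if self._children[path]:
            raise NotEmptyError(path)
        parent, _, name = path.rpartition("/")
        del self._children[path]
        self._children[parent or "/"].discard(name)

    def rmr(self, path):
        """Recursive delete: remove all descendants first, then the node."""
        for name in self.ls(path):
            self.rmr(path.rstrip("/") + "/" + name)
        self.delete(path)


def demo():
    tree = ToyZnodeTree()
    root = "/rmstore/ZKRMStateRoot/RMAppRoot"
    for app in ("application_1445593412630_0001",
                "application_1445593412630_0002"):
        tree.create(root + "/" + app)

    try:
        tree.delete(root)  # fails: the application nodes are still there
        plain_delete_failed = False
    except NotEmptyError:
        plain_delete_failed = True

    tree.rmr(root)  # children first, then RMAppRoot itself
    return plain_delete_failed, tree.ls("/rmstore/ZKRMStateRoot")
```

In this model the parent stays in place and only `RMAppRoot` and its application nodes go away, which mirrors the `ls` output in the session above before YARN was restarted.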


4 REPLIES 4

Master Mentor

@hkropp Please see this

Master Mentor

@hkropp

Can we test the following answer?

Master Mentor

@hkropp Has this been resolved? Can you post your solution or accept the best answer?