<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question ResourceManager crashes due to  KeeperErrorCode = ConnectionLoss in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ResourceManager-crashes-due-to-KeeperErrorCode/m-p/26980#M5745</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We have recently been experiencing RM crashes, and we see the following error in the log:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:&lt;BR /&gt;org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:930)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:927)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1069)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1088)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:927)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We also get a lot of these exceptions in the Resource Manager log:&lt;/P&gt;&lt;P&gt;java.io.IOException: Broken pipe&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at sun.nio.ch.FileDispatcherImpl.write0(Native Method)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)&lt;BR 
/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at sun.nio.ch.IOUtil.write(IOUtil.java:65)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)&lt;BR /&gt;2015-04-30 22:53:51,669 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation.&lt;BR /&gt;org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /rmstore/ZKRMStateRoot/RMDTSecretManagerRoot/RMDelegationTokensRoot/RMDelegationToken_57967&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:999)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:996)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1069)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1088)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:996)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.removeRMDelegationTokenState(ZKRMStateStore.java:737)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.removeRMDelegationToken(RMStateStore.java:668)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:142)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:49)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:605)&lt;BR 
/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The network is fine in terms of errors and packet drops, and CPU usage is very low on the ZK servers.&lt;/P&gt;&lt;P&gt;We are using CDH 5.3.1.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;Michael.&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 09:27:49 GMT</pubDate>
    <dc:creator>Michael.Br</dc:creator>
    <dc:date>2022-09-16T09:27:49Z</dc:date>
    <item>
      <title>ResourceManager crashes due to  KeeperErrorCode = ConnectionLoss</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ResourceManager-crashes-due-to-KeeperErrorCode/m-p/26980#M5745</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We have recently been experiencing RM crashes, and we see the following error in the log:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:&lt;BR /&gt;org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:930)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:927)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1069)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1088)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:927)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We also get a lot of these exceptions in the Resource Manager log:&lt;/P&gt;&lt;P&gt;java.io.IOException: Broken pipe&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at sun.nio.ch.FileDispatcherImpl.write0(Native Method)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)&lt;BR 
/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at sun.nio.ch.IOUtil.write(IOUtil.java:65)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)&lt;BR /&gt;2015-04-30 22:53:51,669 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation.&lt;BR /&gt;org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /rmstore/ZKRMStateRoot/RMDTSecretManagerRoot/RMDelegationTokensRoot/RMDelegationToken_57967&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:999)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:996)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1069)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1088)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:996)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.removeRMDelegationTokenState(ZKRMStateStore.java:737)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.removeRMDelegationToken(RMStateStore.java:668)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:142)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:49)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:605)&lt;BR 
/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The network is fine in terms of errors and packet drops, and CPU usage is very low on the ZK servers.&lt;/P&gt;&lt;P&gt;We are using CDH 5.3.1.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;Michael.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:27:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/ResourceManager-crashes-due-to-KeeperErrorCode/m-p/26980#M5745</guid>
      <dc:creator>Michael.Br</dc:creator>
      <dc:date>2022-09-16T09:27:49Z</dc:date>
    </item>
    <item>
      <title>Re: ResourceManager crashes due to  KeeperErrorCode = ConnectionLoss</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ResourceManager-crashes-due-to-KeeperErrorCode/m-p/27425#M5746</link>
      <description>&lt;P&gt;There have been a number of issues in the RM related to ZooKeeper connections. At least a couple of them are fixed in CDH 5.3.3 (YARN-3242, YARN-2992).&lt;/P&gt;&lt;P&gt;I am not sure whether your case is fully covered by these fixes, since we are still working on one or two more fixes in this area, but upgrading to CDH 5.3.3 will help with a number of these ZK issues in the RM.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Wilfred&lt;/P&gt;</description>
      <pubDate>Thu, 14 May 2015 01:38:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/ResourceManager-crashes-due-to-KeeperErrorCode/m-p/27425#M5746</guid>
      <dc:creator>Wilfred</dc:creator>
      <dc:date>2015-05-14T01:38:13Z</dc:date>
    </item>
  </channel>
</rss>

