<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: YARN - Zookeeper failing a few moments after restart in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236858#M198671</link>
    <description>&lt;P&gt;&lt;A rel="noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer nofollow noopener noreferrer" href="http://ray%20teruya/" target="_blank"&gt;&lt;EM&gt;@Ray Teruya&lt;/EM&gt;&lt;/A&gt;&lt;EM&gt; &lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;OutOfMemoryError is a subclass of java.lang.VirtualMachineError; the JVM throws it when it runs out of memory. More specifically, &lt;STRONG&gt;the "GC overhead limit exceeded" variant in your log occurs when the JVM spends too much time performing Garbage Collection&lt;/STRONG&gt; and is able to reclaim only very little heap space.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="110185-1564828294767.png" style="width: 878px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/14329i4EEFD6EC15FFE60C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="110185-1564828294767.png" alt="110185-1564828294767.png" /&gt;&lt;/span&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;According to the Java docs, by default the JVM throws this error if the Java process spends more than 98% of its time doing GC and recovers less than 2% of the heap in each run. In other words, the application has exhausted nearly all the available memory, and the Garbage Collector has repeatedly tried and failed to free it.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;In this situation, users experience extreme slowness of the application. Operations that usually complete in milliseconds take much longer. 
This is because the CPU is spending nearly all of its capacity on Garbage Collection and has little left for other tasks.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Solution:&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;On HDP 3.x and 2.6.x, depending on the memory available to the cluster, check and increase the value shown below.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="110193-1564829131326.png" style="width: 769px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/14330i016F68787196EF67/image-size/medium?v=v2&amp;amp;px=400" role="button" title="110193-1564829131326.png" alt="110193-1564829131326.png" /&gt;&lt;/span&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;You could raise it to 2048 MB.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;HTH&lt;/EM&gt;&lt;/P&gt;</description>
    <pubDate>Sat, 17 Aug 2019 23:26:06 GMT</pubDate>
    <dc:creator>Shelton</dc:creator>
    <dc:date>2019-08-17T23:26:06Z</dc:date>
    <item>
      <title>YARN - Zookeeper failing a few moments after restart</title>
      <link>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236851#M198664</link>
      <description>&lt;P&gt;Good morning guys, thanks in advance for your help!&lt;BR /&gt;&lt;BR /&gt;I have a project that fails. I'm trying to restart all the services manually but haven't been able to. &lt;BR /&gt;I have a few questions and I'd really appreciate it if you could give me some guidance, because at this moment I'm kinda stuck.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;1. How do I check what services need to be "up and running" before restarting the next one? Is there any place where I can see the dependency?&lt;BR /&gt;2. Do I need 2 ZooKeeper servers up and running? The first one is running in localhost but the 2nd one runs in a different machine. If I actually need them both, how can I check what was wrong in the second one?&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="110104-ambarierrors.png" style="width: 596px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/14333i82EC0916B07DA173/image-size/medium?v=v2&amp;amp;px=400" role="button" title="110104-ambarierrors.png" alt="110104-ambarierrors.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 17 Aug 2019 23:26:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236851#M198664</guid>
      <dc:creator>ray_teruya</dc:creator>
      <dc:date>2019-08-17T23:26:30Z</dc:date>
    </item>
    <item>
      <title>Re: YARN - Zookeeper failing a few moments after restart</title>
      <link>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236852#M198665</link>
      <description>&lt;P&gt;&lt;A rel="noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer nofollow noopener noreferrer" href="http://@Ray%20Teruya" target="_blank"&gt;&lt;EM&gt;@Ray Teruya&lt;/EM&gt;&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;Start all services from Ambari&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Use &lt;STRONG&gt;Ambari&lt;/STRONG&gt; UI &amp;gt; &lt;STRONG&gt;Services&lt;/STRONG&gt; &amp;gt;&lt;STRONG&gt; Start All &lt;/STRONG&gt;to &lt;STRONG&gt;start all services&lt;/STRONG&gt; at once. In Ambari UI &amp;gt; Services you can start, stop, and restart all listed services simultaneously. In &lt;STRONG&gt;Services&lt;/STRONG&gt;, click ... and then click &lt;STRONG&gt;Start All&lt;/STRONG&gt;.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="110131-taruya.png" style="width: 366px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/14332iCD686563E7D11B54/image-size/medium?v=v2&amp;amp;px=400" role="button" title="110131-taruya.png" alt="110131-taruya.png" /&gt;&lt;/span&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;BR /&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;The first place to check for start failures or success is &lt;STRONG&gt;/var/log/zookeeper/zookeeper.log&lt;/STRONG&gt; or &lt;STRONG&gt;zookeeper-zookeeper-server-[hostname].out&lt;/STRONG&gt;.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;According to the HWX documentation, &lt;A rel="noopener noreferrer noopener noreferrer nofollow noopener noreferrer" href="https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/administration/content/starting_hdp_services.htm" target="_blank"&gt;make sure to manually start the Hadoop services in this prescribed order&lt;/A&gt;.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;1. How do I check what services need to be "up and running" before restarting the next one? 
Is there any place where I can see the dependency? &lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;The documentation linked above gives you the list of services and their dependency order.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;BR /&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;2. Do I need 2 ZooKeeper servers up and running? The first one is running in localhost but the 2nd one runs in a different machine. If I actually need them both, how can I check what was wrong in the second one?&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;If you are not running an HA configuration, a single ZooKeeper suffices. But if you want to emulate a production environment with many data nodes and enable [HA NameNode or RM], you MUST have at least 3 ZooKeepers to avoid the split-brain phenomenon. With only 2 servers, both must be running to keep a majority, so the ensemble cannot tolerate any failure.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Hope that helps&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 17 Aug 2019 23:26:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236852#M198665</guid>
      <dc:creator>Shelton</dc:creator>
      <dc:date>2019-08-17T23:26:22Z</dc:date>
    </item>
    <item>
      <title>Re: YARN - Zookeeper failing a few moments after restart</title>
      <link>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236853#M198666</link>
      <description>&lt;P&gt;The above question was originally posted in the &lt;A href="https://community.hortonworks.com/spaces/101/index.html"&gt;Community Help&lt;/A&gt; track. On Wed Jul 31 03:08 UTC 2019, a member of the HCC moderation staff moved it to the &lt;A href="https://community.hortonworks.com/spaces/61/operations-track_2.html"&gt;Cloud &amp;amp; Operations&lt;/A&gt; track. The &lt;EM&gt;Community Help Track&lt;/EM&gt; is intended for questions about using the HCC site itself, not for technical questions about using &lt;A rel="noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer" href="https://hortonworks.com/apache/ambari/" target="_blank"&gt;Ambari&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Wed, 31 Jul 2019 10:11:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236853#M198666</guid>
      <dc:creator>ask_bill_brooks</dc:creator>
      <dc:date>2019-07-31T10:11:11Z</dc:date>
    </item>
    <item>
      <title>Re: YARN - Zookeeper failing a few moments after restart</title>
      <link>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236854#M198667</link>
      <description>&lt;P&gt;&lt;A rel="noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer" href="http://ray%20teruya/" target="_blank"&gt;&lt;EM&gt;@Ray Teruya&lt;/EM&gt;&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;How many hosts do you have in your cluster?  Can you share your zookeeper logs and your /etc/hosts? &lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;HTH&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 31 Jul 2019 20:36:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236854#M198667</guid>
      <dc:creator>Shelton</dc:creator>
      <dc:date>2019-07-31T20:36:55Z</dc:date>
    </item>
    <item>
      <title>Re: YARN - Zookeeper failing a few moments after restart</title>
      <link>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236855#M198668</link>
      <description>&lt;P&gt;Thanks again &lt;A rel="user" href="https://community.cloudera.com/users/1271/sheltong.html" nodeid="1271"&gt;@Geoffrey Shelton Okot&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I have 4 machines in the cluster (1 master, 3 slaves).&lt;BR /&gt;ZooKeeper server is installed on the master (works fine) and on one slave (fails).&lt;BR /&gt;&lt;BR /&gt;This is the log I get from the slave:&lt;BR /&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/110136-zookeeperlog.txt"&gt;zookeeperLOG.txt&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 01 Aug 2019 00:32:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236855#M198668</guid>
      <dc:creator>ray_teruya</dc:creator>
      <dc:date>2019-08-01T00:32:10Z</dc:date>
    </item>
    <item>
      <title>Re: YARN - Zookeeper failing a few moments after restart</title>
      <link>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236856#M198669</link>
      <description>&lt;P&gt;Thanks &lt;A rel="user" href="https://community.cloudera.com/users/1271/sheltong.html" target="_blank"&gt;@Geoffrey Shelton Okot&lt;/A&gt; for your help!&lt;/P&gt;&lt;P&gt;I did restart all services manually but seems that ZK still fails. From the screenshot I posted, one of my ZK servers is always down. Since ZK needs to be up and running before anything else, I'd like to fix this issue before anything else. I checked the error message from Ambari and it says&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="110143-zk-server-failing.png" style="width: 1075px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/14331iAB1BDF8C1558B9A1/image-size/medium?v=v2&amp;amp;px=400" role="button" title="110143-zk-server-failing.png" alt="110143-zk-server-failing.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;The error message says&lt;BR /&gt;Connection failed: [Errno 111] Connection refused to ip_zookeeper_server2:2181&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;What else can I do to fix this?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;update:&lt;/STRONG&gt;&lt;BR /&gt;This is what the log file shows on that machine&lt;BR /&gt;&lt;BR /&gt;2019-07-31 07:57:58,187 - INFO  [main:QuorumPeerConfig@103] - Reading configuration from: /usr/hdp/current/zookeeper-server/conf/zoo.cfg&lt;BR /&gt;2019-07-31 07:57:58,191 - WARN  [main:QuorumPeerConfig@291] - No server failure will be tolerated. 
You need at least 3 servers.&lt;BR /&gt;2019-07-31 07:57:58,191 - INFO  [main:QuorumPeerConfig@338] - Defaulting to majority quorums&lt;BR /&gt;2019-07-31 07:57:58,196 - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 30&lt;BR /&gt;2019-07-31 07:57:58,197 - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 24&lt;BR /&gt;2019-07-31 07:57:58,198 - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.&lt;BR /&gt;2019-07-31 07:57:58,210 - INFO  [main:QuorumPeerMain@127] - Starting quorum peer&lt;BR /&gt;2019-07-31 07:57:58,219 - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.&lt;BR /&gt;2019-07-31 07:57:58,223 - INFO  [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181&lt;BR /&gt;2019-07-31 07:57:58,233 - INFO  [main:QuorumPeer@992] - tickTime set to 2000&lt;BR /&gt;2019-07-31 07:57:58,233 - INFO  [main:QuorumPeer@1012] - minSessionTimeout set to -1&lt;BR /&gt;2019-07-31 07:57:58,233 - INFO  [main:QuorumPeer@1023] - maxSessionTimeout set to -1&lt;BR /&gt;2019-07-31 07:57:58,233 - INFO  [main:QuorumPeer@1038] - initLimit set to 10&lt;BR /&gt;2019-07-31 07:57:58,245 - INFO  [main:FileSnap@83] - Reading snapshot /hadoop/zookeeper/version-2/snapshot.8600bc40ab&lt;BR /&gt;2019-07-31 07:58:41,800 - ERROR [main:NIOServerCnxnFactory$1@44] - Thread Thread[main,5,main] died&lt;BR /&gt;java.lang.OutOfMemoryError: GC overhead limit exceeded&lt;BR /&gt;        at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:97)&lt;BR /&gt;        at org.apache.zookeeper.server.DataNode.deserialize(DataNode.java:158)&lt;BR /&gt;        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)&lt;BR /&gt;        at org.apache.zookeeper.server.DataTree.deserialize(DataTree.java:1194)&lt;BR /&gt;        at org.apache.zookeeper.server.util.SerializeUtils.deserializeSnapshot(SerializeUtils.java:127)&lt;BR /&gt;        at 
org.apache.zookeeper.server.persistence.FileSnap.deserialize(FileSnap.java:127)&lt;BR /&gt;        at org.apache.zookeeper.server.persistence.FileSnap.deserialize(FileSnap.java:87)&lt;BR /&gt;        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)&lt;BR /&gt;        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)&lt;BR /&gt;        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:483)&lt;BR /&gt;        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:473)&lt;BR /&gt;        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)&lt;BR /&gt;        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)&lt;BR /&gt;        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 17 Aug 2019 23:26:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236856#M198669</guid>
      <dc:creator>ray_teruya</dc:creator>
      <dc:date>2019-08-17T23:26:14Z</dc:date>
    </item>
    <item>
      <title>Re: YARN - Zookeeper failing a few moments after restart</title>
      <link>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236857#M198670</link>
      <description>&lt;DIV class="fr-view clearfix"&gt;&lt;P&gt;After checking today's log file I found this.&lt;BR /&gt;Will google it to see what it means&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;2019-07-31 07:57:58,187 - INFO  [main:QuorumPeerConfig@103] - Reading configuration from: /usr/hdp/current/zookeeper-server/conf/zoo.cfg&lt;BR /&gt;2019-07-31 07:57:58,191 - WARN  [main:QuorumPeerConfig@291] - No server failure will be tolerated. You need at least 3 servers.&lt;BR /&gt;2019-07-31 07:57:58,191 - INFO  [main:QuorumPeerConfig@338] - Defaulting to majority quorums&lt;BR /&gt;2019-07-31 07:57:58,196 - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 30&lt;BR /&gt;2019-07-31 07:57:58,197 - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 24&lt;BR /&gt;2019-07-31 07:57:58,198 - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.&lt;BR /&gt;2019-07-31 07:57:58,210 - INFO  [main:QuorumPeerMain@127] - Starting quorum peer&lt;BR /&gt;2019-07-31 07:57:58,219 - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.&lt;BR /&gt;2019-07-31 07:57:58,223 - INFO  [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181&lt;BR /&gt;2019-07-31 07:57:58,233 - INFO  [main:QuorumPeer@992] - tickTime set to 2000&lt;BR /&gt;2019-07-31 07:57:58,233 - INFO  [main:QuorumPeer@1012] - minSessionTimeout set to -1&lt;BR /&gt;2019-07-31 07:57:58,233 - INFO  [main:QuorumPeer@1023] - maxSessionTimeout set to -1&lt;BR /&gt;2019-07-31 07:57:58,233 - INFO  [main:QuorumPeer@1038] - initLimit set to 10&lt;BR /&gt;2019-07-31 07:57:58,245 - INFO  [main:FileSnap@83] - Reading snapshot /hadoop/zookeeper/version-2/snapshot.8600bc40ab&lt;BR /&gt;2019-07-31 07:58:41,800 - ERROR [main:NIOServerCnxnFactory$1@44] - Thread Thread[main,5,main] died&lt;BR /&gt;java.lang.OutOfMemoryError: GC overhead limit exceeded&lt;BR /&gt;        at 
org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:97)&lt;BR /&gt;        at org.apache.zookeeper.server.DataNode.deserialize(DataNode.java:158)&lt;BR /&gt;        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)&lt;BR /&gt;        at org.apache.zookeeper.server.DataTree.deserialize(DataTree.java:1194)&lt;BR /&gt;        at org.apache.zookeeper.server.util.SerializeUtils.deserializeSnapshot(SerializeUtils.java:127)&lt;BR /&gt;        at org.apache.zookeeper.server.persistence.FileSnap.deserialize(FileSnap.java:127)&lt;BR /&gt;        at org.apache.zookeeper.server.persistence.FileSnap.deserialize(FileSnap.java:87)&lt;BR /&gt;        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)&lt;BR /&gt;        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)&lt;BR /&gt;        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:483)&lt;BR /&gt;        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:473)&lt;BR /&gt;        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)&lt;BR /&gt;        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)&lt;BR /&gt;        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)&lt;/P&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 01 Aug 2019 02:21:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236857#M198670</guid>
      <dc:creator>ray_teruya</dc:creator>
      <dc:date>2019-08-01T02:21:21Z</dc:date>
    </item>
    <item>
      <title>Re: YARN - Zookeeper failing a few moments after restart</title>
      <link>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236858#M198671</link>
      <description>&lt;P&gt;&lt;A rel="noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer nofollow noopener noreferrer" href="http://ray%20teruya/" target="_blank"&gt;&lt;EM&gt;@Ray Teruya&lt;/EM&gt;&lt;/A&gt;&lt;EM&gt; &lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;OutOfMemoryError is a subclass of java.lang.VirtualMachineError; the JVM throws it when it runs out of memory. More specifically, &lt;STRONG&gt;the "GC overhead limit exceeded" variant in your log occurs when the JVM spends too much time performing Garbage Collection&lt;/STRONG&gt; and is able to reclaim only very little heap space.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="110185-1564828294767.png" style="width: 878px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/14329i4EEFD6EC15FFE60C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="110185-1564828294767.png" alt="110185-1564828294767.png" /&gt;&lt;/span&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;According to the Java docs, by default the JVM throws this error if the Java process spends more than 98% of its time doing GC and recovers less than 2% of the heap in each run. In other words, the application has exhausted nearly all the available memory, and the Garbage Collector has repeatedly tried and failed to free it.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;In this situation, users experience extreme slowness of the application. Operations that usually complete in milliseconds take much longer. 
This is because the CPU is spending nearly all of its capacity on Garbage Collection and has little left for other tasks.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Solution:&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;On HDP 3.x and 2.6.x, depending on the memory available to the cluster, check and increase the value shown below.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="110193-1564829131326.png" style="width: 769px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/14330i016F68787196EF67/image-size/medium?v=v2&amp;amp;px=400" role="button" title="110193-1564829131326.png" alt="110193-1564829131326.png" /&gt;&lt;/span&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;You could raise it to 2048 MB.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;HTH&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 17 Aug 2019 23:26:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236858#M198671</guid>
      <dc:creator>Shelton</dc:creator>
      <dc:date>2019-08-17T23:26:06Z</dc:date>
    </item>
    <item>
      <title>Re: YARN - Zookeeper failing a few moments after restart</title>
      <link>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236859#M198672</link>
      <description>&lt;P&gt;&lt;A rel="noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer noopener noreferrer" href="http://ray%20teruya/" target="_blank"&gt;&lt;EM&gt;@Ray Teruya&lt;/EM&gt;&lt;/A&gt;&lt;EM&gt; &lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;The warning in BOLD below is what I described in Question/Answer 2 of my previous post. To avoid split-brain you MUST install 3 ZooKeepers.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;2019-07-31 07:57:58,191 - WARN [main:QuorumPeerConfig@291] - No server failure will be tolerated. You need at least 3 servers. &lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;&lt;U&gt;Solution&lt;/U&gt;&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Delete/remove the failed installation.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Add 2 new ZooKeeper servers to your cluster from the Ambari UI using &lt;STRONG&gt;ADD SERVICE&lt;/STRONG&gt;, and start them if they are not already started. This should form a quorum in which only one server is the leader and the rest are followers. There are a few ways to identify a &lt;STRONG&gt;ZooKeeper&lt;/STRONG&gt; leader/&lt;STRONG&gt;follower&lt;/STRONG&gt;; I mention 2 here to keep this simple.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;1. 
Check the zookeeper log file on each node, and grep as below:&lt;/EM&gt;&lt;/P&gt;&lt;PRE&gt;&lt;EM&gt;# grep LEAD /var/log/zookeeper/zookeeper-zookeeper-server-xyz.out&lt;/EM&gt;&lt;/PRE&gt;&lt;P&gt;&lt;EM&gt;Desired output on the leader&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;2019-08-10 22:33:47,113 - INFO  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumPeer@829] - LEADING&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;2019-08-10 22:33:47,114 - INFO  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Leader@358] - LEADING - LEADER ELECTION TOOK - 9066&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;2. Use the "nc" command to query port 2181 and determine whether the &lt;STRONG&gt;ZooKeeper&lt;/STRONG&gt; server is a leader or a &lt;STRONG&gt;follower&lt;/STRONG&gt; (for example, by sending the four-letter command "stat" and reading the "Mode" line of the reply).&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;After doing the above procedure you should be good to go.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;HTH&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 12 Aug 2019 01:16:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236859#M198672</guid>
      <dc:creator>Shelton</dc:creator>
      <dc:date>2019-08-12T01:16:53Z</dc:date>
    </item>
    <item>
      <title>Re: YARN - Zookeeper failing a few moments after restart</title>
      <link>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236860#M198673</link>
      <description>&lt;P&gt;Wow thanks &lt;A rel="user" href="https://community.cloudera.com/users/1271/sheltong.html" nodeid="1271"&gt;@Geoffrey Shelton Okot&lt;/A&gt; and sorry for the late response. Changing the maximum memory value did the job. Now we're checking that it stays stable. So far so good! &lt;/P&gt;</description>
      <pubDate>Tue, 13 Aug 2019 22:55:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/236860#M198673</guid>
      <dc:creator>ray_teruya</dc:creator>
      <dc:date>2019-08-13T22:55:40Z</dc:date>
    </item>
    <item>
      <title>Re: YARN - Zookeeper failing a few moments after restart</title>
      <link>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/263004#M205991</link>
      <description>&lt;P&gt;&lt;STRONG&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/52240"&gt;@ray_teruya&lt;/a&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;&lt;I&gt;If you found this answer addressed your question, please take a moment to log in and click the "kudos" link on the answer.&lt;/I&gt;&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;&lt;I&gt;&amp;nbsp;&lt;/I&gt;&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;&lt;I&gt;That would be a great help to Community users to find the solution quickly for these kinds of errors.&lt;/I&gt;&lt;/FONT&gt;&lt;/DIV&gt;</description>
      <pubDate>Sun, 18 Aug 2019 08:27:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/YARN-Zookeeper-failing-a-few-moments-after-restart/m-p/263004#M205991</guid>
      <dc:creator>Shelton</dc:creator>
      <dc:date>2019-08-18T08:27:16Z</dc:date>
    </item>
  </channel>
</rss>

