<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question ZooKeeper Failover controller crashes when the Hadoop NameNode goes down in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/ZooKeeper-Failover-controller-crashes-when-the-Hadoop/m-p/213261#M175189</link>
    <description>&lt;P&gt;We are setting up a Hadoop cluster in our development environment. While testing NameNode failover, we noticed that the ZooKeeper Failover Controller would sometimes crash. In one case both ZooKeeper and the ZooKeeper Failover Controller crashed.&lt;/P&gt;&lt;P&gt;At one point in our testing &lt;STRONG&gt;both NameNodes&lt;/STRONG&gt; were in an &lt;STRONG&gt;active&lt;/STRONG&gt; state, which is a &lt;STRONG&gt;split brain&lt;/STRONG&gt; scenario for the Hadoop cluster.&lt;/P&gt;&lt;P&gt;We have not seen any useful information in the logs.&lt;/P&gt;&lt;P&gt;We are using the following versions: hadoop-2.7.3 and zookeeper-3.4.10.&lt;/P&gt;&lt;P&gt;We have a four-server cluster. Two of the servers are dedicated to NameNodes and two are dedicated to DataNodes.&lt;/P&gt;&lt;P&gt;The components running on the NameNode servers are: NameNode, ZooKeeper, ZooKeeper Failover Controller, and JournalNode.&lt;/P&gt;&lt;P&gt;The components running on the DataNode servers are: DataNode, ZooKeeper, and JournalNode.&lt;/P&gt;&lt;P&gt;The following matrix contains the test scenarios. After the matrix are the contents of core-site.xml and hdfs-site.xml.&lt;/P&gt;&lt;TABLE&gt;
 
 &lt;TBODY&gt;&lt;TR&gt;
  &lt;TD&gt;&lt;STRONG&gt;NameNode Server 1&lt;/STRONG&gt;&lt;/TD&gt;
  &lt;TD&gt;&lt;STRONG&gt;NameNode Server 2&lt;/STRONG&gt;&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;NameNode 1 is active&lt;/TD&gt;
  &lt;TD&gt;NameNode 2 is standby&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;Run kill -9 on the NameNode 1 pid&lt;BR /&gt;
    NameNode 1 is down&lt;/TD&gt;
  &lt;TD&gt;NameNode 2 is active&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;Start NameNode 1&lt;BR /&gt;
    ZooKeeper Failover Controller 1 crashes&lt;BR /&gt;
    NameNode 1 is standby&lt;/TD&gt;
  &lt;TD&gt;NameNode 2 is active&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;Start ZooKeeper Failover Controller 1&lt;BR /&gt;
    NameNode 1 is active&lt;/TD&gt;
  &lt;TD&gt;NameNode 2 is standby&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;Run kill -9 on the NameNode 1 pid&lt;BR /&gt;
    NameNode 1 is down&lt;/TD&gt;
  &lt;TD&gt;NameNode 2 is active&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;Start NameNode 1&lt;BR /&gt;
    ZooKeeper Failover Controller 1 crashes&lt;BR /&gt;
    NameNode 1 is standby&lt;BR /&gt;
    No useful information in the ZooKeeper Failover Controller logs&lt;/TD&gt;
  &lt;TD&gt;NameNode 2 is active&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;Turn on Log4j debugging&lt;BR /&gt;
    Start ZooKeeper Failover Controller 1&lt;BR /&gt;
    ZooKeeper Failover Controller does not start&lt;/TD&gt;
  &lt;TD&gt;NameNode 2 is active&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;Turn on Log4j debugging,console&lt;BR /&gt;
    Start ZooKeeper Failover Controller 1&lt;BR /&gt;
    ZooKeeper Failover Controller does not start&lt;BR /&gt;
    Logs: Unable to start failover controller. Parent znode does not exist.&lt;BR /&gt;
    Logs: Run with -formatZK to initialize ZooKeeper&lt;/TD&gt;
  &lt;TD&gt;NameNode 2 is active&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;Run: hdfs zkfc -formatZK&lt;BR /&gt;
    Start ZooKeeper Failover Controller 1&lt;BR /&gt;
    NameNode 1 is active&lt;/TD&gt;
  &lt;TD&gt;NameNode 2 is active&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;NameNode 1 is active&lt;/TD&gt;
  &lt;TD&gt;Stop NameNode 2&lt;BR /&gt;
    ZooKeeper Failover Controller 2 crashed&lt;BR /&gt;
    ZooKeeper crashed&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;
  &lt;TD&gt;NameNode 1 is standby&lt;/TD&gt;
  &lt;TD&gt;Start ZooKeeper&lt;BR /&gt;
    Start ZooKeeper Failover Controller 2&lt;BR /&gt;
    Start NameNode 2&lt;BR /&gt;
    NameNode 2 is active&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;core-site.xml&lt;/P&gt;&lt;PRE&gt;&amp;lt;configuration&amp;gt;
    &amp;lt;property&amp;gt;
         &amp;lt;name&amp;gt;hadoop.tmp.dir&amp;lt;/name&amp;gt;
         &amp;lt;value&amp;gt;/opt/xpm/hadoop_tmp&amp;lt;/value&amp;gt;
         &amp;lt;description&amp;gt;A base for other temporary directories.&amp;lt;/description&amp;gt;
    &amp;lt;/property&amp;gt;
    &amp;lt;property&amp;gt;
         &amp;lt;name&amp;gt;fs.defaultFS&amp;lt;/name&amp;gt;
         &amp;lt;value&amp;gt;hdfs://ha-cluster&amp;lt;/value&amp;gt;
    &amp;lt;/property&amp;gt;
    &amp;lt;property&amp;gt;
         &amp;lt;name&amp;gt;dfs.jornalnode.edits.dir&amp;lt;/name&amp;gt;
         &amp;lt;value&amp;gt;/opt/xpm/hadoop_journal&amp;lt;/value&amp;gt;
    &amp;lt;/property&amp;gt;
&amp;lt;/configuration&amp;gt;
&lt;/PRE&gt;&lt;P&gt;hdfs-site.xml&lt;/P&gt;&lt;PRE&gt;&amp;lt;configuration&amp;gt;
&amp;lt;property&amp;gt;
    &amp;lt;name&amp;gt;dfs.namenode.name.dir&amp;lt;/name&amp;gt;
    &amp;lt;value&amp;gt;/opt/xpm/hadoop_namenode&amp;lt;/value&amp;gt;
 &amp;lt;/property&amp;gt;
 
 &amp;lt;property&amp;gt;
 &amp;lt;name&amp;gt;dfs.replication&amp;lt;/name&amp;gt;
 &amp;lt;value&amp;gt;1&amp;lt;/value&amp;gt;
 &amp;lt;/property&amp;gt;
 
 &amp;lt;property&amp;gt;
 &amp;lt;name&amp;gt;dfs.permissions&amp;lt;/name&amp;gt;
 &amp;lt;value&amp;gt;false&amp;lt;/value&amp;gt;
 &amp;lt;/property&amp;gt;
 
 &amp;lt;property&amp;gt;
 &amp;lt;name&amp;gt;dfs.nameservices&amp;lt;/name&amp;gt;
 &amp;lt;value&amp;gt;ha-cluster&amp;lt;/value&amp;gt;
 &amp;lt;/property&amp;gt;
 
 &amp;lt;property&amp;gt;
 &amp;lt;name&amp;gt;dfs.ha.namenodes.ha-cluster&amp;lt;/name&amp;gt;
 &amp;lt;value&amp;gt;nn1,nn2&amp;lt;/value&amp;gt;
 &amp;lt;/property&amp;gt;
 
 &amp;lt;property&amp;gt;
 &amp;lt;name&amp;gt;dfs.namenode.rpc-address.ha-cluster.nn1&amp;lt;/name&amp;gt;
 &amp;lt;value&amp;gt;r00j9rn0c.bnymellon.net:9000&amp;lt;/value&amp;gt;
 &amp;lt;/property&amp;gt;
 
 &amp;lt;property&amp;gt;
 &amp;lt;name&amp;gt;dfs.namenode.rpc-address.ha-cluster.nn2&amp;lt;/name&amp;gt;
 &amp;lt;value&amp;gt;r00j9sn0c.bnymellon.net:9000&amp;lt;/value&amp;gt;
 &amp;lt;/property&amp;gt;
 
 &amp;lt;property&amp;gt;
 &amp;lt;name&amp;gt;dfs.namenode.http-address.ha-cluster.nn1&amp;lt;/name&amp;gt;
 &amp;lt;value&amp;gt;r00j9rn0c.bnymellon.net:50070&amp;lt;/value&amp;gt;
 &amp;lt;/property&amp;gt;
 
 &amp;lt;property&amp;gt;
 &amp;lt;name&amp;gt;dfs.namenode.http-address.ha-cluster.nn2&amp;lt;/name&amp;gt;
 &amp;lt;value&amp;gt;r00j9sn0c.bnymellon.net:50070&amp;lt;/value&amp;gt;
 &amp;lt;/property&amp;gt;
 
 &amp;lt;property&amp;gt;
 &amp;lt;name&amp;gt;dfs.namenode.shared.edits.dir&amp;lt;/name&amp;gt; 
 &amp;lt;value&amp;gt;qjournal://r00j9rn0c.bnymellon.net:8485;r00j9sn0c.bnymellon.net;r00j9tn0c.bnymellon.net:8485;r00j9un0c.bnymellon.net:8485/ha-cluster&amp;lt;/value&amp;gt;
 &amp;lt;/property&amp;gt;
 
 &amp;lt;property&amp;gt;
 &amp;lt;name&amp;gt;dfs.client.failover.proxy.provider.ha-cluster&amp;lt;/name&amp;gt;
 &amp;lt;value&amp;gt;org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider&amp;lt;/value&amp;gt;
 &amp;lt;/property&amp;gt;
 
 &amp;lt;property&amp;gt;
 &amp;lt;name&amp;gt;dfs.ha.automatic-failover.enabled&amp;lt;/name&amp;gt;
 &amp;lt;value&amp;gt;true&amp;lt;/value&amp;gt;
 &amp;lt;/property&amp;gt;
 
 &amp;lt;property&amp;gt;
 &amp;lt;name&amp;gt;ha.zookeeper.quorum&amp;lt;/name&amp;gt;
 &amp;lt;value&amp;gt;r00j9rn0c.bnymellon.net:2181,r00j9sn0c.bnymellon.net:2181,r00j9tn0c.bnymellon.net:2181,r00j9un0c.bnymellon.net:2181&amp;lt;/value&amp;gt;
 &amp;lt;/property&amp;gt;
 
 &amp;lt;property&amp;gt;
 &amp;lt;name&amp;gt;dfs.ha.fencing.methods&amp;lt;/name&amp;gt;
 &amp;lt;value&amp;gt;sshfence&amp;lt;/value&amp;gt;
 &amp;lt;/property&amp;gt;
 
 &amp;lt;property&amp;gt;
 &amp;lt;name&amp;gt;dfs.ha.fencing.ssh.private-key-files&amp;lt;/name&amp;gt;
 &amp;lt;value&amp;gt;/users/home/pkimd1m/.ssh/id_rsa&amp;lt;/value&amp;gt;
 &amp;lt;/property&amp;gt;


&amp;lt;/configuration&amp;gt;
&lt;/PRE&gt;</description>
    <pubDate>Tue, 08 Aug 2017 03:02:09 GMT</pubDate>
    <dc:creator>albert_stark</dc:creator>
    <dc:date>2017-08-08T03:02:09Z</dc:date>
  </channel>
</rss>

