<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Name Node going down due to QJM timeout in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Name-Node-going-down-due-to-QJM-timeout/m-p/162168#M36755</link>
    <description>&lt;P&gt;I have a cluster of 10 nodes, each with 2 TB of disk space and 250 GB of RAM. While writing 1 TB of data, the NameNode (HA NameNode) goes down with the error below. I have run this multiple times, and it fails the same way every time.&lt;/P&gt;&lt;P&gt;2016-08-03 05:56:43,002 WARN  client.QuorumJournalManager (IPCLoggerChannel.java:call(406)) - Took 8783ms to send a batch of 4 edits (711 bytes) to remote journal 172.27.27.0:8485&lt;/P&gt;&lt;P&gt;2016-08-03 05:56:43,005 WARN  client.QuorumJournalManager (IPCLoggerChannel.java:call(388)) - Remote journal 172.27.29.0:8485 failed to write txns 330736-330807. Will try to write to this JN again after the next log roll.&lt;/P&gt;&lt;P&gt;org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 33 is less than the last promised epoch 34&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:428)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:456)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:351)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:152)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)&lt;/P&gt;&lt;P&gt;at java.security.AccessController.doPrivileged(Native Method)&lt;/P&gt;&lt;P&gt;at javax.security.auth.Subject.doAs(Subject.java:422)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)&lt;/P&gt;</description>
    <pubDate>Wed, 03 Aug 2016 13:45:41 GMT</pubDate>
    <dc:creator>sgowda</dc:creator>
    <dc:date>2016-08-03T13:45:41Z</dc:date>
    <item>
      <title>Name Node going down due to QJM timeout</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Name-Node-going-down-due-to-QJM-timeout/m-p/162168#M36755</link>
      <description>&lt;P&gt;I have a cluster of 10 nodes, each with 2 TB of disk space and 250 GB of RAM. While writing 1 TB of data, the NameNode (HA NameNode) goes down with the error below. I have run this multiple times, and it fails the same way every time.&lt;/P&gt;&lt;P&gt;2016-08-03 05:56:43,002 WARN  client.QuorumJournalManager (IPCLoggerChannel.java:call(406)) - Took 8783ms to send a batch of 4 edits (711 bytes) to remote journal 172.27.27.0:8485&lt;/P&gt;&lt;P&gt;2016-08-03 05:56:43,005 WARN  client.QuorumJournalManager (IPCLoggerChannel.java:call(388)) - Remote journal 172.27.29.0:8485 failed to write txns 330736-330807. Will try to write to this JN again after the next log roll.&lt;/P&gt;&lt;P&gt;org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 33 is less than the last promised epoch 34&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:428)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:456)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:351)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:152)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)&lt;/P&gt;&lt;P&gt;at java.security.AccessController.doPrivileged(Native Method)&lt;/P&gt;&lt;P&gt;at javax.security.auth.Subject.doAs(Subject.java:422)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)&lt;/P&gt;&lt;P&gt;at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2016 13:45:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Name-Node-going-down-due-to-QJM-timeout/m-p/162168#M36755</guid>
      <dc:creator>sgowda</dc:creator>
      <dc:date>2016-08-03T13:45:41Z</dc:date>
    </item>
    <item>
      <title>Re: Name Node going down due to QJM timeout</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Name-Node-going-down-due-to-QJM-timeout/m-p/162169#M36756</link>
      <description>&lt;P&gt;Here is what I found and did: the DataNode and ZooKeeper were writing to the same disk, so ZooKeeper writes were slowing down, and as a result all the services that depend on ZooKeeper were going down.&lt;/P&gt;&lt;P&gt;Solution: I brought down the DataNodes on the ZooKeeper machines and then started the job. This has solved the problem for now.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Aug 2016 13:58:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Name-Node-going-down-due-to-QJM-timeout/m-p/162169#M36756</guid>
      <dc:creator>sgowda</dc:creator>
      <dc:date>2016-08-04T13:58:41Z</dc:date>
    </item>
  </channel>
</rss>

