<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: What is the HDFS, NameNode configuration for the JournalManager Timeout? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/What-is-the-HDFS-NameNode-configuration-for-the/m-p/167222#M129554</link>
    <description>&lt;P&gt;I believe it is dfs.qjournal.start-segment.timeout.ms. The default for this is 20000 ms.&lt;/P&gt;&lt;P&gt;However, there are other configs you may have to adjust as well, such as dfs.qjournal.write-txns.timeout.ms.&lt;/P&gt;&lt;P&gt;But you are better off fixing your infrastructure issues than changing these default values.&lt;/P&gt;</description>
    <pubDate>Wed, 25 May 2016 22:45:34 GMT</pubDate>
    <dc:creator>ravi1</dc:creator>
    <dc:date>2016-05-25T22:45:34Z</dc:date>
    <item>
      <title>What is the HDFS, NameNode configuration for the JournalManager Timeout?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/What-is-the-HDFS-NameNode-configuration-for-the/m-p/167221#M129553</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/738/markpetronic.html" nodeid="738"&gt;@Mark Petronic&lt;/A&gt; and I are building out QA and Production HA HDP 2.3.4.7 clusters.  Our QA cluster runs entirely on VMware virtual machines.&lt;/P&gt;&lt;P&gt;We are having problems with the underlying infrastructure that cause hosts to freeze, at times for up to 30-45 seconds.  Yes, this is a totally separate problem and beyond the scope of the Hortonworks Community.&lt;/P&gt;&lt;P&gt;What I am trying to do in the meantime, however, is raise the NameNode's timeout from the 20000 ms default to see if we can alleviate the problem.&lt;/P&gt;&lt;P&gt;What ends up happening is that once the NameNode times out attempting to connect to a quorum of JournalManager processes, it simply shuts down.&lt;/P&gt;&lt;PRE&gt;2016-05-25 01:46:16,480 INFO  client.QuorumJournalManager (QuorumCall.java:waitFor(136)) - Waited 6001 ms (timeout=20000 ms) for a response for startLogSegment(416426). No responses yet.
2016-05-25 01:46:26,577 WARN  client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 16098 ms (timeout=20000 ms) for a response for startLogSegment(416426). No responses yet.
2016-05-25 01:46:27,578 WARN  client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 17099 ms (timeout=20000 ms) for a response for startLogSegment(416426). No responses yet.
2016-05-25 01:46:28,580 WARN  client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 18100 ms (timeout=20000 ms) for a response for startLogSegment(416426). No responses yet.
2016-05-25 01:46:29,580 WARN  client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 19101 ms (timeout=20000 ms) for a response for startLogSegment(416426). No responses yet.
2016-05-25 01:46:30,480 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: starting log segment 416426 failed for required journal (JournalAndStream(mgr=QJM to [172.19.64.30:8485, 172.19.64.31:8485, 172.19.64.32:8485], stream=null))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
        at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.startLogSegment(QuorumJournalManager.java:403)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.startLogSegment(JournalSet.java:107)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$3.apply(JournalSet.java:222)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.startLogSegment(JournalSet.java:219)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegment(FSEditLog.java:1237)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1206)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1297)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:5939)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1186)
        at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
        at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
2016-05-25 01:46:30,483 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2016-05-25 01:46:30,487 INFO  provider.AuditProviderFactory (AuditProviderFactory.java:run(454)) - ==&amp;gt; JVMShutdownHook.run()
2016-05-25 01:46:30,487 INFO  provider.AuditProviderFactory (AuditProviderFactory.java:run(459)) - &amp;lt;== JVMShutdownHook.run()
2016-05-25 01:46:30,492 INFO  namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nn01.qa.quasar.local/172.19.64.30
************************************************************/


&lt;/PRE&gt;&lt;P&gt;Digging through the documentation, I thought the configuration was ipc.client.connect.timeout in core-site.xml, but that does not seem to be the case.&lt;/P&gt;&lt;P&gt;Does anyone know which configuration parameter, and in which config file, I can update from the 20000 ms default?&lt;/P&gt;</description>
      <pubDate>Wed, 25 May 2016 20:36:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/What-is-the-HDFS-NameNode-configuration-for-the/m-p/167221#M129553</guid>
      <dc:creator>rchapin</dc:creator>
      <dc:date>2016-05-25T20:36:10Z</dc:date>
    </item>
    <item>
      <title>Re: What is the HDFS, NameNode configuration for the JournalManager Timeout?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/What-is-the-HDFS-NameNode-configuration-for-the/m-p/167222#M129554</link>
      <description>&lt;P&gt;I believe it is dfs.qjournal.start-segment.timeout.ms. The default for this is 20000 ms.&lt;/P&gt;&lt;P&gt;However, there are other configs you may have to adjust as well, such as dfs.qjournal.write-txns.timeout.ms.&lt;/P&gt;&lt;P&gt;But you are better off fixing your infrastructure issues than changing these default values.&lt;/P&gt;</description>
      <pubDate>Wed, 25 May 2016 22:45:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/What-is-the-HDFS-NameNode-configuration-for-the/m-p/167222#M129554</guid>
      <dc:creator>ravi1</dc:creator>
      <dc:date>2016-05-25T22:45:34Z</dc:date>
    </item>
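    <!-- A sketch (added by the editor, not part of the thread) of how the two
         properties named in the reply above would be overridden in
         hdfs-site.xml; the 60000 ms value is illustrative, not from the post:
    <property>
      <name>dfs.qjournal.start-segment.timeout.ms</name>
      <value>60000</value>
    </property>
    <property>
      <name>dfs.qjournal.write-txns.timeout.ms</name>
      <value>60000</value>
    </property>
    -->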
    <item>
      <title>Re: What is the HDFS, NameNode configuration for the JournalManager Timeout?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/What-is-the-HDFS-NameNode-configuration-for-the/m-p/167223#M129555</link>
      <description>&lt;P&gt;You are absolutely correct that fixing the infrastructure issues is the right solution; however, doing so requires working with a number of other teams and will take quite some time to sort out.  Luckily, this is in QA, so we can live with it.&lt;/P&gt;&lt;P&gt;Thank you very much for the hint. It seems there are a number of properties that define how the NameNodes manage their various connections and timeouts to the JournalManagers.&lt;/P&gt;&lt;P&gt;The following is from org.apache.hadoop.hdfs.DFSConfigKeys.java:&lt;/P&gt;&lt;PRE&gt;// Quorum-journal timeouts for various operations. Unlikely to need
// to be tweaked, but configurable just in case.
public static final String DFS_QJOURNAL_START_SEGMENT_TIMEOUT_KEY = "dfs.qjournal.start-segment.timeout.ms";
public static final String DFS_QJOURNAL_PREPARE_RECOVERY_TIMEOUT_KEY = "dfs.qjournal.prepare-recovery.timeout.ms";
public static final String DFS_QJOURNAL_ACCEPT_RECOVERY_TIMEOUT_KEY = "dfs.qjournal.accept-recovery.timeout.ms";
public static final String DFS_QJOURNAL_FINALIZE_SEGMENT_TIMEOUT_KEY = "dfs.qjournal.finalize-segment.timeout.ms";
public static final String DFS_QJOURNAL_SELECT_INPUT_STREAMS_TIMEOUT_KEY = "dfs.qjournal.select-input-streams.timeout.ms";
public static final String DFS_QJOURNAL_GET_JOURNAL_STATE_TIMEOUT_KEY = "dfs.qjournal.get-journal-state.timeout.ms";
public static final String DFS_QJOURNAL_NEW_EPOCH_TIMEOUT_KEY = "dfs.qjournal.new-epoch.timeout.ms";
public static final String DFS_QJOURNAL_WRITE_TXNS_TIMEOUT_KEY = "dfs.qjournal.write-txns.timeout.ms";
public static final int DFS_QJOURNAL_START_SEGMENT_TIMEOUT_DEFAULT = 20000;
public static final int DFS_QJOURNAL_PREPARE_RECOVERY_TIMEOUT_DEFAULT = 120000;
public static final int DFS_QJOURNAL_ACCEPT_RECOVERY_TIMEOUT_DEFAULT = 120000;
public static final int DFS_QJOURNAL_FINALIZE_SEGMENT_TIMEOUT_DEFAULT = 120000;
public static final int DFS_QJOURNAL_SELECT_INPUT_STREAMS_TIMEOUT_DEFAULT = 20000;
public static final int DFS_QJOURNAL_GET_JOURNAL_STATE_TIMEOUT_DEFAULT = 120000;
public static final int DFS_QJOURNAL_NEW_EPOCH_TIMEOUT_DEFAULT = 120000;
public static final int DFS_QJOURNAL_WRITE_TXNS_TIMEOUT_DEFAULT = 20000;
&lt;/PRE&gt;&lt;P&gt;In my case, I added the following custom properties to hdfs-site.xml:&lt;/P&gt;&lt;PRE&gt;dfs.qjournal.start-segment.timeout.ms = 90000
dfs.qjournal.select-input-streams.timeout.ms = 90000
dfs.qjournal.write-txns.timeout.ms = 90000
&lt;/PRE&gt;&lt;P&gt;I also added the following property to core-site.xml:&lt;/P&gt;&lt;PRE&gt;ipc.client.connect.timeout = 90000
&lt;/PRE&gt;&lt;P&gt;So far, that seems to have alleviated the problem.&lt;/P&gt;</description>
      <pubDate>Thu, 26 May 2016 22:21:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/What-is-the-HDFS-NameNode-configuration-for-the/m-p/167223#M129555</guid>
      <dc:creator>rchapin</dc:creator>
      <dc:date>2016-05-26T22:21:20Z</dc:date>
    </item>
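    <!-- A sketch (added by the editor, not part of the thread) of the overrides
         described in the reply above, expressed as hdfs-site.xml property
         elements using the poster's own 90000 ms values:
    <property>
      <name>dfs.qjournal.start-segment.timeout.ms</name>
      <value>90000</value>
    </property>
    <property>
      <name>dfs.qjournal.select-input-streams.timeout.ms</name>
      <value>90000</value>
    </property>
    <property>
      <name>dfs.qjournal.write-txns.timeout.ms</name>
      <value>90000</value>
    </property>
    And the corresponding entry in core-site.xml:
    <property>
      <name>ipc.client.connect.timeout</name>
      <value>90000</value>
    </property>
    -->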
  </channel>
</rss>

