HA Namenode Failover Issues

I am running CDH 4.3, managed via Cloudera Manager 5, and am seeing NameNode failovers multiple times per day. How can I increase the 5000 ms timeout shown in the log below?

I have the following set in the Failover Controller Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml, but it doesn't seem to help: the error still reports a timeout after 5000 millis.

<property>
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>120000</value>
</property>

<property>
  <name>ipc.ping.interval</name>
  <value>180000</value>
</property>

2014-06-16 17:25:19,053 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at m-hdp-mnode0005/172.21.248.13:9005 standby (unable to connect)
java.net.SocketTimeoutException: Call From m-hdp-mnode0006/172.21.248.14 to m-hdp-mnode0005:9005 failed on socket timeout exception: java.net.SocketTimeoutException: 5000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.248.14:33112 remote=m-hdp-mnode0005/172.21.248.13:9005]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout

Thanks, 

-- Nick

Re: HA Namenode Failover Issues

The ZKFC property for monitorHealth RPC timeouts has been changed to be more specific, and is now called ha.health-monitor.rpc-timeout.ms.
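
As a rough sketch, the property could go into the same Failover Controller safety valve you are already using. The 120000 value is only an illustration borrowed from your other settings, not a recommendation, and I am assuming the safety valve is picked up by the ZKFC after a restart of the Failover Controller role. Note also that the 5000 millis in your log matches the default of ha.failover-controller.graceful-fence.rpc-timeout.ms, which governs the "gracefully make NameNode ... standby" call, so it may be worth raising that one as well; that second property is my assumption from the log message and is not confirmed in this thread.

```xml
<!-- Timeout for the ZKFC's monitorHealth RPC to the NameNode.
     The value is illustrative only. -->
<property>
  <name>ha.health-monitor.rpc-timeout.ms</name>
  <value>120000</value>
</property>

<!-- Assumption: timeout for the graceful-fence (transition to standby) RPC,
     whose default of 5000 ms matches the timeout in the log above. -->
<property>
  <name>ha.failover-controller.graceful-fence.rpc-timeout.ms</name>
  <value>15000</value>
</property>
```

If the failovers continue after this, the underlying cause is probably the NameNode being slow to respond (long GC pauses or heavy load, for example) rather than the timeout itself, so the NameNode logs around the failover times are worth a look.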