
FailoverController Connect namenode timed out


 

Hi all,

 

I am running CDH 5.5.2 with HDFS HA enabled. Recently, automatic failover has occurred one or more times in a single day, and I found nothing useful in the NameNode logs. Only the ZKFC log shows a connection timeout error, even though the NameNode port appears to be in a normal state. After the timeout, the ZKFC health monitor reports the NameNode as unhealthy, and the cluster fails over automatically once the timeout period has passed.

 

OS: CentOS Linux release 7.2.1511
Hadoop version: 2.6.0-cdh5.5.2
Java version: 1.7.0_80

nn1 IP: 172.18.0.1
nn2 IP: 172.18.0.2


Here are some of the ZKFC error messages:

 

2018-02-27 11:32:59,781 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 27: Response <- bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022: getServiceStatus {state: ACTIVE readyToBecomeActive: true}
2018-02-27 11:33:53,050 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 27: Response <- bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022: getServiceStatus {state: STANDBY readyToBecomeActive: true}
2018-02-27 11:33:00,782 DEBUG org.apache.hadoop.ipc.Client: IPC Client (1775252113) connection to bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022 from hdfs sending #378814

2018-02-27 11:33:19,932 DEBUG org.apache.hadoop.ipc.Client: closing ipc connection to bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022: Connection timed out
java.io.IOException: Connection timed out
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)
        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:526)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1088)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:983)
2018-02-27 11:33:19,934 DEBUG org.apache.hadoop.ipc.Client: IPC Client (1775252113) connection to bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022 from hdfs: closed
2018-02-27 11:33:19,934 DEBUG org.apache.hadoop.ipc.Client: IPC Client (1775252113) connection to bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022 from hdfs: stopped, remaining connections 0
2018-02-27 11:33:19,935 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 27: Exception <- bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022: getServiceStatus {java.io.IOException: Failed on local exception: java.io.IOException: Connection timed out; Host Details : local host is: "bj-dc-namenode-001.tendcloud.com/172.18.0.1"; destination host is: "bj-dc-namenode-001.tendcloud.com":8022; }
2018-02-27 11:33:19,936 WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022: Failed on local exception: java.io.IOException: Connection timed out; Host Details : local host is: "bj-dc-namenode-001.tendcloud.com/172.18.0.1"; destination host is: "bj-dc-namenode-001.tendcloud.com":8022;
2018-02-27 11:33:19,936 DEBUG org.apache.hadoop.ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@1d838b46
2018-02-27 11:33:19,936 INFO org.apache.hadoop.ha.HealthMonitor: Entering state SERVICE_NOT_RESPONDING
2018-02-27 11:33:19,936 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022 entered state: SERVICE_NOT_RESPONDING
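
To make the timing in this excerpt easier to see, here is a small Python sketch that measures the gap between the health-check request being sent (#378814) and the connection being closed with "Connection timed out". The two timestamps are taken from the log lines above; the helper name parse_ts and the variable names are my own and not part of any Hadoop tooling. It assumes the usual log4j layout where the first 23 characters of a line are the timestamp.

```python
from datetime import datetime

# log4j-style timestamp, e.g. "2018-02-27 11:33:00,782" (%f accepts the 3-digit millis)
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(line):
    """Parse the 23-character timestamp at the start of a log line (hypothetical helper)."""
    return datetime.strptime(line[:23], TS_FORMAT)


# Line where the ZKFC IPC client sends request #378814 to the NameNode
sent = ("2018-02-27 11:33:00,782 DEBUG org.apache.hadoop.ipc.Client: "
        "IPC Client (1775252113) connection to "
        "bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022 from hdfs sending #378814")

# Line where the same connection is closed because of the timeout
closed = ("2018-02-27 11:33:19,932 DEBUG org.apache.hadoop.ipc.Client: "
          "closing ipc connection to "
          "bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022: Connection timed out")

# Elapsed time between sending the request and the connection being closed
gap_seconds = (parse_ts(closed) - parse_ts(sent)).total_seconds()
print(f"request sent -> connection closed: {gap_seconds:.3f} s")
```

So in this excerpt the request went unanswered for roughly 19 seconds before the connection was closed and the HealthMonitor entered SERVICE_NOT_RESPONDING.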

[Attached image: 201802271133.png]

 

*********************
2018-02-27 06:31:01,702 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 27: Response <- bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022: getServiceStatus {state: ACTIVE readyToBecomeActive: true}
2018-02-27 06:31:45,570 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 27: Response <- bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022: getServiceStatus {state: STANDBY readyToBecomeActive: true}

2018-02-27 06:31:02,703 DEBUG org.apache.hadoop.ipc.Client: IPC Client (1775252113) connection to bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022 from hdfs sending #343976

 

[Attached image: 201802270631.png]