
ZKFC (FailoverController) connection to NameNode timed out


 

Hi all,

 

I am running CDH 5.5.2 with the Hadoop HA service installed. Recently, an automatic failover occurred one or more times in a single day, and I found nothing useful in the NameNode logs. Only the ZKFC log shows an error, a connection timeout, even though the NameNode port state looks normal. After the timeout, the ZKFC health monitor reports the NameNode as unhealthy, and the cluster fails over automatically once the timeout period has elapsed.
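For anyone who wants to reproduce the "port state is normal" check, one way is a timed TCP connect. This is only a minimal sketch: the `probe` helper is a hypothetical name and not the exact check I ran. Note that a successful TCP connect only shows that the port is accepting connections; it does not prove that the NameNode answers health-check RPCs in time.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Attempt a TCP connect and return (connected, elapsed_seconds).

    Hypothetical helper for illustration; it tests reachability of the
    port only, not whether the RPC server behind it is responsive.
    """
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        return False, time.monotonic() - start
```

Calling `probe("172.18.0.1", 8022)` from the ZKFC host would test the address and port that appear in the ZKFC log in this post. Running it in a loop and recording the elapsed time might show whether connects get slow around the failover times.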

 

OS: CentOS Linux release 7.2.1511
Hadoop version: 2.6.0-cdh5.5.2
Java version: 1.7.0_80

nn1 IP: 172.18.0.1
nn2 IP: 172.18.0.2


Here are some of the ZKFC error messages:

 

2018-02-27 11:32:59,781 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 27: Response <- bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022: getServiceStatus {state: ACTIVE readyToBecomeActive: true}
2018-02-27 11:33:53,050 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 27: Response <- bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022: getServiceStatus {state: STANDBY readyToBecomeActive: true}
2018-02-27 11:33:00,782 DEBUG org.apache.hadoop.ipc.Client: IPC Client (1775252113) connection to bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022 from hdfs sending #378814

2018-02-27 11:33:19,932 DEBUG org.apache.hadoop.ipc.Client: closing ipc connection to bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022: Connection timed out
java.io.IOException: Connection timed out
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)
        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:526)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1088)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:983)
2018-02-27 11:33:19,934 DEBUG org.apache.hadoop.ipc.Client: IPC Client (1775252113) connection to bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022 from hdfs: closed
2018-02-27 11:33:19,934 DEBUG org.apache.hadoop.ipc.Client: IPC Client (1775252113) connection to bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022 from hdfs: stopped, remaining connections 0
2018-02-27 11:33:19,935 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 27: Exception <- bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022: getServiceStatus {java.io.IOException: Failed on local exception: java.io.IOException: Connection timed out; Host Details : local host is: "bj-dc-namenode-001.tendcloud.com/172.18.0.1"; destination host is: "bj-dc-namenode-001.tendcloud.com":8022; }
2018-02-27 11:33:19,936 WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022: Failed on local exception: java.io.IOException: Connection timed out; Host Details : local host is: "bj-dc-namenode-001.tendcloud.com/172.18.0.1"; destination host is: "bj-dc-namenode-001.tendcloud.com":8022;
2018-02-27 11:33:19,936 DEBUG org.apache.hadoop.ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@1d838b46
2018-02-27 11:33:19,936 INFO org.apache.hadoop.ha.HealthMonitor: Entering state SERVICE_NOT_RESPONDING
2018-02-27 11:33:19,936 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022 entered state: SERVICE_NOT_RESPONDING
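To make the timing in the excerpt above easier to read, here is a small sketch that measures how long the ZKFC waited between its last `sending #` line and the HealthMonitor warning. The function names are hypothetical, and the regular expression assumes the log line layout shown in this post (timestamp, level, then message).

```python
import re
from datetime import datetime

# Leading timestamp and level of a log line, in the layout shown above.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) (.*)$")


def parse_line(line):
    """Return (timestamp, level, message), or None for stack-trace lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
    return ts, m.group(2), m.group(3)


def seconds_until_timeout(lines):
    """Seconds from the last 'sending #' line to the next HealthMonitor WARN."""
    sent_at = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # continuation line such as "at sun.nio.ch..."
        ts, level, msg = parsed
        if " sending #" in msg:
            sent_at = ts
        elif level == "WARN" and "HealthMonitor" in msg and sent_at is not None:
            return (ts - sent_at).total_seconds()
    return None
```

For the excerpt above, the request was sent at 11:33:00,782 and the warning was logged at 11:33:19,936, so the gap is about 19.154 seconds.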

201802271133.png

 

*********************
2018-02-27 06:31:01,702 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 27: Response <- bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022: getServiceStatus {state: ACTIVE readyToBecomeActive: true}
2018-02-27 06:31:45,570 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 27: Response <- bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022: getServiceStatus {state: STANDBY readyToBecomeActive: true}

2018-02-27 06:31:02,703 DEBUG org.apache.hadoop.ipc.Client: IPC Client (1775252113) connection to bj-dc-namenode-001.tendcloud.com/172.18.0.1:8022 from hdfs sending #343976

 

201802270631.png

 

 
