Created on 03-30-2023 04:49 AM - edited 03-30-2023 04:50 AM
Hi, after restarting the hdfs-datanode and hdfs-namenode pods, the hdfs-datanode pods 0, 1, and 2 are not connecting to the namenode.
2023-03-30 10:45:18,285 DEBUG ipc.Client: Connecting to hdfs-namenode-0.hdfs-namenode/10.128.66.125:9820
2023-03-30 10:45:19,287 INFO ipc.Client: Retrying connect to server: hdfs-namenode-0.hdfs-namenode/10.128.66.125:9820. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2023-03-30 10:45:20,289 INFO ipc.Client: Retrying connect to server: hdfs-namenode-0.hdfs-namenode/10.128.66.125:9820. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2023-03-30 10:45:21,291 INFO ipc.Client: Retrying connect to server: hdfs-namenode-0.hdfs-namenode/10.128.66.125:9820. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2023-03-30 10:45:22,293 INFO ipc.Client: Retrying connect to server: hdfs-namenode-0.hdfs-namenode/10.128.66.125:9820. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2023-03-30 10:45:23,295 INFO ipc.Client: Retrying connect to server: hdfs-namenode-0.hdfs-namenode/10.128.66.125:9820. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2023-03-30 10:45:24,299 INFO ipc.Client: Retrying connect to server: hdfs-namenode-0.hdfs-namenode/10.128.66.125:9820. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2023-03-30 10:45:25,301 INFO ipc.Client: Retrying connect to server: hdfs-namenode-0.hdfs-namenode/10.128.66.125:9820. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2023-03-30 10:45:26,305 INFO ipc.Client: Retrying connect to server: hdfs-namenode-0.hdfs-namenode/10.128.66.125:9820. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2023-03-30 10:45:27,307 INFO ipc.Client: Retrying connect to server: hdfs-namenode-0.hdfs-namenode/10.128.66.125:9820. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2023-03-30 10:45:27,546 DEBUG ipc.Server: IPC Server idle connection scanner for port 9867: task running
2023-03-30 10:45:28,318 INFO ipc.Client: Retrying connect to server: hdfs-namenode-0.hdfs-namenode/10.128.66.125:9820. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2023-03-30 10:45:28,320 DEBUG ipc.Client: Failed to connect to server: hdfs-namenode-0.hdfs-namenode/10.128.66.125:9820: retries get failed due to exceeded maximum allowed retries number: 10
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:687)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:790)
    at org.apache.hadoop.ipc.Client$Connection.access$3600(Client.java:410)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1558)
    at org.apache.hadoop.ipc.Client.call(Client.java:1389)
    at org.apache.hadoop.ipc.Client.call(Client.java:1353)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
    at com.sun.proxy.$Proxy21.versionRequest(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.versionRequest(DatanodeProtocolClientSideTranslatorPB.java:287)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:229)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:275)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
    at java.lang.Thread.run(Thread.java:748)
2023-03-30 10:45:28,321 DEBUG ipc.Client: closing ipc connection to hdfs-namenode-0.hdfs-namenode/10.128.66.125:9820: Connection refused
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
The logs above are from hdfs-datanode.
Thank you
Created 03-30-2023 09:50 AM
@Noel_0317 Welcome to our community! To help you get the best possible answer, I have tagged in our HDFS experts @rki_ @mszurap @willx @Asok who may be able to assist you further.
Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur
Created 03-31-2023 11:34 AM
Can you isolate any connection issues between your NameNode (NN) and DataNode (DN) pods? For example, try an nc or telnet from a DN pod to the NN RPC port (9820 in your logs).
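If nc or telnet is not installed in the DN image, a similar check can be sketched in Python, assuming an interpreter is available in the pod. The function name `probe_tcp` and the 3-second timeout are illustrative choices, not part of any Hadoop tooling; the host and port would be the ones from the logs above.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <details>".
    (Hypothetical helper, roughly what `nc -z host port` checks.)
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout (for example, packets being dropped).
        return "timeout"
    except OSError as exc:
        # DNS failure, unreachable network, and similar errors.
        return f"error: {exc}"
```

From inside a DN pod, `print(probe_tcp("hdfs-namenode-0.hdfs-namenode", 9820))` would report the result. A "refused" result, matching the `Connection refused` in the logs, usually suggests the NN pod is reachable but no process is listening on that port yet, so the NN pod's own logs would be the next place to look; a "timeout" more often points at the network path between the pods.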