Created 09-14-2017 04:15 PM
I have a 6-node CentOS 7 cluster with 4 datanodes. All the datanodes are up and running, but the dashboard shows only 3/4 datanodes live. I looked at the log at /var/log/hadoop/hdfs/hadoop-hdfs-datanode-<data_node>.log and it says:
2017-09-14 11:57:04,794 INFO web.DatanodeHttpServer (SimpleHttpProxyHandler.java:exceptionCaught(147)) - Proxy for / failed. cause:
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at io.netty.buffer.UnpooledUnsafeDirectByteBuf.setBytes(UnpooledUnsafeDirectByteBuf.java:447)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
    at java.lang.Thread.run(Thread.java:745)
Not sure what this means.
I tried restarting ambari-agent, rebooting the machine itself, and restarting ambari-server on the namenode. Can someone suggest where else I should look?
EDIT: Also, I tried connecting to the namenode from this particular datanode on the port it is listening on (8020, the standard port for Hadoop), and it connects. I can see the connection from both the datanode and the namenode. I don't understand why the communication is not happening.
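For anyone repeating the port check described above, here is a minimal sketch in Python. The helper name `can_connect` and the hostname in the comment are placeholders, not anything from the cluster in question. Note that a successful TCP connection only shows that the port is reachable; it does not by itself show that the datanode has registered with the namenode or is sending heartbeats.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call, run from the affected datanode (hostname is a placeholder):
#   can_connect("namenode.example.com", 8020)
```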
Created 09-14-2017 11:01 PM
@Sree Kupp The above just means the client disconnected after it finished writing. The issue has been fixed in the latest HDP 2.5. Can you check the live nodes in the NameNode UI? Are there 4?
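If you prefer to check the live node count without the UI, `hdfs dfsadmin -report` prints it, and the NameNode also exposes it over its JMX servlet. The sketch below assumes the Hadoop 2.x default NameNode HTTP port (50070) and the `FSNamesystemState` bean with its `NumLiveDataNodes` and `NumDeadDataNodes` attributes; the function names are made up for illustration, and the port may differ on your cluster.

```python
import json
from urllib.request import urlopen

# Assumption: the bean that carries the live/dead DataNode counters.
JMX_QUERY = "Hadoop:service=NameNode,name=FSNamesystemState"


def jmx_url(namenode_host: str, http_port: int = 50070) -> str:
    """Build the NameNode JMX URL (50070 is the assumed Hadoop 2.x default)."""
    return f"http://{namenode_host}:{http_port}/jmx?qry={JMX_QUERY}"


def datanode_counts(jmx_payload: dict) -> tuple:
    """Extract (live, dead) DataNode counts from a parsed /jmx response."""
    bean = jmx_payload["beans"][0]
    return bean["NumLiveDataNodes"], bean["NumDeadDataNodes"]


def fetch_datanode_counts(namenode_host: str, http_port: int = 50070,
                          timeout: float = 5.0) -> tuple:
    """Query the NameNode over HTTP and return (live, dead)."""
    with urlopen(jmx_url(namenode_host, http_port), timeout=timeout) as resp:
        return datanode_counts(json.load(resp))
```

Comparing this count with what the Ambari dashboard shows helps tell apart a datanode that HDFS itself considers dead from one that only the dashboard is reporting incorrectly.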
Created 09-15-2017 02:16 PM
@Vinod Thanks for your reply. Yes, it shows 4 live datanodes. I am surprised at how this happened. I worked on it the whole day yesterday and nothing changed. This morning I just restarted the datanode process on the failed datanode, and now I can see that all datanodes are live. Can you explain this, please?
Created 09-15-2017 06:28 PM
@Sree Kupp It is hard to explain without the datanode and namenode logs. Check the datanode logs.
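As a starting point for going through the datanode log, here is a rough sketch that pulls out only the WARN, ERROR and FATAL entries. It assumes the log lines start with a timestamp and level in the same layout as the entry quoted in the question; `problem_lines` is a hypothetical helper name.

```python
import re

# Matches the prefix seen in the question, e.g. "2017-09-14 11:57:04,794 INFO ".
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) ")


def problem_lines(lines, levels=("WARN", "ERROR", "FATAL")):
    """Yield (timestamp, level, line) for log entries at the given levels."""
    for line in lines:
        match = LOG_LINE.match(line)
        # Stack-trace continuation lines have no timestamp and are skipped.
        if match and match.group(2) in levels:
            yield match.group(1), match.group(2), line.rstrip("\n")


# Example (path as given in the question, with its placeholder left as-is):
#   with open("/var/log/hadoop/hdfs/hadoop-hdfs-datanode-<data_node>.log") as f:
#       for ts, level, text in problem_lines(f):
#           print(ts, level, text)
```

The "Connection reset by peer" trace from the question is logged at INFO, so this filter skips it and leaves only the entries more likely to explain a datanode dropping out.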