Datanode Starts Running but Does Not Go Live.

Explorer

I have a 6-node CentOS 7 cluster with 4 datanodes. All the datanodes are up and running, but the dashboard shows only 3/4 datanodes live. I looked at the log at /var/log/hadoop/hdfs/hadoop-hdfs-datanode-<data_node>.log and it says:

2017-09-14 11:57:04,794 INFO  web.DatanodeHttpServer (SimpleHttpProxyHandler.java:exceptionCaught(147)) - Proxy for / failed. cause:
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at io.netty.buffer.UnpooledUnsafeDirectByteBuf.setBytes(UnpooledUnsafeDirectByteBuf.java:447)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:745)

Not sure what this means.

I tried restarting ambari-agent, rebooting the machine itself, and restarting ambari-server on the namenode. Can someone suggest where else I should look?

EDIT: I also tried connecting to the namenode from this particular datanode on the port it listens on (8020, the standard NameNode RPC port in Hadoop), and it connects. I can see the connection from both the datanode and the namenode. I don't understand why the communication is not happening.
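
For anyone repeating the port check described above, here is a minimal Python sketch of a TCP reachability test. The helper name `can_connect` and the hostname `namenode.example.com` are hypothetical and only for illustration; 8020 is the port mentioned in the post.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example call (hypothetical hostname, run from the affected datanode):
# can_connect("namenode.example.com", 8020)
```

A successful connection only shows that the port is reachable. It does not by itself show that the datanode has registered with the namenode or is sending heartbeats.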

3 REPLIES
Re: Datanode Starts Running but Does Not Go Live.

Contributor

@Sree Kupp The above just means the client disconnected after it finished writing. The issue has been fixed in the latest HDP 2.5. Can you check the live nodes in the NameNode UI? Are there 4?
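
The live-node count can also be read without the UI. The following is a rough Python sketch, assuming the NameNode web endpoint is reachable over plain HTTP (port 50070 is the usual Hadoop 2.x default), that no Kerberos/SPNEGO authentication is required, and that the `FSNamesystemState` JMX bean exposes `NumLiveDataNodes`. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def live_datanodes(jmx_json: str) -> int:
    """Extract NumLiveDataNodes from the body of a NameNode /jmx response."""
    beans = json.loads(jmx_json)["beans"]
    for bean in beans:
        if "NumLiveDataNodes" in bean:
            return int(bean["NumLiveDataNodes"])
    raise KeyError("NumLiveDataNodes not found in JMX response")


def fetch_live_datanodes(namenode_http: str) -> int:
    """Query the NameNode JMX servlet, e.g. namenode_http='http://host:50070'."""
    # Assumed bean name; adjust if your Hadoop version reports it elsewhere.
    url = namenode_http + "/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState"
    with urlopen(url, timeout=10) as resp:
        return live_datanodes(resp.read().decode("utf-8"))
```

On a cluster node, `hdfs dfsadmin -report` gives the same information from the command line.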

Re: Datanode Starts Running but Does Not Go Live.

Explorer

@Vinod Thanks for your reply. Yes, it shows 4 live data nodes. I am surprised by how this happened. I worked on it all day yesterday and nothing changed. This morning I just restarted the datanode process on the failed data node, and now I can see that all data nodes are live. Can you explain this, please?

Re: Datanode Starts Running but Does Not Go Live.

Contributor

@Sree Kupp It is hard to explain without the datanode and namenode logs. Check the datanode logs.
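
As a starting point for checking the datanode logs, here is a small Python sketch that keeps only WARN, ERROR, and FATAL entries. It assumes the log4j line layout seen in the excerpt above (timestamp, level, then message). The function name `problem_lines` is hypothetical.

```python
import re

# Matches lines such as: "2017-09-14 11:57:04,794 INFO  web.DatanodeHttpServer ..."
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+)\s+(.*)$")


def problem_lines(lines, levels=("WARN", "ERROR", "FATAL")):
    """Return log lines whose level is in `levels`; stack-trace lines are skipped."""
    selected = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2) in levels:
            selected.append(line.rstrip("\n"))
    return selected


# Example usage, with the log path from the original post:
# with open("/var/log/hadoop/hdfs/hadoop-hdfs-datanode-<data_node>.log") as f:
#     for entry in problem_lines(f):
#         print(entry)
```

The "Connection reset by peer" entry quoted in the question is logged at INFO, so this filter would skip it.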