05-10-2016 02:41 PM
Hi. I'm new to Hadoop and Cloudera. I have a 5-node cluster with 3 data nodes. I have a third-party client program that opens HDFS files and writes data to them as it arrives in a stream. On a timer, every 10 minutes, the client closes the files and opens new ones for writing. Before the close can happen, the DataNode socket connection times out with this error:
2016-05-10 14:17:20,165 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-1298278955-172.31.1.79-1461125109305:blk_1073807048_66356
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.15.196:50010 remote=/172.31.1.81:57017]
Question: How do I change the 60000 millis setting to a larger value?
I've tried setting dfs.datanode.socket.write.timeout and dfs.socket.timeout in the HDFS config through Cloudera admin, with a config redeploy and cluster restart. I've also tried adding these and dfs.client.socket-timeout to hdfs-client.xml on the client side. Nothing seems to affect the value that is actually used.
Thanks in advance.
11-22-2018 08:51 PM
We cannot be sure of the reason for this message from the snippet you have provided. If you look closely, the connection is being established successfully, but there is no response from the DN.
java.nio.channels.SocketChannel[connected local=/172.31.15.196:50010 remote=/172.31.1.81:57017]
It can happen for various reasons: the pipeline is interrupted, the network is congested, the DN disk is not performing well, the DN host OS is having issues such as kernel soft lockups, or the DN is simply too heavily loaded to respond. You'd have to dig further into the logs for more information.
See the messages logged before the exception you're getting in the DN logs.
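If the DN log is large, a small script can pull out the lines that precede each timeout. The following is only a rough sketch in Python, not a Cloudera or Hadoop tool: the function name `timeouts_with_context` is made up for illustration, and the regular expression assumes the log lines look like the SocketTimeoutException line quoted in this thread.

```python
import re
from collections import deque

# Assumption: timeout lines follow the format quoted in this thread, e.g.
# "... SocketTimeoutException: 60000 millis timeout ... local=/IP:PORT remote=/IP:PORT]"
TIMEOUT_RE = re.compile(
    r"SocketTimeoutException: (?P<millis>\d+) millis timeout.*?"
    r"local=/(?P<local>[\d.]+:\d+) remote=/(?P<remote>[\d.]+:\d+)"
)


def timeouts_with_context(lines, context=5):
    """Yield (details, preceding_lines) for each SocketTimeoutException line.

    `lines` is any iterable of log lines (a list or an open file).
    `context` is how many earlier lines to keep for each hit.
    """
    recent = deque(maxlen=context)  # rolling window of the last `context` lines
    for line in lines:
        line = line.rstrip("\n")
        match = TIMEOUT_RE.search(line)
        if match:
            yield match.groupdict(), list(recent)
        recent.append(line)


# Small demo using the two log lines from the original question.
sample = [
    "2016-05-10 14:17:20,165 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: "
    "Exception for BP-1298278955-172.31.1.79-1461125109305:blk_1073807048_66356",
    "java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel "
    "to be ready for read. ch : java.nio.channels.SocketChannel[connected "
    "local=/172.31.15.196:50010 remote=/172.31.1.81:57017]",
]

for details, before in timeouts_with_context(sample, context=5):
    print("timeout:", details["millis"], "ms",
          "local:", details["local"], "remote:", details["remote"])
    for earlier in before:
        print("    before:", earlier)
```

To run it against a real log, pass an open file instead of `sample`, for example `timeouts_with_context(open(path, errors="replace"), context=20)`, where `path` is wherever your DN role log is stored. The `remote` address tells you which client or peer the DN was waiting on, which helps when comparing against the logs on that host.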