Datanode socket timeout setting
Labels: Apache Hadoop, HDFS
Created on 05-10-2016 02:41 PM - edited 09-16-2022 03:18 AM
Hi. I'm new to Hadoop and Cloudera. I have a 5-node cluster with 3 data nodes. I have a third-party client program that opens HDFS files and writes data to them as it arrives in a stream. On a timer, every 10 minutes, the client closes the files and opens new ones for writing (a minimal sketch of this pattern follows the log below). Before the close can happen, the datanode socket connection times out with this error:
~~~
2016-05-10 14:17:20,165 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-1298278955-172.31.1.79-1461125109305:blk_1073807048_66356
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.15.196:50010 remote=/172.31.1.81:57017]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:500)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:894)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:794)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
~~~
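For reference, the client's write pattern is roughly the following. This is a simplified sketch of what the third-party program does, not its actual code; the file path, record source, and roll logic are stand-ins:

~~~
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RollingHdfsWriter {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        final long rollIntervalMs = 10 * 60 * 1000L; // new file every 10 minutes

        while (true) {
            // Open a fresh file for this 10-minute window (path is a stand-in).
            Path path = new Path("/data/stream-" + System.currentTimeMillis());
            FSDataOutputStream out = fs.create(path);
            long deadline = System.currentTimeMillis() + rollIntervalMs;
            while (System.currentTimeMillis() < deadline) {
                byte[] record = nextRecord(); // stand-in for the incoming stream
                if (record != null) {
                    out.write(record);
                    out.hflush(); // push the data through the datanode pipeline
                }
            }
            // The SocketTimeoutException above fires before this close completes.
            out.close();
        }
    }

    // Placeholder: the real program reads records from its data source,
    // returning null when nothing has arrived yet.
    private static byte[] nextRecord() {
        return null;
    }
}
~~~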
Question: How do I change the 60000 millis setting to a larger value?
I've tried setting dfs.datanode.socket.write.timeout and dfs.socket.timeout in the HDFS configuration through the Cloudera admin console, with a config redeploy and cluster restart. I've also tried adding these, plus dfs.client.socket-timeout, in hdfs-client.xml on the client side. Nothing seems to affect the value that is actually used.
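For concreteness, a client-side override of these properties would look roughly like the sketch below. The property names are the standard HDFS keys; the 300000 ms value is just an example, and the check uses Configuration#getPropertySources to show which file the effective value was actually loaded from:

~~~
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class TimeoutCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // What value did the client actually load, and from which resource?
        System.out.println("dfs.client.socket-timeout = "
                + conf.get("dfs.client.socket-timeout"));
        String[] sources = conf.getPropertySources("dfs.client.socket-timeout");
        System.out.println("loaded from: "
                + (sources == null ? "not set in any file" : String.join(", ", sources)));

        // Programmatic override, in milliseconds; the default 60000 matches
        // the "60000 millis timeout" in the datanode log above.
        conf.setInt("dfs.client.socket-timeout", 300000);
        conf.setInt("dfs.datanode.socket.write.timeout", 300000);

        FileSystem fs = FileSystem.get(conf);
        fs.close();
    }
}
~~~

If the printed source isn't the file that was edited, the client is loading its configuration from somewhere else, which would explain overrides having no effect.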
Thanks in advance.
-Bruce
Created 07-25-2017 11:32 PM
Hi,
Did you get any resolution for this? I'm facing the same problem now.
Thanks
Created 11-08-2018 02:06 PM
Please post your solution; I'm facing the same issue.
Created 11-08-2018 02:08 PM
I am facing the same issue
Created 11-22-2018 08:51 PM
We can't be sure of the reason for this message from just the snippet you've provided. Notice that the connection is set up successfully, but there is no response from the DataNode:
~~~
java.nio.channels.SocketChannel[connected local=/172.31.15.196:50010 remote=/172.31.1.81:57017]
~~~
This can happen for various reasons: the write pipeline was interrupted, network congestion is at play, the DataNode's disk is performing poorly, the DataNode host OS is having issues such as kernel soft lockups, or the DataNode is simply too heavily loaded to respond. You'll have to dig deeper into the logs for more information. Start with the messages logged just before this exception in the DataNode log; one way to do that is sketched below.
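For example, a quick way to pull every DataNode log line that mentions the failing block. The block ID comes from the exception in the question; the log path is an assumption based on a typical CDH layout and will differ per install:

~~~
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class DnLogScan {
    public static void main(String[] args) throws Exception {
        // Assumed DataNode log location; adjust to your installation.
        Path log = Paths.get("/var/log/hadoop-hdfs/hadoop-cmf-hdfs-DATANODE.log.out");
        // Block ID taken from the SocketTimeoutException in the question.
        String blockId = "blk_1073807048";
        try (Stream<String> lines = Files.lines(log)) {
            lines.filter(l -> l.contains(blockId)).forEach(System.out::println);
        }
    }
}
~~~

The lines around those matches usually show what the pipeline was doing with the block before the timeout fired.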
