Created 02-23-2016 08:39 PM
We are using HDP 2.0. Recently we have not been able to write any new tables to the cluster. All components look healthy in the Ambari web UI. In the master node's HDFS logs we found the following error messages:
2016-02-23 17:25:09,985 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(698)) - Exception for BP-1706820793-10.86.36.8-1381941559687:blk_1080366074_6646021
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.86.36.8:50010 remote=/10.80.27.210:54210]
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
	at java.io.DataInputStream.read(DataInputStream.java:132)
	at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:429)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:668)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:564)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:102)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
	at java.lang.Thread.run(Thread.java:662)

2016-02-23 17:25:09,985 ERROR datanode.DataNode (DataXceiver.java:run(225)) - dn01.nor1solutions.com:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.80.27.210:54210 dest: /10.86.36.8:50010
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.86.36.8:50010 remote=/10.80.27.210:54210]
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
	at java.io.DataInputStream.read(DataInputStream.java:132)
	at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:429)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:668)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:564)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:102)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
	at java.lang.Thread.run(Thread.java:662)

Can anyone help fixing it? Thanks!
Created 02-23-2016 11:51 PM
Created 02-24-2016 08:21 PM
Thanks @Neeraj Sabharwal! I've checked all the nodes in the RM web UI and they are all healthy. I tried restarting the whole cluster, but the same problem happened again. I did not see anything relevant in the ResourceManager logs. Should I change any of the configuration settings shown in this thread?
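For a quick command-line cross-check of node health alongside the RM web UI, a minimal sketch (assuming the HDFS and YARN clients are available on an edge node of the cluster):

yarn node -list        # NodeManager states as reported by the ResourceManager
hdfs dfsadmin -report  # DataNode liveness, capacity, and last-contact times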
Created 02-25-2016 02:12 AM
Can you check the setting for the following parameters?
In my case they are:
dfs.datanode.max.transfer.threads = 4096
dfs.datanode.handler.count = 10
dfs.client.file-block-storage-locations.num-threads = 10
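One way to check what the cluster is actually configured with for these keys, as a minimal sketch (assuming the HDFS client and the cluster's *-site.xml files are present on the node you run this from):

hdfs getconf -confKey dfs.datanode.max.transfer.threads
hdfs getconf -confKey dfs.datanode.handler.count
hdfs getconf -confKey dfs.client.file-block-storage-locations.num-threads
# each command prints the value from the local Hadoop configuration (site files plus built-in defaults)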
Created 02-25-2016 05:27 PM
Thanks @Neeraj Sabharwal! Here are our current settings:
dfs.datanode.max.transfer.threads = 1024
dfs.datanode.handler.count = 100
dfs.client.file-block-storage-locations.num-threads = (not set)
dfs.blocksize = 134217728
Block replication = 3
Reserved space for HDFS = 1 GB
io.file.buffer.size = 131072
Thanks!
Created 02-25-2016 05:50 PM
@Jade Liu Can you set up the following values for those properties?
dfs.datanode.max.transfer.threads = 4096
dfs.datanode.handler.count = 10
dfs.client.file-block-storage-locations.num-threads = 10 (add this property if it is not already set)
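On an Ambari-managed cluster these are best changed under the HDFS service configs in Ambari and then rolled out with an HDFS restart, so that manual edits are not overwritten. For reference, a minimal hdfs-site.xml sketch of the values suggested above would look like:

<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>4096</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value>
</property>
<property>
  <name>dfs.client.file-block-storage-locations.num-threads</name>
  <value>10</value>
</property>

The right numbers depend on cluster size and workload; the values above simply mirror the suggestion in this thread.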
Created 02-25-2016 06:17 PM
Problem fixed. It turns out we had a Sqoop job that kept writing to the cluster; once we killed it, the errors stopped. Thanks @Neeraj Sabharwal!
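For anyone who lands here with the same symptom, a rough sketch of how one might locate and stop a runaway job, assuming the Sqoop job runs as a MapReduce application on YARN (the application ID below is a placeholder):

yarn application -list                                  # look for long-running applications in the RUNNING state
yarn application -kill application_1455000000000_0042   # placeholder ID; substitute the real application ID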