I ran a TeraSort and it wouldn't complete, so we tried doing a large put and found this error on our DFS client (the rough commands we used are sketched after the log):
16/04/28 16:25:27 WARN hdfs.DFSClient: Slow ReadProcessor read fields took 52148ms (threshold=30000ms); ack: seqno: 25357 reply: SUCCESS reply: SUCCESS reply: SUCCESS downstreamAckTimeNanos: 61584486863 flag: 0 flag: 0 flag: 0, targets: [DatanodeInfoWithStorage[10.50.45.148:50010,DS-d4a0215d-8171-4a8b-a3a1-a6a7748b3f23,DISK], DatanodeInfoWithStorage[10.50.45.138:50010,DS-294fded8-1dcd-465e-89d6-c3d6fc9fb61f,DISK], DatanodeInfoWithStorage[10.50.45.143:50010,DS-987423d0-15f5-454c-9034-37a65933e743,DISK]]
16/04/28 16:28:52 WARN hdfs.DFSClient: Slow ReadProcessor read fields took 60247ms (threshold=30000ms); ack: seqno: -2 reply: SUCCESS reply: ERROR downstreamAckTimeNanos: 0 flag: 0 flag: 1, targets: [DatanodeInfoWithStorage[10.50.45.148:50010,DS-d4a0215d-8171-4a8b-a3a1-a6a7748b3f23,DISK], DatanodeInfoWithStorage[10.50.45.138:50010,DS-294fded8-1dcd-465e-89d6-c3d6fc9fb61f,DISK], DatanodeInfoWithStorage[10.50.45.143:50010,DS-987423d0-15f5-454c-9034-37a65933e743,DISK]]
16/04/28 16:28:52 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1466039745-10.50.45.131-1461703637937:blk_1073771085_31037
java.io.IOException: Bad response ERROR for block BP-1466039745-10.50.45.131-1461703637937:blk_1073771085_31037 from datanode DatanodeInfoWithStorage[10.50.45.138:50010,DS-294fded8-1dcd-465e-89d6-c3d6fc9fb61f,DISK]
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer
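For reference, these are roughly the commands we ran; the examples jar path and sizes are from memory, so treat them as placeholders:

# TeraGen/TeraSort run (jar path and row count are assumptions for an HDP-style install)
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teragen 1000000000 /tmp/teragen
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar terasort /tmp/teragen /tmp/terasort
# The large put that reproduced the DFSClient warnings (local file name is a placeholder)
hdfs dfs -put /data/bigfile.dat /tmp/bigfile.dat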
We also got a bunch of broken pipe errors along with it. Upon closer investigation, we saw that the datanodes are logging a lot of these messages:
2016-04-28 01:55:48,546 WARN datanode.DataNode (BlockReceiver.java:receivePacket(562)) - Slow BlockReceiver write packet to mirror took 4284ms (threshold=300ms)
2016-04-28 01:55:48,954 WARN datanode.DataNode (BlockReceiver.java:receivePacket(562)) - Slow BlockReceiver write packet to mirror took 406ms (threshold=300ms)
2016-04-28 01:55:51,826 WARN datanode.DataNode (BlockReceiver.java:receivePacket(562)) - Slow BlockReceiver write packet to mirror took 2872ms (threshold=300ms)
2016-04-28 01:55:52,384 WARN datanode.DataNode (BlockReceiver.java:receivePacket(562)) - Slow BlockReceiver write packet to mirror took 557ms (threshold=300ms)
2016-04-28 01:55:54,870 WARN datanode.DataNode (BlockReceiver.java:receivePacket(562)) - Slow BlockReceiver write packet to mirror took 2486ms (threshold=300ms)
2016-04-28 01:55:59,770 WARN datanode.DataNode (BlockReceiver.java:receivePacket(562)) - Slow BlockReceiver write packet to mirror took 4900ms (threshold=300ms)
2016-04-28 01:56:01,402 WARN datanode.DataNode (BlockReceiver.java:receivePacket(562)) - Slow BlockReceiver write packet to mirror took 1631ms (threshold=300ms)
2016-04-28 01:56:03,451 WARN datanode.DataNode (BlockReceiver.java:receivePacket(562)) - Slow BlockReceiver write packet to mirror took 2048ms (threshold=300ms)
2016-04-28 01:56:04,550 WARN datanode.DataNode (BlockReceiver.java:receivePacket(562)) - Slow BlockReceiver write packet to mirror took 979ms (threshold=300ms)
2016-04-28 01:56:12,072 WARN datanode.DataNode (BlockReceiver.java:receivePacket(562)) - Slow BlockReceiver write packet to mirror took 7521ms (threshold=300ms)
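For reference, we pulled those warnings straight from the datanode logs; the log path here is a guess at the standard layout and may differ on your install:

# Count and sample the slow-mirror warnings (log path is an assumption)
grep 'Slow BlockReceiver write packet to mirror' /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log | wc -l
grep 'Slow BlockReceiver write packet to mirror' /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log | tail -5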
It looks like a packet problem, but checking ifconfig shows we've dropped only 30 out of roughly 3,000,000 packets. What else could we check to pin down this issue?
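In case it helps, this is roughly how we read the drop counters; the interface name is a placeholder for whatever your datanodes actually use:

# Interface error/drop counters (eth0 is an assumed interface name)
ifconfig eth0
ip -s link show eth0
# Kernel-wide protocol counters, e.g. TCP retransmits
netstat -s | grep -i -E 'retrans|error|drop'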
Off-topic: upon investigation we saw this ulimit output in the *.out file of the datanode:
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 1547551
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
But we're not seeing the "Too many open files" error despite the 1,024 open-file limit.
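If we do end up having to raise it, presumably an entry like this in /etc/security/limits.conf would work; the user name and values are guesses on our part, not something we've applied yet:

# /etc/security/limits.conf - hypothetical limits for the HDFS service user
hdfs  soft  nofile  32768
hdfs  hard  nofile  65536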
PS: Thanks to everyone in this community, you guys are the best.
Labels:
- Apache Hadoop
- Apache YARN