
Hadoop: slow BlockReceiver warnings from DataNode machines


We have a Hadoop cluster with `487` DataNode machines (each DataNode machine also runs the NodeManager service). All machines are physical (DELL), and the OS is RHEL 7.9.

Each DataNode machine has 12 disks, and each disk is 12 TB.

The cluster is installed from HDP packages (previously under Hortonworks, now under Cloudera).

Users are complaining about slowness in Spark applications that run on the DataNode machines.

After investigation we see the following warnings in the DataNode logs:


2024-03-18 17:41:30,230 WARN datanode.DataNode (BlockReceiver.java:receivePacket(567)) - Slow BlockReceiver write packet to mirror took 401ms (threshold=300ms), downstream DNs=[172.87.171.24:50010, 172.87.171.23:50010]
2024-03-18 17:41:49,795 WARN datanode.DataNode (BlockReceiver.java:receivePacket(567)) - Slow BlockReceiver write packet to mirror took 410ms (threshold=300ms), downstream DNs=[172.87.171.26:50010, 172.87.171.31:50010]
2024-03-18 18:06:29,585 WARN datanode.DataNode (BlockReceiver.java:receivePacket(567)) - Slow BlockReceiver write packet to mirror took 303ms (threshold=300ms), downstream DNs=[172.87.171.34:50010, 172.87.171.22:50010]
2024-03-18 18:18:55,931 WARN datanode.DataNode (BlockReceiver.java:receivePacket(567)) - Slow BlockReceiver write packet to mirror took 729ms (threshold=300ms), downstream DNs=[172.87.11.27:50010]

From the log above we can see the warning `Slow BlockReceiver write packet to mirror took xxms`, as well as the downstream DataNode machines, e.g. `172.87.171.23`, `172.87.171.24`, etc.

From my understanding, a `Slow BlockReceiver write packet to mirror` warning may indicate a delay in writing the block to the OS cache or disk.

So I am trying to collect the possible reasons for this warning; here they are (a small log-tallying sketch follows the list):

1. delay in writing the block to OS cache or disk

2. the cluster is at or near its resource limits (memory, CPU, or disk)

3. network issues between machines
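
To see which cause dominates, the warnings can be tallied per downstream DataNode and per hour straight from the DataNode log. A minimal sketch (the log path `/var/log/hadoop/hdfs/` is an assumption for a typical HDP layout; adjust to your layout):

# Tally "write packet to mirror" warnings by downstream DataNode
grep "Slow BlockReceiver write packet to mirror" /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log \
  | grep -o "downstream DNs=\[[^]]*\]" | sort | uniq -c | sort -rn | head

# Tally the same warnings per hour, to see whether they cluster around heavy Spark runs
grep "Slow BlockReceiver write packet to mirror" /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log \
  | cut -d',' -f1 | cut -d':' -f1 | sort | uniq -c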


From my verification I do not see a **disk**, **CPU**, or **memory** problem; we checked all machines.

From the network side I do not see any special issues relevant to the machines themselves.

We also used iperf3 to check the bandwidth between machines.

Here is an example between `data-node01` and `data-node03` (from my understanding, and please correct me if I am wrong, the bandwidth looks OK).

From data-node01

iperf3 -i 10 -s

[ ID] Interval Transfer Bandwidth
[ 5] 0.00-10.00 sec 7.90 GBytes 6.78 Gbits/sec
[ 5] 10.00-20.00 sec 8.21 GBytes 7.05 Gbits/sec
[ 5] 20.00-30.00 sec 7.25 GBytes 6.23 Gbits/sec
[ 5] 30.00-40.00 sec 7.16 GBytes 6.15 Gbits/sec
[ 5] 40.00-50.00 sec 7.08 GBytes 6.08 Gbits/sec
[ 5] 50.00-60.00 sec 6.27 GBytes 5.39 Gbits/sec
[ 5] 60.00-60.04 sec 35.4 MBytes 7.51 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-60.04 sec 0.00 Bytes 0.00 bits/sec sender
[ 5] 0.00-60.04 sec 43.9 GBytes 6.28 Gbits/sec receiver


From data-node03

iperf3 -i 1 -t 60 -c 172.87.171.84

[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 792 MBytes 6.64 Gbits/sec 0 3.02 MBytes
[ 4] 1.00-2.00 sec 834 MBytes 6.99 Gbits/sec 54 2.26 MBytes
[ 4] 2.00-3.00 sec 960 MBytes 8.05 Gbits/sec 0 2.49 MBytes
[ 4] 3.00-4.00 sec 896 MBytes 7.52 Gbits/sec 0 2.62 MBytes
[ 4] 4.00-5.00 sec 790 MBytes 6.63 Gbits/sec 0 2.70 MBytes
[ 4] 5.00-6.00 sec 838 MBytes 7.03 Gbits/sec 4 1.97 MBytes
[ 4] 6.00-7.00 sec 816 MBytes 6.85 Gbits/sec 0 2.17 MBytes
[ 4] 7.00-8.00 sec 728 MBytes 6.10 Gbits/sec 0 2.37 MBytes
[ 4] 8.00-9.00 sec 692 MBytes 5.81 Gbits/sec 47 1.74 MBytes
[ 4] 9.00-10.00 sec 778 MBytes 6.52 Gbits/sec 0 1.91 MBytes
[ 4] 10.00-11.00 sec 785 MBytes 6.58 Gbits/sec 48 1.57 MBytes
[ 4] 11.00-12.00 sec 861 MBytes 7.23 Gbits/sec 0 1.84 MBytes
[ 4] 12.00-13.00 sec 844 MBytes 7.08 Gbits/sec 0 1.96 MBytes

Note: the NIC cards are `10G` (we checked this with ethtool).

We also checked the firmware version of the NIC card:

ethtool -i p1p1
driver: i40e
version: 2.8.20-k
firmware-version: 8.40 0x8000af82 20.5.13
expansion-rom-version:
bus-info: 0000:3b:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

We also checked the kernel messages (`dmesg`) but did not see anything special.
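
Given the non-zero Retr (retransmit) column in the iperf3 output above, one more check that could be relevant is the NIC error/drop counters and the TCP retransmission statistics. A minimal sketch (the interface name p1p1 is taken from the ethtool output above):

# NIC hardware/driver counters: look for non-zero rx/tx errors, drops, and discards
ethtool -S p1p1 | egrep -i "err|drop|discard" | grep -v ": 0$"

# Kernel-level RX/TX errors and drops on the interface
ip -s link show p1p1

# System-wide TCP retransmission statistics
netstat -s | egrep -i retrans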

Michael-Bronson

Master Collaborator

@mike_bronson7 Hello! Thanks for bringing this to our community.


Users are complaining about slowness in Spark applications that run on the DataNode machines

From my understanding, a `Slow BlockReceiver write packet to mirror` warning may indicate a delay in writing the block to the OS cache or disk
So I am trying to collect the possible reasons for this warning; here they are

1. delay in writing the block to OS cache or disk
2. the cluster is at or near its resource limits (memory, CPU, or disk)
3. network issues between machines

From the network side I do not see any special issues relevant to the machines themselves


No, this indicates that there was a delay in writing the block across the network.

These WARNs may indicate that you are facing a network/hardware-level issue. The time taken is not excessively high, and it is likely a tuning issue. To explain this WARN:

Slow BlockReceiver write packet to mirror

This measures the time taken to write the packet to the next DataNode over a regular TCP socket, plus the time taken to flush the socket. An increase here typically indicates higher network latency, since Java-wise this is purely the SocketOutputStream.write() + SocketOutputStream.flush() cost.

Also, did you happen to monitor the DataNodes' network while the Spark applications were running in parallel?
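
As a minimal sketch of such monitoring on a DataNode while the jobs run (assuming the sysstat package is installed and the interface is p1p1, as in your ethtool output):

# Per-interface throughput and packet rates, sampled every 5 seconds
sar -n DEV 5 | grep p1p1

# TCP retransmission rate over time (retrans/s column)
sar -n ETCP 5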


Hopefully the above helps you tune the network configuration better.

V


Thank you for the response.

But please also look at this:

2024-03-18 19:31:52,673 WARN datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:756ms (threshold=300ms), volume=/data/sde/hadoop/hdfs/data
2024-03-18 19:35:15,334 WARN datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:377ms (threshold=300ms), volume=/data/sdc/hadoop/hdfs/data
2024-03-18 19:51:57,774 WARN datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:375ms (threshold=300ms), volume=/data/sdb/hadoop/hdfs/data


As you can see, the warning also appears for local disks, not only across the network.
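
To confirm or rule out a local-disk bottleneck on the volumes named above, per-device write latency could be sampled while the warnings occur. A minimal sketch (iostat is from the sysstat package; the device names sdb/sdc/sde are taken from the volume paths in the log):

# Extended per-device statistics, 3 samples at 5-second intervals:
# watch w_await (average write latency, ms) and %util on the HDFS data disks
iostat -dx sdb sdc sde 5 3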

In any case, we have already checked the network, including the switches, and we did not find a problem.

Do you think it could be a tuning issue in the HDFS parameters, or are there parameters that could help?
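
For reference, as far as I understand the 300ms threshold in these warnings is controlled by the DataNode property dfs.datanode.slow.io.warning.threshold.ms (default 300ms); raising it would only hide the warnings rather than remove the latency. The current values can be checked with hdfs getconf (note that it reports the key as missing rather than printing the default if the property is not explicitly set):

# Threshold (ms) that triggers the "Slow BlockReceiver" warnings
hdfs getconf -confKey dfs.datanode.slow.io.warning.threshold.ms

# Number of threads the DataNode uses for block transfers
hdfs getconf -confKey dfs.datanode.max.transfer.threads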

 

Michael-Bronson