Created 03-19-2024 07:12 AM
We have a Hadoop cluster with `487` data-node machines (each data-node machine also runs the NodeManager service). All machines are physical Dell servers, and the OS is RHEL 7.9.
Each data-node machine has 12 disks, and each disk is 12 TB.
The cluster was installed from HDP packages (previously under Hortonworks, now under Cloudera).
Users complain about slowness of Spark applications that run on the data-node machines.
After investigation we saw the following warnings in the data-node logs:
2024-03-18 17:41:30,230 WARN datanode.DataNode (BlockReceiver.java:receivePacket(567)) - Slow BlockReceiver write packet to mirror took 401ms (threshold=300ms), downstream DNs=[172.87.171.24:50010, 172.87.171.23:50010]
2024-03-18 17:41:49,795 WARN datanode.DataNode (BlockReceiver.java:receivePacket(567)) - Slow BlockReceiver write packet to mirror took 410ms (threshold=300ms), downstream DNs=[172.87.171.26:50010, 172.87.171.31:50010]
2024-03-18 18:06:29,585 WARN datanode.DataNode (BlockReceiver.java:receivePacket(567)) - Slow BlockReceiver write packet to mirror took 303ms (threshold=300ms), downstream DNs=[172.87.171.34:50010, 172.87.171.22:50010]
2024-03-18 18:18:55,931 WARN datanode.DataNode (BlockReceiver.java:receivePacket(567)) - Slow BlockReceiver write packet to mirror took 729ms (threshold=300ms), downstream DNs=[172.87.11.27:50010]
From the log above we can see the warning `Slow BlockReceiver write packet to mirror took xxms`, along with the downstream data-node machines such as `172.87.171.23`, `172.87.171.24`, etc.
From my understanding, the `Slow BlockReceiver write packet to mirror` warning may indicate a delay in writing the block to the OS cache or disk.
So I am trying to collect the possible reasons for this warning, and here they are:
1. a delay in writing the block to the OS cache or disk
2. the cluster is at or near its resource limits (memory, CPU, or disk)
3. network issues between machines
From my verification I do not see a **disk**, **CPU**, or **memory** problem; we checked all machines.
From the network side I do not see any special issues relevant to the machines themselves.
We also used iperf3 to check the bandwidth from one machine to another.
Here is an example between `data-node01` and `data-node03` (from my understanding, and please correct me if I am wrong, the bandwidth looks OK):
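For reference, the spot checks we ran on each machine were roughly along these lines (illustrative commands only; `iostat`/`vmstat` come from the sysstat/procps packages, and the intervals are examples):
# Per-disk latency and utilization (await, %util), sampled every 5 seconds, 3 reports
iostat -x 5 3
# CPU run queue, context switches, and memory/swap pressure
vmstat 5 3
# Overall memory and swap usage in GB
free -g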
From data-node01
iperf3 -i 10 -s
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-10.00 sec 7.90 GBytes 6.78 Gbits/sec
[ 5] 10.00-20.00 sec 8.21 GBytes 7.05 Gbits/sec
[ 5] 20.00-30.00 sec 7.25 GBytes 6.23 Gbits/sec
[ 5] 30.00-40.00 sec 7.16 GBytes 6.15 Gbits/sec
[ 5] 40.00-50.00 sec 7.08 GBytes 6.08 Gbits/sec
[ 5] 50.00-60.00 sec 6.27 GBytes 5.39 Gbits/sec
[ 5] 60.00-60.04 sec 35.4 MBytes 7.51 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-60.04 sec 0.00 Bytes 0.00 bits/sec sender
[ 5] 0.00-60.04 sec 43.9 GBytes 6.28 Gbits/sec receiver
From data-node03
iperf3 -i 1 -t 60 -c 172.87.171.84
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 792 MBytes 6.64 Gbits/sec 0 3.02 MBytes
[ 4] 1.00-2.00 sec 834 MBytes 6.99 Gbits/sec 54 2.26 MBytes
[ 4] 2.00-3.00 sec 960 MBytes 8.05 Gbits/sec 0 2.49 MBytes
[ 4] 3.00-4.00 sec 896 MBytes 7.52 Gbits/sec 0 2.62 MBytes
[ 4] 4.00-5.00 sec 790 MBytes 6.63 Gbits/sec 0 2.70 MBytes
[ 4] 5.00-6.00 sec 838 MBytes 7.03 Gbits/sec 4 1.97 MBytes
[ 4] 6.00-7.00 sec 816 MBytes 6.85 Gbits/sec 0 2.17 MBytes
[ 4] 7.00-8.00 sec 728 MBytes 6.10 Gbits/sec 0 2.37 MBytes
[ 4] 8.00-9.00 sec 692 MBytes 5.81 Gbits/sec 47 1.74 MBytes
[ 4] 9.00-10.00 sec 778 MBytes 6.52 Gbits/sec 0 1.91 MBytes
[ 4] 10.00-11.00 sec 785 MBytes 6.58 Gbits/sec 48 1.57 MBytes
[ 4] 11.00-12.00 sec 861 MBytes 7.23 Gbits/sec 0 1.84 MBytes
[ 4] 12.00-13.00 sec 844 MBytes 7.08 Gbits/sec 0 1.96 MBytes
Note: the NIC cards run at `10G` (we checked this with ethtool).
We also checked the firmware version of the NIC card:
ethtool -i p1p1
driver: i40e
version: 2.8.20-k
firmware-version: 8.40 0x8000af82 20.5.13
expansion-rom-version:
bus-info: 0000:3b:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
We also checked the kernel messages (`dmesg`) but did not see anything special.
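For completeness, these are the kinds of checks we mean, shown as examples (`p1p1` is our interface name, and the grep patterns are illustrative):
# Link speed and duplex of the data NIC
ethtool p1p1 | grep -E 'Speed|Duplex'
# Recent kernel messages mentioning the NIC driver or block devices
dmesg -T | grep -iE 'i40e|sd[a-z]|error|fail' | tail -50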
Created 03-20-2024 02:20 PM
@mike_bronson7 Hello! Thanks for bringing this to our community.
Users complain about slowness of Spark applications that run on the data-node machines
From my understanding, the `Slow BlockReceiver write packet to mirror` warning may indicate a delay in writing the block to the OS cache or disk
So I am trying to collect the possible reasons for this warning, and here they are:
1. a delay in writing the block to the OS cache or disk
2. the cluster is at or near its resource limits (memory, CPU, or disk)
3. network issues between machines
From the network side I do not see any special issues relevant to the machines themselves
No, this indicates that there was a delay in writing the block across the network.
The WARNs may indicate that you are facing a network/hardware-level issue. The time taken is not excessively high, so it is likely a tuning issue. To explain this WARN:
Slow BlockReceiver write packet to mirror
This measures the duration taken to write to the next DataNode over a regular TCP socket, and the time taken to flush the socket. An increase in this typically indicates higher network latency, as Java-wise this is a pure SocketOutputStream.write() + SocketOutputStream.flush() cost.
Also, did you happen to monitor the DataNodes' network while the Spark applications were running in parallel?
Hopefully the above helps you tune the network configuration better.
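As a concrete starting point, the `threshold=300ms` in those messages comes from the DataNode slow-I/O warning threshold, and the NIC counters can be sampled while the Spark jobs run. The commands below are only a sketch; the config path shown is the usual HDP location and may differ on your cluster:
# Threshold behind the WARN (dfs.datanode.slow.io.warning.threshold.ms, default 300 ms);
# if it is not set explicitly in hdfs-site.xml, the default applies. Raising it only hides the symptom.
grep -B1 -A2 'dfs.datanode.slow.io.warning.threshold.ms' /etc/hadoop/conf/hdfs-site.xml
# Per-interface throughput and error counters, 5-second samples, while Spark is running
sar -n DEV 5 12
sar -n EDEV 5 12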
V
Created on 03-20-2024 11:46 PM - edited 03-20-2024 11:50 PM
Thank you for the response.
But look at this as well:
2024-03-18 19:31:52,673 WARN datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:756ms (threshold=300ms), volume=/data/sde/hadoop/hdfs/data
2024-03-18 19:35:15,334 WARN datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:377ms (threshold=300ms), volume=/data/sdc/hadoop/hdfs/data
2024-03-18 19:51:57,774 WARN datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:375ms (threshold=300ms), volume=/data/sdb/hadoop/hdfs/data
As you can see, the warning also appears for local disks, not only across the network.
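For reference, this is roughly how we sample latency on the volumes named in those warnings (illustrative; `iostat` is from the sysstat package and the device names match the mount points above):
# await and %util for the disks behind /data/sdb, /data/sdc and /data/sde
iostat -xd sdb sdc sde 5 3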
In any case, we already checked the network, including the switches, and we did not find a problem.
Do you think it could be a tuning issue in the HDFS parameters, or are there parameters that can help?