<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hadoop + warnings as slow block-receive from data-node machines in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hadoop-warnings-as-slow-block-receive-from-data-node/m-p/385291#M245675</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/59349"&gt;@mike_bronson7&lt;/a&gt;&amp;nbsp;Hello! Thanks for bringing this to our community.&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;P&gt;Users are complain about slowness from spark applications that run on data-nodes machines&lt;/P&gt;&lt;P&gt;from my understanding the exceptions as Slow `BlockReceiver write packet to mirror` indicate maybe on delay in writing the block to OS cache or disk&lt;BR /&gt;So I am trying to collect the reasons for this warning / exceptions , and here there are&lt;/P&gt;&lt;P&gt;1. delay in writing the block to OS cache or disk&lt;BR /&gt;2. cluster is as or near its resources limit ( memory , CPU or disk )&lt;BR /&gt;3. network issues between machines&lt;/P&gt;&lt;P&gt;From network side I not see special issues that relevant to machines itself&lt;/P&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;SPAN&gt;No, this indicates that there was a delay in writing the block across the network.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The WARNs may indicate that you are facing a network/hardware-level issue. The time taken is not excessively high, and it is likely to be a tuning issue. To explain this WARN:&lt;/SPAN&gt;&lt;/P&gt;&lt;PRE&gt;Slow BlockReceiver write packet to mirror&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;SPAN&gt;This measures the duration taken to write to the next DataNode over a regular TCP socket, and the time taken to flush the socket. 
An increase in this typically indicates higher network latency, as Java-wise this is a pure SocketOutputStream.write() + SocketOutputStream.flush() cost.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Also, did you happen to monitor the DataNodes' network while the Spark applications were running in parallel?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;SPAN&gt;Hopefully the above helps you tune the network configuration better.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;V&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 20 Mar 2024 21:20:43 GMT</pubDate>
    <dc:creator>vaishaakb</dc:creator>
    <dc:date>2024-03-20T21:20:43Z</dc:date>
    <item>
      <title>Hadoop + warnings as slow block-receive from data-node machines</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hadoop-warnings-as-slow-block-receive-from-data-node/m-p/385182#M245639</link>
      <description>&lt;P&gt;We have a Hadoop cluster with `487` data-node machines (each data-node machine also runs the NodeManager service). All machines are physical machines (DELL), and the OS is RHEL 7.9.&lt;/P&gt;&lt;P&gt;Each data-node machine has 12 disks, each 12T in size.&lt;/P&gt;&lt;P&gt;The Hadoop cluster was installed from HDP packages (previously under Hortonworks, now under Cloudera).&lt;/P&gt;&lt;P&gt;Users are complaining about slowness in Spark applications that run on the data-node machines.&lt;/P&gt;&lt;P&gt;After investigating, we saw the following warnings in the data-node logs:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;2024-03-18 17:41:30,230 WARN datanode.DataNode (BlockReceiver.java:receivePacket(567)) - Slow BlockReceiver write packet to mirror took 401ms (threshold=300ms), downstream DNs=[172.87.171.24:50010, 172.87.171.23:50010]&lt;BR /&gt;2024-03-18 17:41:49,795 WARN datanode.DataNode (BlockReceiver.java:receivePacket(567)) - Slow BlockReceiver write packet to mirror took 410ms (threshold=300ms), downstream DNs=[172.87.171.26:50010, 172.87.171.31:50010]&lt;BR /&gt;2024-03-18 18:06:29,585 WARN datanode.DataNode (BlockReceiver.java:receivePacket(567)) - Slow BlockReceiver write packet to mirror took 303ms (threshold=300ms), downstream DNs=[172.87.171.34:50010, 172.87.171.22:50010]&lt;BR /&gt;2024-03-18 18:18:55,931 WARN datanode.DataNode (BlockReceiver.java:receivePacket(567)) - Slow BlockReceiver write packet to mirror took 729ms (threshold=300ms), downstream DNs=[172.87.11.27:50010]&lt;/P&gt;&lt;P&gt;In the log above we can see the warning `Slow BlockReceiver write packet to mirror took xxms` and also the downstream data-node machines, such as `172.87.171.23,172.87.171.24` etc.&lt;/P&gt;&lt;P&gt;From my understanding, warnings such as `Slow BlockReceiver write packet to mirror` may indicate a delay in writing the block to the OS cache or disk.&lt;/P&gt;&lt;P&gt;So I am trying to collect the possible reasons for these warnings, and here they are:&lt;/P&gt;&lt;P&gt;1. delay in writing the block to the OS cache or disk&lt;/P&gt;&lt;P&gt;2. cluster is at or near its resource limits (memory, CPU or disk)&lt;/P&gt;&lt;P&gt;3. network issues between machines&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;From my verification I do not see any disk, CPU or memory problem; we checked all machines.&lt;/P&gt;&lt;P&gt;On the network side, I do not see any particular issues related to the machines themselves.&lt;/P&gt;&lt;P&gt;We also used iperf3 to check the bandwidth from one machine to another.&lt;/P&gt;&lt;P&gt;Here is an example between `data-node01` and `data-node03` (from my understanding, and please correct me if I am wrong, the bandwidth looks OK).&lt;/P&gt;&lt;P&gt;From data-node01&lt;/P&gt;&lt;P&gt;iperf3 -i 10 -s&lt;/P&gt;&lt;P&gt;[ ID] Interval Transfer Bandwidth&lt;BR /&gt;[ 5] 0.00-10.00 sec 7.90 GBytes 6.78 Gbits/sec&lt;BR /&gt;[ 5] 10.00-20.00 sec 8.21 GBytes 7.05 Gbits/sec&lt;BR /&gt;[ 5] 20.00-30.00 sec 7.25 GBytes 6.23 Gbits/sec&lt;BR /&gt;[ 5] 30.00-40.00 sec 7.16 GBytes 6.15 Gbits/sec&lt;BR /&gt;[ 5] 40.00-50.00 sec 7.08 GBytes 6.08 Gbits/sec&lt;BR /&gt;[ 5] 50.00-60.00 sec 6.27 GBytes 5.39 Gbits/sec&lt;BR /&gt;[ 5] 60.00-60.04 sec 35.4 MBytes 7.51 Gbits/sec&lt;BR /&gt;- - - - - - - - - - - - - - - - - - - - - - - - -&lt;BR /&gt;[ ID] Interval Transfer Bandwidth&lt;BR /&gt;[ 5] 0.00-60.04 sec 0.00 Bytes 0.00 bits/sec sender&lt;BR /&gt;[ 5] 0.00-60.04 sec 43.9 GBytes 6.28 Gbits/sec receiver&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;From data-node03&lt;/P&gt;&lt;P&gt;iperf3 -i 1 -t 60 -c 172.87.171.84&lt;/P&gt;&lt;P&gt;[ ID] Interval Transfer Bandwidth Retr Cwnd&lt;BR /&gt;[ 4] 0.00-1.00 sec 792 MBytes 6.64 Gbits/sec 0 3.02 MBytes&lt;BR /&gt;[ 4] 1.00-2.00 sec 834 MBytes 6.99 Gbits/sec 54 2.26 MBytes&lt;BR /&gt;[ 4] 2.00-3.00 sec 960 MBytes 8.05 Gbits/sec 0 2.49 MBytes&lt;BR /&gt;[ 4] 3.00-4.00 sec 896 MBytes 7.52 Gbits/sec 0 2.62 MBytes&lt;BR /&gt;[ 4] 4.00-5.00 sec 790 MBytes 6.63 Gbits/sec 0 2.70 MBytes&lt;BR /&gt;[ 4] 5.00-6.00 sec 838 MBytes 7.03 Gbits/sec 4 1.97 MBytes&lt;BR /&gt;[ 4] 6.00-7.00 sec 816 MBytes 6.85 Gbits/sec 0 2.17 MBytes&lt;BR /&gt;[ 4] 7.00-8.00 sec 728 MBytes 6.10 Gbits/sec 0 2.37 MBytes&lt;BR /&gt;[ 4] 8.00-9.00 sec 692 MBytes 5.81 Gbits/sec 47 1.74 MBytes&lt;BR /&gt;[ 4] 9.00-10.00 sec 778 MBytes 6.52 Gbits/sec 0 1.91 MBytes&lt;BR /&gt;[ 4] 10.00-11.00 sec 785 MBytes 6.58 Gbits/sec 48 1.57 MBytes&lt;BR /&gt;[ 4] 11.00-12.00 sec 861 MBytes 7.23 Gbits/sec 0 1.84 MBytes&lt;BR /&gt;[ 4] 12.00-13.00 sec 844 MBytes 7.08 Gbits/sec 0 1.96 MBytes&lt;/P&gt;&lt;P&gt;Note: the NIC cards run at `10G` speed (we checked this with ethtool).&lt;/P&gt;&lt;P&gt;We also checked the firmware version of the NIC card:&lt;/P&gt;&lt;P&gt;ethtool -i p1p1&lt;BR /&gt;driver: i40e&lt;BR /&gt;version: 2.8.20-k&lt;BR /&gt;firmware-version: 8.40 0x8000af82 20.5.13&lt;BR /&gt;expansion-rom-version:&lt;BR /&gt;bus-info: 0000:3b:00.0&lt;BR /&gt;supports-statistics: yes&lt;BR /&gt;supports-test: yes&lt;BR /&gt;supports-eeprom-access: yes&lt;BR /&gt;supports-register-dump: yes&lt;BR /&gt;supports-priv-flags: yes&lt;/P&gt;&lt;P&gt;We also checked the kernel messages (`dmesg`) but did not see anything unusual.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Mar 2024 14:12:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hadoop-warnings-as-slow-block-receive-from-data-node/m-p/385182#M245639</guid>
      <dc:creator>mike_bronson7</dc:creator>
      <dc:date>2024-03-19T14:12:54Z</dc:date>
    </item>
    <item>
      <title>Re: Hadoop + warnings as slow block-receive from data-node machines</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hadoop-warnings-as-slow-block-receive-from-data-node/m-p/385291#M245675</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/59349"&gt;@mike_bronson7&lt;/a&gt;&amp;nbsp;Hello! Thanks for bringing this to our community.&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;P&gt;Users are complain about slowness from spark applications that run on data-nodes machines&lt;/P&gt;&lt;P&gt;from my understanding the exceptions as Slow `BlockReceiver write packet to mirror` indicate maybe on delay in writing the block to OS cache or disk&lt;BR /&gt;So I am trying to collect the reasons for this warning / exceptions , and here there are&lt;/P&gt;&lt;P&gt;1. delay in writing the block to OS cache or disk&lt;BR /&gt;2. cluster is as or near its resources limit ( memory , CPU or disk )&lt;BR /&gt;3. network issues between machines&lt;/P&gt;&lt;P&gt;From network side I not see special issues that relevant to machines itself&lt;/P&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;SPAN&gt;No, this indicates that there was a delay in writing the block across the network.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The WARNs may indicate that you are facing a network/hardware-level issue. The time taken is not excessively high, and it is likely to be a tuning issue. To explain this WARN:&lt;/SPAN&gt;&lt;/P&gt;&lt;PRE&gt;Slow BlockReceiver write packet to mirror&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;SPAN&gt;This measures the duration taken to write to the next DataNode over a regular TCP socket, and the time taken to flush the socket. 
An increase in this typically indicates higher network latency, as Java-wise this is a pure SocketOutputStream.write() + SocketOutputStream.flush() cost.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Also, did you happen to monitor the DataNodes' network while the Spark applications were running in parallel?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;SPAN&gt;Hopefully the above helps you tune the network configuration better.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;V&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 20 Mar 2024 21:20:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hadoop-warnings-as-slow-block-receive-from-data-node/m-p/385291#M245675</guid>
      <dc:creator>vaishaakb</dc:creator>
      <dc:date>2024-03-20T21:20:43Z</dc:date>
    </item>
    <item>
      <title>Re: Hadoop + warnings as slow block-receive from data-node machines</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hadoop-warnings-as-slow-block-receive-from-data-node/m-p/385299#M245679</link>
      <description>&lt;P&gt;Thank you for the response,&lt;BR /&gt;&lt;BR /&gt;but look at this as well:&lt;BR /&gt;&lt;BR /&gt;2024-03-18 19:31:52,673 WARN datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:756ms (threshold=300ms), volume=/data/sde/hadoop/hdfs/data&lt;BR /&gt;2024-03-18 19:35:15,334 WARN datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:377ms (threshold=300ms), volume=/data/sdc/hadoop/hdfs/data&lt;BR /&gt;2024-03-18 19:51:57,774 WARN datanode.DataNode (BlockReceiver.java:receivePacket(701)) - Slow BlockReceiver write data to disk cost:375ms (threshold=300ms), volume=/data/sdb/hadoop/hdfs/data&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;As you can see, the warnings are also about the local disks, not only about writes across the network.&lt;/P&gt;&lt;P&gt;In any case, we already checked the network, including the switches, and did not find a problem.&lt;/P&gt;&lt;P&gt;Do you think it could be a tuning issue in the HDFS parameters, or are there some parameters that can help?&lt;/P&gt;</description>
      <pubDate>Thu, 21 Mar 2024 06:50:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hadoop-warnings-as-slow-block-receive-from-data-node/m-p/385299#M245679</guid>
      <dc:creator>mike_bronson7</dc:creator>
      <dc:date>2024-03-21T06:50:49Z</dc:date>
    </item>
  </channel>
</rss>

