<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question How to clean datanodes / nodemanagers data after multiple spark-submits? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-to-clean-datanodes-nodemanagers-data-after-multiple/m-p/330484#M230696</link>
    <description>&lt;P&gt;Hello!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am running extensive experiments on my 3-node (VM) cluster. My VMs have 50GB of disk space each, and checking the available space (on localhost:9870, the NameNode's web UI) after 10 spark-submit application submissions reveals that the hard disks are almost full. How can I delete that created data without restarting and reformatting HDFS?&amp;nbsp;&lt;/P&gt;&lt;P&gt;I was thinking of a DataNode clean-up command to use here.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
    <pubDate>Thu, 18 Nov 2021 14:22:06 GMT</pubDate>
    <dc:creator>hadoopFreak01</dc:creator>
    <dc:date>2021-11-18T14:22:06Z</dc:date>
    <item>
      <title>How to clean datanodes / nodemanagers data after multiple spark-submits?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-clean-datanodes-nodemanagers-data-after-multiple/m-p/330484#M230696</link>
      <description>&lt;P&gt;Hello!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am running extensive experiments on my 3-node (VM) cluster. My VMs have 50GB of disk space each, and checking the available space (on localhost:9870, the NameNode's web UI) after 10 spark-submit application submissions reveals that the hard disks are almost full. How can I delete that created data without restarting and reformatting HDFS?&amp;nbsp;&lt;/P&gt;&lt;P&gt;I was thinking of a DataNode clean-up command to use here.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Nov 2021 14:22:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-clean-datanodes-nodemanagers-data-after-multiple/m-p/330484#M230696</guid>
      <dc:creator>hadoopFreak01</dc:creator>
      <dc:date>2021-11-18T14:22:06Z</dc:date>
    </item>
    <item>
      <title>Re: How to clean datanodes / nodemanagers data after multiple spark-submits?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-clean-datanodes-nodemanagers-data-after-multiple/m-p/330557#M230708</link>
      <description>&lt;P&gt;Hello!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You can remove the data from HDFS using the following commands:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;# hdfs dfs -rm -R -skipTrash &amp;lt;Extra-Data-folder&amp;gt;&lt;/P&gt;&lt;P&gt;# hdfs dfs -rm -r /tmp/spark&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This issue is caused by too many DataNodes reaching high disk utilization, which reduces the total number of DataNodes available for write requests.&lt;BR /&gt;As a result, the DataNodes that are still available for writes are targeted at a higher rate, increasing their transceiver activity to the point of being "overloaded".&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;To correct this, reduce the disk utilization of the DataNodes in the cluster that have reached their disk capacity limits.&lt;BR /&gt;Adding drives to increase storage space, deleting unwanted/non-critical data from HDFS, or adding DataNodes to the cluster are all worthwhile solutions to this problem.&lt;/LI&gt;&lt;LI&gt;There is also a workaround for DataNode rejections caused by higher-than-normal transceiver volumes. Note that this is not a long-term solution and should only be used temporarily:&lt;/LI&gt;&lt;LI&gt;Set the 'dfs.namenode.replication.considerLoad' parameter to 'false' under HDFS &amp;gt; Configurations &amp;gt; "NameNode Advanced Configuration Snippet (Safety Valve)" in Cloudera Manager. This effectively tells the NameNode to ignore current transceiver activity when choosing a DataNode for block placement. It can have unintended consequences if left on permanently, as the NameNode can potentially overwhelm DataNodes with too many requests; the considerLoad parameter exists to prevent that.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Hopefully the provided solution will help resolve the issue.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Vaishnavi Nalawade&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 19 Nov 2021 10:04:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-clean-datanodes-nodemanagers-data-after-multiple/m-p/330557#M230708</guid>
      <dc:creator>vaish_nalawa</dc:creator>
      <dc:date>2021-11-19T10:04:14Z</dc:date>
    </item>
    <item>
      <title>Re: How to clean datanodes / nodemanagers data after multiple spark-submits?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-clean-datanodes-nodemanagers-data-after-multiple/m-p/330606#M230714</link>
      <description>&lt;P&gt;Thanks, but I want to remove the data resulting from executing Spark applications through the &lt;EM&gt;spark-submit&lt;/EM&gt; command, not data from HDFS. Could you confirm those are the commands to use in this case?&lt;/P&gt;</description>
      <pubDate>Fri, 19 Nov 2021 16:01:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-clean-datanodes-nodemanagers-data-after-multiple/m-p/330606#M230714</guid>
      <dc:creator>hadoopFreak01</dc:creator>
      <dc:date>2021-11-19T16:01:15Z</dc:date>
    </item>
  </channel>
</rss>