<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: General guidelines and best practices for tuning dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold property in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/General-guidelines-and-best-practices-for-tuning-dfs/m-p/143875#M106461</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/15391/rathianant20.html" nodeid="15391"&gt;@Anant Rathi&lt;/A&gt; I have some verified answers in this thread from engineering, and also another answer from &lt;A rel="user" href="https://community.cloudera.com/users/381/cnauroth.html" nodeid="381"&gt;@Chris Nauroth&lt;/A&gt;. There's a reference blog post: &lt;A href="http://gbif.blogspot.com/2015/05/dont-fill-your-hdfs-disks-upgrading-to.html" target="_blank"&gt;http://gbif.blogspot.com/2015/05/dont-fill-your-hdfs-disks-upgrading-to.html&lt;/A&gt;. We don't have field agreement on one policy or the other.&lt;/P&gt;&lt;P&gt;AvailableSpaceVolumeChoosingPolicy is not something that we have ever formally tested or certified. It was developed at Cloudera. We do not certify it under our support.&lt;/P&gt;</description>
    <pubDate>Thu, 12 Jan 2017 07:34:18 GMT</pubDate>
    <dc:creator>aervits</dc:creator>
    <dc:date>2017-01-12T07:34:18Z</dc:date>
    <item>
      <title>General guidelines and best practices for tuning dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold property</title>
      <link>https://community.cloudera.com/t5/Support-Questions/General-guidelines-and-best-practices-for-tuning-dfs/m-p/143868#M106454</link>
      <description>&lt;P&gt;I'm looking for general guidelines and best practices from the field on the following two properties in hdfs-site.xml. I am looking for more than the descriptions in hdfs-default.xml. What are people seeing, and what values are being used in production for these two configuration properties?&lt;/P&gt;&lt;PRE&gt;dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold
dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction
&lt;/PRE&gt;</description>
      <pubDate>Fri, 17 Jun 2016 08:09:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/General-guidelines-and-best-practices-for-tuning-dfs/m-p/143868#M106454</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-06-17T08:09:49Z</dc:date>
    </item>
    <item>
      <title>Re: General guidelines and best practices for tuning dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold property</title>
      <link>https://community.cloudera.com/t5/Support-Questions/General-guidelines-and-best-practices-for-tuning-dfs/m-p/143869#M106455</link>
      <description>&lt;P&gt;Hi Artem, we do not recommend using AvailableSpaceVolumeChoosingPolicy. It can cause a subset of disk drives to become a bottleneck for writes. See &lt;A href="https://issues.apache.org/jira/browse/HDFS-8538"&gt;HDFS-8538&lt;/A&gt; for some more discussion on this.&lt;/P&gt;&lt;P&gt;A new HDFS tool called the DiskBalancer is under active development (&lt;A href="https://issues.apache.org/jira/browse/HDFS-1312"&gt;HDFS-1312&lt;/A&gt;). It will allow administrators to recover from skewed distribution caused by replacing failed disks or just adding new disks.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jun 2016 10:58:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/General-guidelines-and-best-practices-for-tuning-dfs/m-p/143869#M106455</guid>
      <dc:creator>ArpitAgarwal</dc:creator>
      <dc:date>2016-06-17T10:58:17Z</dc:date>
    </item>
    <item>
      <title>Re: General guidelines and best practices for tuning dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold property</title>
      <link>https://community.cloudera.com/t5/Support-Questions/General-guidelines-and-best-practices-for-tuning-dfs/m-p/143870#M106456</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/126/aagarwal.html" nodeid="126"&gt;@Arpit Agarwal&lt;/A&gt; I don't know the intricacies of this, but I'm trying to understand which is the better option: running the balancer as a recovery mechanism at regular intervals, or using a better placement policy when the blocks are written in the first place. I presume the default placement policy is round-robin (RR). If so, the smaller disks fill up faster. If the placement policy could instead take into account available space as well as IO throughput for each disk, wouldn't that be a better choice?&lt;/P&gt;&lt;P&gt;Also, as documented, these two properties are only applicable when dfs.datanode.fsdataset.volume.choosing.policy is set to org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy (https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml). However, I couldn't find any property named dfs.datanode.fsdataset.volume.choosing.policy. Please let me know where this is set.&lt;/P&gt;&lt;P&gt;Please correct me if my understanding is wrong.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jun 2016 21:47:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/General-guidelines-and-best-practices-for-tuning-dfs/m-p/143870#M106456</guid>
      <dc:creator>techiegreenhorn</dc:creator>
      <dc:date>2016-06-17T21:47:41Z</dc:date>
    </item>
    <item>
      <title>Re: General guidelines and best practices for tuning dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold property</title>
      <link>https://community.cloudera.com/t5/Support-Questions/General-guidelines-and-best-practices-for-tuning-dfs/m-p/143871#M106457</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/2369/greenhorntechie.html" nodeid="2369"&gt;@Greenhorn Techie&lt;/A&gt;, yes, I agree the ideal placement policy would factor in available space and IO load. However, there is no implementation that currently does that.&lt;/P&gt;&lt;P&gt;The property "dfs.datanode.fsdataset.volume.choosing.policy" is defined in hdfs-default.xml:&lt;/P&gt;&lt;PRE&gt;&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;dfs.datanode.fsdataset.volume.choosing.policy&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;&amp;lt;/value&amp;gt;
  &amp;lt;description&amp;gt;
    The class name of the policy for choosing volumes in the list of
    directories.  Defaults to
    org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.
    If you would like to take into account available disk space, set the
    value to
    "org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy".
  &amp;lt;/description&amp;gt;
&amp;lt;/property&amp;gt;
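
&amp;lt;!-- A minimal sketch, assuming you still choose to opt in: override the
     policy in hdfs-site.xml on each DataNode. The two tuning values below
     are the stock hdfs-default.xml defaults (10 GB threshold, 0.75
     preference fraction). They are shown for illustration only and are
     not a field-tested recommendation. --&amp;gt;
&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;dfs.datanode.fsdataset.volume.choosing.policy&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy&amp;lt;/value&amp;gt;
&amp;lt;/property&amp;gt;
&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;10737418240&amp;lt;/value&amp;gt;
&amp;lt;/property&amp;gt;
&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;0.75f&amp;lt;/value&amp;gt;
&amp;lt;/property&amp;gt;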


&lt;/PRE&gt;</description>
      <pubDate>Sat, 18 Jun 2016 01:17:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/General-guidelines-and-best-practices-for-tuning-dfs/m-p/143871#M106457</guid>
      <dc:creator>ArpitAgarwal</dc:creator>
      <dc:date>2016-06-18T01:17:20Z</dc:date>
    </item>
    <item>
      <title>Re: General guidelines and best practices for tuning dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold property</title>
      <link>https://community.cloudera.com/t5/Support-Questions/General-guidelines-and-best-practices-for-tuning-dfs/m-p/143872#M106458</link>
      <description>&lt;P&gt;Thanks &lt;A rel="user" href="https://community.cloudera.com/users/126/aagarwal.html" nodeid="126"&gt;@Arpit Agarwal&lt;/A&gt; for your response. So it finally boils down to choosing between the RR and AvailableSpace policies: Hortonworks recommends the RR policy with DiskBalancer, whereas Cloudera recommends the AvailableSpace policy. Am I correct in saying that? &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 18 Jun 2016 02:46:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/General-guidelines-and-best-practices-for-tuning-dfs/m-p/143872#M106458</guid>
      <dc:creator>techiegreenhorn</dc:creator>
      <dc:date>2016-06-18T02:46:46Z</dc:date>
    </item>
    <item>
      <title>Re: General guidelines and best practices for tuning dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold property</title>
      <link>https://community.cloudera.com/t5/Support-Questions/General-guidelines-and-best-practices-for-tuning-dfs/m-p/143873#M106459</link>
      <description>&lt;P&gt;Hortonworks recommends using the default RoundRobin policy.&lt;/P&gt;</description>
      <pubDate>Sat, 18 Jun 2016 04:23:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/General-guidelines-and-best-practices-for-tuning-dfs/m-p/143873#M106459</guid>
      <dc:creator>ArpitAgarwal</dc:creator>
      <dc:date>2016-06-18T04:23:42Z</dc:date>
    </item>
    <item>
      <title>Re: General guidelines and best practices for tuning dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold property</title>
      <link>https://community.cloudera.com/t5/Support-Questions/General-guidelines-and-best-practices-for-tuning-dfs/m-p/143874#M106460</link>
      <description>&lt;P&gt;I have the exact same question. &lt;A rel="user" href="https://community.cloudera.com/users/393/aervits.html" nodeid="393"&gt;@Artem Ervits&lt;/A&gt; have you come to any conclusion since this thread died last July?&lt;/P&gt;</description>
      <pubDate>Thu, 12 Jan 2017 03:04:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/General-guidelines-and-best-practices-for-tuning-dfs/m-p/143874#M106460</guid>
      <dc:creator>rathianant2_0</dc:creator>
      <dc:date>2017-01-12T03:04:59Z</dc:date>
    </item>
    <item>
      <title>Re: General guidelines and best practices for tuning dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold property</title>
      <link>https://community.cloudera.com/t5/Support-Questions/General-guidelines-and-best-practices-for-tuning-dfs/m-p/143875#M106461</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/15391/rathianant20.html" nodeid="15391"&gt;@Anant Rathi&lt;/A&gt; I have some verified answers in this thread from engineering, and also another answer from &lt;A rel="user" href="https://community.cloudera.com/users/381/cnauroth.html" nodeid="381"&gt;@Chris Nauroth&lt;/A&gt;. There's a reference blog post: &lt;A href="http://gbif.blogspot.com/2015/05/dont-fill-your-hdfs-disks-upgrading-to.html" target="_blank"&gt;http://gbif.blogspot.com/2015/05/dont-fill-your-hdfs-disks-upgrading-to.html&lt;/A&gt;. We don't have field agreement on one policy or the other.&lt;/P&gt;&lt;P&gt;AvailableSpaceVolumeChoosingPolicy is not something that we have ever formally tested or certified. It was developed at Cloudera. We do not certify it under our support.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Jan 2017 07:34:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/General-guidelines-and-best-practices-for-tuning-dfs/m-p/143875#M106461</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2017-01-12T07:34:18Z</dc:date>
    </item>
  </channel>
</rss>

