<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Disk size used is bigger than replication number multiplied by files size in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Disk-size-used-is-bigger-than-replication-number-multiplied/m-p/165045#M127412</link>
    <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/16652/dvt.html"&gt;dvt isoft&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Not necessarily. That would hold only if all of your blocks were 100% filled with data.&lt;/P&gt;&lt;P&gt;Let's say you have a 1024 MB file and the block size is 128 MB. That would be exactly 8 blocks at 100%.&lt;/P&gt;&lt;P&gt;Now let's say you have a 968 MB file and the block size is 128 MB. That is still 8 blocks, but with lower usage: the last block holds only 72 MB. A block once used by a file cannot be reused for a different file.&lt;/P&gt;&lt;P&gt;That's why loading small files could be a waste.&lt;/P&gt;&lt;P&gt;Just imagine 100 files of 100 KB each: with a 128 MB block size they will use 100 blocks, over 10x more than the 8 blocks in the examples I provided above.&lt;/P&gt;&lt;P&gt;You need to understand your files, their block usage percentage, etc.&lt;/P&gt;&lt;P&gt;The command you executed shows the empty blocks multiplied by the size per block ... I know that is confusing &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;+++&lt;/P&gt;&lt;P&gt;If this is helpful, please vote and accept it as the best answer.&lt;/P&gt;</description>
    <pubDate>Tue, 21 Mar 2017 01:00:30 GMT</pubDate>
    <dc:creator>cstanca</dc:creator>
    <dc:date>2017-03-21T01:00:30Z</dc:date>
    <item>
      <title>Disk size used is bigger than replication number multiplied by files size</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Disk-size-used-is-bigger-than-replication-number-multiplied/m-p/165042#M127409</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am running Hadoop on a 3-node cluster (3 virtual machines) with 20 GB, 10 GB and 10 GB of available disk space, respectively.&lt;/P&gt;&lt;P&gt;When I run this command on the NameNode:&lt;/P&gt;&lt;PRE&gt;hadoop fs -df -h /&lt;/PRE&gt;&lt;P&gt;I get the following result:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="13803-1.png" style="width: 274px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/20421i92E2C8DF575A3D62/image-size/medium?v=v2&amp;amp;px=400" role="button" title="13803-1.png" alt="13803-1.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;When I run this command:&lt;/P&gt;&lt;PRE&gt;hadoop fs -du -s -h /&lt;/PRE&gt;&lt;P&gt;I get the following result:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="13804-2.png" style="width: 71px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/20422i95CD350DC376F462/image-size/medium?v=v2&amp;amp;px=400" role="button" title="13804-2.png" alt="13804-2.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Knowing that the replication factor is set to 3, shouldn't I get 3 * 2.7 = 8.1G in the first screenshot?&lt;/P&gt;&lt;P&gt;I ran the expunge command, but it did not change the result.&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;&lt;P&gt;Sylvain.&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 11:23:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Disk-size-used-is-bigger-than-replication-number-multiplied/m-p/165042#M127409</guid>
      <dc:creator>dvt_isoft</dc:creator>
      <dc:date>2019-08-18T11:23:35Z</dc:date>
    </item>
    <item>
      <title>Re: Disk size used is bigger than replication number multiplied by files size</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Disk-size-used-is-bigger-than-replication-number-multiplied/m-p/165043#M127410</link>
      <description>&lt;P&gt;Can you please check whether the screenshots were uploaded properly? I cannot see them on my end.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Mar 2017 19:05:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Disk-size-used-is-bigger-than-replication-number-multiplied/m-p/165043#M127410</guid>
      <dc:creator>bhavintandel</dc:creator>
      <dc:date>2017-03-20T19:05:57Z</dc:date>
    </item>
    <item>
      <title>Re: Disk size used is bigger than replication number multiplied by files size</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Disk-size-used-is-bigger-than-replication-number-multiplied/m-p/165044#M127411</link>
      <description>&lt;P&gt;It should be alright now.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Mar 2017 20:04:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Disk-size-used-is-bigger-than-replication-number-multiplied/m-p/165044#M127411</guid>
      <dc:creator>dvt_isoft</dc:creator>
      <dc:date>2017-03-20T20:04:24Z</dc:date>
    </item>
    <item>
      <title>Re: Disk size used is bigger than replication number multiplied by files size</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Disk-size-used-is-bigger-than-replication-number-multiplied/m-p/165045#M127412</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/16652/dvt.html"&gt;dvt isoft&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Not necessarily. That would hold only if all of your blocks were 100% filled with data.&lt;/P&gt;&lt;P&gt;Let's say you have a 1024 MB file and the block size is 128 MB. That would be exactly 8 blocks at 100%.&lt;/P&gt;&lt;P&gt;Now let's say you have a 968 MB file and the block size is 128 MB. That is still 8 blocks, but with lower usage: the last block holds only 72 MB. A block once used by a file cannot be reused for a different file.&lt;/P&gt;&lt;P&gt;That's why loading small files could be a waste.&lt;/P&gt;&lt;P&gt;Just imagine 100 files of 100 KB each: with a 128 MB block size they will use 100 blocks, over 10x more than the 8 blocks in the examples I provided above.&lt;/P&gt;&lt;P&gt;You need to understand your files, their block usage percentage, etc.&lt;/P&gt;&lt;P&gt;The command you executed shows the empty blocks multiplied by the size per block ... I know that is confusing &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;+++&lt;/P&gt;&lt;P&gt;If this is helpful, please vote and accept it as the best answer.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Mar 2017 01:00:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Disk-size-used-is-bigger-than-replication-number-multiplied/m-p/165045#M127412</guid>
      <dc:creator>cstanca</dc:creator>
      <dc:date>2017-03-21T01:00:30Z</dc:date>
    </item>
  </channel>
</rss>

