<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: HDFS file and block size in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HDFS-file-and-block-size/m-p/36308#M15178</link>
    <description>Yes, blocks are not pre-allocated. They are a logical division unit. Read&lt;BR /&gt;&lt;A href="https://wiki.apache.org/hadoop/FAQ#If_a_block_size_of_64MB_is_used_and_a_file_is_written_that_uses_less_than_64MB.2C_will_64MB_of_disk_space_be_consumed.3F" target="_blank"&gt;https://wiki.apache.org/hadoop/FAQ#If_a_block_size_of_64MB_is_used_and_a_file_is_written_that_uses_less_than_64MB.2C_will_64MB_of_disk_space_be_consumed.3F&lt;/A&gt;&lt;BR /&gt;</description>
    <pubDate>Fri, 15 Jan 2016 18:51:58 GMT</pubDate>
    <dc:creator>Harsh J</dc:creator>
    <dc:date>2016-01-15T18:51:58Z</dc:date>
    <item>
      <title>HDFS file and block size</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HDFS-file-and-block-size/m-p/36304#M15177</link>
      <description>&lt;DIV class="post-text"&gt;&lt;P&gt;I got the details below from hadoop fsck /:&lt;/P&gt;&lt;P&gt;Total size: 41514639144544 B (Total open files size: 581 B)&lt;/P&gt;&lt;P&gt;Total dirs: 40524&lt;/P&gt;&lt;P&gt;Total files: 124348 Total symlinks: 0 (Files currently being written: 7)&lt;/P&gt;&lt;P&gt;Total blocks (validated): 340802 (avg. block size 121814540 B) (Total open file blocks (not validated): 7) Minimally replicated blocks: 340802 (100.0 %)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am using a 256 MB block size, so 340802 blocks * 256 MB = 83.2 TB, and * 3 (replicas) = 249.6 TB, but Cloudera Manager shows 110 TB of disk used. How is that possible?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Does this mean that even though the block size is 256 MB, a small file doesn't use the whole block for itself?&lt;/P&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:57:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HDFS-file-and-block-size/m-p/36304#M15177</guid>
      <dc:creator>naveen1</dc:creator>
      <dc:date>2022-09-16T09:57:40Z</dc:date>
    </item>
    <item>
      <title>Re: HDFS file and block size</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HDFS-file-and-block-size/m-p/36308#M15178</link>
      <description>Yes, blocks are not pre-allocated. They are a logical division unit. Read&lt;BR /&gt;&lt;A href="https://wiki.apache.org/hadoop/FAQ#If_a_block_size_of_64MB_is_used_and_a_file_is_written_that_uses_less_than_64MB.2C_will_64MB_of_disk_space_be_consumed.3F" target="_blank"&gt;https://wiki.apache.org/hadoop/FAQ#If_a_block_size_of_64MB_is_used_and_a_file_is_written_that_uses_less_than_64MB.2C_will_64MB_of_disk_space_be_consumed.3F&lt;/A&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 15 Jan 2016 18:51:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HDFS-file-and-block-size/m-p/36308#M15178</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2016-01-15T18:51:58Z</dc:date>
    </item>
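    <!--
A minimal sketch of the arithmetic behind the answer above, assuming the fsck figures
quoted in the question, a replication factor of 3, and binary units (1 TB taken as 1024^4
bytes). The variable names are illustrative only. Since blocks are not pre-allocated, raw
usage follows the bytes actually stored rather than block count times block size.

```python
# Figures copied from the fsck output in the question.
total_size_bytes = 41514639144544   # "Total size"
total_blocks = 340802               # "Total blocks (validated)"

# Assumptions taken from the question: 256 MB blocks, 3 replicas.
block_size_bytes = 256 * 1024 ** 2
replication = 3

TIB = 1024 ** 4  # bytes per TiB (binary terabyte)

# Estimate from the question: every block counted as a full 256 MB.
full_block_estimate_tib = total_blocks * block_size_bytes * replication / TIB

# Estimate from the bytes actually stored: a block only occupies
# as much disk as the data written into it.
stored_estimate_tib = total_size_bytes * replication / TIB

# Average block size, which fsck also reports.
avg_block_bytes = total_size_bytes // total_blocks

print(f"full-block estimate: {full_block_estimate_tib:.1f} TiB")
print(f"stored-bytes estimate: {stored_estimate_tib:.1f} TiB")
print(f"average block size: {avg_block_bytes / 1024 ** 2:.1f} MiB")
```

The stored-bytes estimate comes out near 113 TiB, the same order as the roughly 110 TB
reported by Cloudera Manager; the remaining gap may come from rounding or from how
the tool counts usage, which this sketch does not model.
    -->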
    <item>
      <title>Re: HDFS file and block size</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HDFS-file-and-block-size/m-p/36309#M15179</link>
      <description>I have 194945 files that are smaller than 50 MB, and these files occupy 884 GB of space. 1) How do I calculate the space these files will occupy if I hadoop archive them? 2) Am I using my HDFS efficiently despite the small files, i.e., am I not wasting any space here? 3) Does archiving really save disk space, or does it just reduce the namespace overhead? Harsh, can you give me a detailed picture of this?</description>
      <pubDate>Fri, 15 Jan 2016 19:19:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HDFS-file-and-block-size/m-p/36309#M15179</guid>
      <dc:creator>naveen1</dc:creator>
      <dc:date>2016-01-15T19:19:48Z</dc:date>
    </item>
    <item>
      <title>Re: HDFS file and block size</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HDFS-file-and-block-size/m-p/36310#M15180</link>
      <description>So even if I archive these files, I won't be saving any disk space, is that right?</description>
      <pubDate>Fri, 15 Jan 2016 19:27:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HDFS-file-and-block-size/m-p/36310#M15180</guid>
      <dc:creator>naveen1</dc:creator>
      <dc:date>2016-01-15T19:27:41Z</dc:date>
    </item>
    <item>
      <title>Re: HDFS file and block size</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HDFS-file-and-block-size/m-p/36356#M15181</link>
      <description>&lt;P&gt;You won't save HDFS filesystem space by "archiving" or "combining" small files. In many scenarios you will get a performance boost from combining. Combining also reduces the metadata overhead on the NameNode.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jan 2016 22:10:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HDFS-file-and-block-size/m-p/36356#M15181</guid>
      <dc:creator>ben.hemphill</dc:creator>
      <dc:date>2016-01-18T22:10:50Z</dc:date>
    </item>
  </channel>
</rss>

