<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: DELETE rows in table, how HDFS file size is impacted ? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/387012#M246219</link>
    <description>&lt;P&gt;Thank you, guys, for your answers.&lt;/P&gt;</description>
    <pubDate>Tue, 23 Apr 2024 08:10:20 GMT</pubDate>
    <dc:creator>Mike_CHU44</dc:creator>
    <dc:date>2024-04-23T08:10:20Z</dc:date>
    <item>
      <title>DELETE rows in table, how HDFS file size is impacted ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/386575#M246082</link>
      <description>&lt;P&gt;Hello.&lt;/P&gt;&lt;P&gt;Before deleting rows from a specific table (463,462 rows in the table), the HDFS file size is:&lt;/P&gt;&lt;P&gt;$ hadoop fs -du -s -h /apps/hive/warehouse/prd_thmil.db/th_mil_fb_code_value_brut&lt;BR /&gt;54.2 M 162.6 M /apps/hive/warehouse/prd_thmil.db/th_mil_fb_code_value_brut&lt;/P&gt;&lt;P&gt;54.2 MB is the size of the table data and, since each file is stored 3 times (the original plus 2 replicas), 162.6 MB is the total disk space consumed, which is OK.&lt;/P&gt;&lt;P&gt;But after deleting more than 450,000 rows from the table (12,890 rows remaining after the DELETE), the file size didn't change at all.&lt;/P&gt;&lt;P&gt;Is this normal? When new rows are added to the table, will the file size stay the same, with HDFS 'overwriting' the older data with the new rows?&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Mon, 15 Apr 2024 13:43:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/386575#M246082</guid>
      <dc:creator>Mike_CHU44</dc:creator>
      <dc:date>2024-04-15T13:43:23Z</dc:date>
    </item>
    <item>
      <title>Re: DELETE rows in table, how HDFS file size is impacted ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/386604#M246090</link>
      <description>&lt;P&gt;A DELETE query does not rewrite the existing files. Instead, the ROW__ID of each deleted row is written to a new delete_delta folder, and read queries apply those ROW__IDs to the existing files to exclude the deleted rows.&lt;/P&gt;&lt;P&gt;Triggering a major compaction on the table rewrites the data into new files, merging the delta and delete_delta folders, after which the obsolete files can be cleaned up and the space reclaimed.&lt;/P&gt;</description>
      <pubDate>Mon, 15 Apr 2024 18:05:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/386604#M246090</guid>
      <dc:creator>nramanaiah</dc:creator>
      <dc:date>2024-04-15T18:05:46Z</dc:date>
    </item>
    <item>
      <title>Re: DELETE rows in table, how HDFS file size is impacted ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/386616#M246091</link>
      <description>&lt;P&gt;Deleting rows in Hive is like hiding books in a library:&lt;/P&gt;&lt;P&gt;File size stays the same: Hive doesn't erase the data from the existing files, it just marks the rows as deleted.&lt;/P&gt;&lt;P&gt;New data does not fill the "deleted" space: HDFS files are not modified in place, so new rows are written to new files.&lt;/P&gt;&lt;P&gt;No immediate shrink: the space is reclaimed only when the files are rewritten, for example by a major compaction.&lt;/P&gt;&lt;P&gt;That's why the file size didn't change. It's normal behavior!&lt;/P&gt;</description>
      <pubDate>Mon, 15 Apr 2024 20:08:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/386616#M246091</guid>
      <dc:creator>Adword</dc:creator>
      <dc:date>2024-04-15T20:08:38Z</dc:date>
    </item>
    <item>
      <title>Re: DELETE rows in table, how HDFS file size is impacted ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/387012#M246219</link>
      <description>&lt;P&gt;Thank you, guys, for your answers.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Apr 2024 08:10:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/387012#M246219</guid>
      <dc:creator>Mike_CHU44</dc:creator>
      <dc:date>2024-04-23T08:10:20Z</dc:date>
    </item>
    <item>
      <title>Re: DELETE rows in table, how HDFS file size is impacted ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/413672#M254188</link>
      <description>&lt;P&gt;Yes, this is normal behavior in Hive. When you delete rows, the underlying HDFS files usually don't shrink automatically because HDFS doesn't modify files in place. You typically need to run compaction or rewrite the table (like using INSERT OVERWRITE) to reclaim the space.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Mar 2026 12:07:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/413672#M254188</guid>
      <dc:creator>ahmadnaveed</dc:creator>
      <dc:date>2026-03-10T12:07:52Z</dc:date>
    </item>
  </channel>
</rss>

