<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: DELETE rows in table, how HDFS file size is impacted ? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/386604#M246090</link>
    <description>&lt;P&gt;Existing files won't be rewritten by a DELETE query; instead, the ROW__ID of each deleted row is written to a new delete_delta folder. Read queries apply those deleted ROW__IDs to the existing files to exclude the rows.&lt;/P&gt;&lt;P&gt;Triggering a major compaction on the table rewrites the data into new files, merging the delta and delete_delta folders.&lt;/P&gt;</description>
    <pubDate>Mon, 15 Apr 2024 18:05:46 GMT</pubDate>
    <dc:creator>nramanaiah</dc:creator>
    <dc:date>2024-04-15T18:05:46Z</dc:date>
    <item>
      <title>DELETE rows in table, how HDFS file size is impacted ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/386575#M246082</link>
      <description>&lt;P&gt;Hello.&lt;/P&gt;&lt;P&gt;Before deleting rows in a specific table (463,462 rows in the table), the HDFS file size is:&lt;/P&gt;&lt;P&gt;$ hadoop fs -du -s -h /apps/hive/warehouse/prd_thmil.db/th_mil_fb_code_value_brut&lt;BR /&gt;54.2 M 162.6 M /apps/hive/warehouse/prd_thmil.db/th_mil_fb_code_value_brut&lt;/P&gt;&lt;P&gt;54.2 MB is the size of one copy and each file is replicated 2 times (3 copies in total), so 162.6 MB is the total size, which is OK.&lt;/P&gt;&lt;P&gt;But after deleting more than 450,000 rows from the table (12,890 rows remaining after the DELETE), the file size didn't change at all.&lt;/P&gt;&lt;P&gt;Is this normal? When new rows are added to the table, will the file size stay the same, with HDFS 'overwriting' the older data with the new data?&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
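The two figures printed by `hadoop fs -du -s -h` are the logical size of the data and the disk space consumed across all replicas. A minimal Python sketch of that arithmetic, using the numbers quoted in the question (the variable names are illustrative and not part of any Hadoop API):

```python
# Figures copied from the `hadoop fs -du -s -h` output in the question.
logical_size_mb = 54.2      # first column: size of the data itself
raw_consumed_mb = 162.6     # second column: space used across all replicas

# The ratio of the two columns is the effective replication factor.
replication_factor = round(raw_consumed_mb / logical_size_mb)

# A replication factor of 3 means one original copy plus 2 replicas.
extra_copies = replication_factor - 1

print(f"replication factor: {replication_factor}, extra copies: {extra_copies}")
```

Neither column says anything about how many rows are still visible to queries, which is why a large DELETE leaves both numbers unchanged.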
      <pubDate>Mon, 15 Apr 2024 13:43:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/386575#M246082</guid>
      <dc:creator>Mike_CHU44</dc:creator>
      <dc:date>2024-04-15T13:43:23Z</dc:date>
    </item>
    <item>
      <title>Re: DELETE rows in table, how HDFS file size is impacted ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/386604#M246090</link>
      <description>&lt;P&gt;Existing files won't be rewritten by a DELETE query; instead, the ROW__ID of each deleted row is written to a new delete_delta folder. Read queries apply those deleted ROW__IDs to the existing files to exclude the rows.&lt;/P&gt;&lt;P&gt;Triggering a major compaction on the table rewrites the data into new files, merging the delta and delete_delta folders.&lt;/P&gt;</description>
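The mechanism described in this answer can be illustrated with a toy in-memory model. This is a sketch of the idea only, not Hive's implementation: the `ToyAcidTable` class, its method names, and the use of entry counts as a stand-in for on-disk size are all assumptions made for illustration. The row counts are the ones from the question.

```python
class ToyAcidTable:
    """Toy model of a Hive ACID table: a base plus delete deltas."""

    def __init__(self, row_ids):
        self.base = set(row_ids)        # rows physically stored in base files
        self.delete_deltas = []         # each DELETE appends one set of row ids

    def delete(self, row_ids):
        # A DELETE never touches the base; it only records the deleted ids.
        self.delete_deltas.append(set(row_ids))

    def visible_rows(self):
        # Readers subtract every recorded delete from the base at query time.
        deleted = set().union(*self.delete_deltas)
        return self.base - deleted

    def stored_entries(self):
        # Stand-in for disk usage: base rows plus delete markers.
        return len(self.base) + sum(len(d) for d in self.delete_deltas)

    def major_compaction(self):
        # Rewrite the base without the deleted rows and drop the deltas.
        self.base = self.visible_rows()
        self.delete_deltas = []


table = ToyAcidTable(range(463_462))
table.delete(range(450_572))            # leaves 12,890 rows visible

rows_after_delete = len(table.visible_rows())
stored_after_delete = table.stored_entries()

table.major_compaction()
stored_after_compaction = table.stored_entries()

print(rows_after_delete, stored_after_delete, stored_after_compaction)
```

In this model the stored entry count goes up after the DELETE and only drops once `major_compaction()` has run, which mirrors the behavior the question observed.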
      <pubDate>Mon, 15 Apr 2024 18:05:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/386604#M246090</guid>
      <dc:creator>nramanaiah</dc:creator>
      <dc:date>2024-04-15T18:05:46Z</dc:date>
    </item>
    <item>
      <title>Re: DELETE rows in table, how HDFS file size is impacted ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/386616#M246091</link>
      <description>&lt;P&gt;Deleting rows in Hive is like hiding books in a library:&lt;/P&gt;&lt;P&gt;File size stays the same: a DELETE doesn't erase data from the existing files, it just marks the rows as hidden.&lt;/P&gt;&lt;P&gt;New data does not fill the "deleted" space: HDFS files are not modified in place, so new rows go into new files rather than onto the hidden shelves.&lt;/P&gt;&lt;P&gt;No immediate shrink: the space is reclaimed only when the files are rewritten, for example by a major compaction.&lt;/P&gt;&lt;P&gt;That's why the file size didn't change. It's normal behavior!&lt;/P&gt;</description>
      <pubDate>Mon, 15 Apr 2024 20:08:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/386616#M246091</guid>
      <dc:creator>Adword</dc:creator>
      <dc:date>2024-04-15T20:08:38Z</dc:date>
    </item>
    <item>
      <title>Re: DELETE rows in table, how HDFS file size is impacted ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/387012#M246219</link>
      <description>&lt;P&gt;Thank you, guys, for your answers.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Apr 2024 08:10:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/387012#M246219</guid>
      <dc:creator>Mike_CHU44</dc:creator>
      <dc:date>2024-04-23T08:10:20Z</dc:date>
    </item>
    <item>
      <title>Re: DELETE rows in table, how HDFS file size is impacted ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/413672#M254188</link>
      <description>&lt;P&gt;Yes, this is normal behavior in Hive. When you delete rows, the underlying HDFS files usually don't shrink automatically because HDFS doesn't modify files in place. You typically need to run compaction or rewrite the table (like using INSERT OVERWRITE) to reclaim the space.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Mar 2026 12:07:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/413672#M254188</guid>
      <dc:creator>ahmadnaveed</dc:creator>
      <dc:date>2026-03-10T12:07:52Z</dc:date>
    </item>
    <item>
      <title>Re: DELETE rows in table, how HDFS file size is impacted ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/413961#M254296</link>
      <description>&lt;P&gt;Deleting rows in an HDFS-backed table does not immediately reduce file size because HDFS files are write-once: individual records cannot be removed in place.&lt;/P&gt;&lt;P&gt;In Hive ACID tables, a DELETE does not touch the base data files at all. Instead, it writes a separate delete delta file that marks rows as logically deleted using row ID references. The physical size on HDFS stays the same or increases because new delta files are being added. Actual size reduction only happens after a major compaction runs, which rewrites the base files by merging all deltas and physically excluding deleted rows, followed by Hive's cleaner removing the old files.&lt;/P&gt;&lt;P&gt;In Apache Iceberg, deletes in merge-on-read mode produce position or equality delete files written alongside existing data files, again increasing HDFS usage until a rewrite-data-files compaction runs and the old files are purged. In Apache Hudi Copy-On-Write, a DELETE rewrites the entire affected file immediately, at the cost of heavy write amplification, and the previous file version remains on disk until Hudi's cleaning removes it. In Merge-On-Read, deletes are appended as log files and compaction is still required for physical reclamation.&lt;/P&gt;&lt;P&gt;The bottom line is that, at the HDFS storage layer, a DELETE adds new files rather than shrinking existing ones, and true physical space reclamation requires a compaction or rewrite to run and obsolete files to be purged.&lt;/P&gt;</description>
      <pubDate>Tue, 28 Apr 2026 07:30:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/413961#M254296</guid>
      <dc:creator>Tommike</dc:creator>
      <dc:date>2026-04-28T07:30:46Z</dc:date>
    </item>
    <item>
      <title>Re: DELETE rows in table, how HDFS file size is impacted ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/414155#M254736</link>
      <description>&lt;P&gt;This behavior is expected if the table is transactional (ACID-enabled).&lt;/P&gt;&lt;P&gt;A DELETE operation does not immediately rewrite the underlying HDFS data files. Instead, Hive records the deleted row identifiers in a delete_delta directory, and query engines apply those delete markers when reading the table. As a result, the original data files remain in place and the HDFS size often stays the same immediately after a large delete.&lt;/P&gt;&lt;P&gt;If you deleted 450,000+ rows and only have ~13,000 rows remaining, it's normal that the table directory still occupies roughly the same amount of space. In some cases, storage consumption can even increase temporarily because the delete metadata itself must be stored.&lt;/P&gt;&lt;P&gt;To actually reclaim disk space, you typically need to run a &lt;STRONG&gt;major compaction&lt;/STRONG&gt;. During major compaction, Hive rewrites the data files, merges delta/delete_delta information, and removes data that is no longer visible to queries. Only after that process completes will you generally see a significant reduction in HDFS usage.&lt;/P&gt;&lt;P&gt;One additional point: new inserts do not "overwrite" the deleted rows inside the existing files. HDFS files are immutable, so Hive creates new data files rather than modifying existing ones in place. The cleanup and consolidation happen during compaction rather than during the DELETE itself.&lt;/P&gt;&lt;P&gt;You may want to check:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Whether the table is ACID/transactional.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;The contents of the delta_* and delete_delta_* directories.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;When the next automatic major compaction is scheduled, or whether a manual major compaction is appropriate for your environment.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
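To make the second check in this answer concrete, here is a small Python sketch that groups the sub-directory names of a table directory by their ACID role. The sample names follow Hive's usual `base_*`, `delta_*` and `delete_delta_*` naming, but the specific write IDs and the `classify_acid_dirs` helper are made up for illustration; in practice the names would come from a listing of the table directory.

```python
from collections import defaultdict


def classify_acid_dirs(names):
    """Group directory names into base, delta and delete_delta buckets."""
    groups = defaultdict(list)
    for name in names:
        # Check the longest prefix first so the intent stays obvious.
        if name.startswith("delete_delta_"):
            groups["delete_delta"].append(name)
        elif name.startswith("delta_"):
            groups["delta"].append(name)
        elif name.startswith("base_"):
            groups["base"].append(name)
        else:
            groups["other"].append(name)
    return dict(groups)


# Hypothetical listing of a transactional table directory.
sample = [
    "base_0000001",
    "delta_0000002_0000002_0000",
    "delete_delta_0000003_0000003_0000",
]

groups = classify_acid_dirs(sample)
print(groups)
```

A `delete_delta_*` entry with no newer `base_*` directory is a hint that the deletes have probably not been compacted away yet.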
      <pubDate>Tue, 02 Jun 2026 04:47:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/DELETE-rows-in-table-how-HDFS-file-size-is-impacted/m-p/414155#M254736</guid>
      <dc:creator>robert12231</dc:creator>
      <dc:date>2026-06-02T04:47:30Z</dc:date>
    </item>
  </channel>
</rss>

