Member since
05-03-2022
6
Posts
4
Kudos Received
1
Solution
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1390 | 02-08-2024 06:35 AM |
04-28-2026
12:30 AM
Deleting rows in an HDFS-backed table does not immediately reduce file size because HDFS files are immutable by design: individual records cannot be removed in place.

- Hive ACID: a DELETE does not touch the base data files at all. It writes a separate delete delta file that marks rows as logically deleted using row ID references, so the physical file size on HDFS stays the same or increases as new delta files are added. Actual size reduction only happens after a major compaction runs, which rewrites the base files by merging all deltas and physically excluding deleted rows, followed by the HDFS cleaner removing the old files.
- Apache Iceberg: deletes produce position or equality delete files written alongside the existing data files, again increasing HDFS usage until a rewrite-data-files compaction purges the old data.
- Apache Hudi: in Copy-On-Write, a DELETE rewrites the entire affected file immediately, so size does shrink, but with heavy write amplification. In Merge-On-Read, deletes are appended as log files and compaction is still required for physical reclamation.

The bottom line is that DELETE is always append-driven at the HDFS storage layer regardless of table format, and true physical space reclamation requires compaction to run and obsolete files to be purged.
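The delete-delta mechanism described above can be sketched in a few lines of Python. This is an illustrative model, not Hive code: the dict-based "files", the row IDs, and the function names are simplifying assumptions, but the flow mirrors what ACID tables do, a DELETE only appends to a delete set, readers merge it at query time, and only a major compaction physically drops the rows.

```python
# Illustrative model of delete deltas and major compaction (not real Hive code).
base_file = {1: "alice", 2: "bob", 3: "carol"}  # immutable base data, keyed by row ID
delete_delta = {2}                              # DELETE appends row IDs here; base untouched

def read_with_deltas(base, deltas):
    """Readers merge base files with delete deltas at query time."""
    return {rid: row for rid, row in base.items() if rid not in deltas}

def major_compaction(base, deltas):
    """Compaction rewrites the base, physically excluding deleted rows.
    The old deltas become obsolete and can then be purged by the cleaner."""
    return read_with_deltas(base, deltas), set()

visible = read_with_deltas(base_file, delete_delta)   # row 2 hidden, files unchanged
base_file, delete_delta = major_compaction(base_file, delete_delta)
```

Note that before compaction the base file still holds all three rows even though only two are visible, which is exactly why the on-disk size does not shrink after a DELETE.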
06-24-2024
11:27 AM
@Mike_CHU44 Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
02-08-2024
06:35 AM
Solved: As we are running Cloudera Runtime 7.1.7, DAS is not deprecated, so we have to use it: https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/runtime-release-notes/topics/rt-pvc-deprecated-das.html
02-08-2024
04:02 AM
2 Kudos
The application asks for containers, runs part of the job in each container, and then releases it back. The 28 vcores you are seeing is the cumulative allocation over the job's lifetime, not concurrent usage. Say your job asks for 4 containers, each with 7 vcores: with a 15-vcore limit, only two containers run at first (14 vcores). When one container is released, the job acquires another 7-vcore container, so the total vcores allocated so far becomes 21, and reaches 28 once all four containers have run, even though concurrent usage never exceeded the limit.
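The accounting above can be sketched with a small Python simulation. This is a hypothetical model, not YARN scheduler code: it simply starts containers up to the vcore limit, releases them one at a time, and tracks both the cumulative vcores allocated and the peak concurrent usage, showing how a job of 4 containers of 7 vcores reports 28 vcores against a 15-vcore limit.

```python
# Hypothetical sketch of cumulative vs. concurrent vcore accounting (not YARN code).
def run_job(num_containers, vcores_per_container, vcore_limit):
    concurrent = 0        # vcores in use right now
    peak_concurrent = 0   # highest concurrent usage seen
    cumulative = 0        # total vcores allocated over the job's lifetime
    pending = num_containers
    running = []
    while pending or running:
        # Start as many containers as the vcore limit allows.
        while pending and concurrent + vcores_per_container <= vcore_limit:
            concurrent += vcores_per_container
            cumulative += vcores_per_container
            running.append(vcores_per_container)
            pending -= 1
        peak_concurrent = max(peak_concurrent, concurrent)
        # One container finishes and releases its vcores.
        concurrent -= running.pop(0)
    return cumulative, peak_concurrent

# 4 containers x 7 vcores under a 15-vcore limit:
total, peak = run_job(4, 7, 15)   # cumulative 28, but never more than 14 at once
```

The point of the sketch is that `total` (28) is what a cumulative "vcores used" counter reports, while `peak` (14) is the real concurrent usage, which stays within the 15-vcore limit throughout.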