Member since 05-03-2022
Posts: 6
Kudos Received: 4
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1436 | 02-08-2024 06:35 AM |
06-01-2026 09:47 PM
This behavior is expected if the table is transactional (ACID-enabled). A DELETE operation does not immediately rewrite the underlying HDFS data files. Instead, Hive records the identifiers of the deleted rows in a delete_delta directory, and query engines apply those delete markers when they read the table. As a result, the original data files remain in place, and the HDFS size often stays the same immediately after a large delete.

If you deleted 450,000+ rows and only ~13,000 rows remain, it's normal for the table directory to still occupy roughly the same amount of space. Storage consumption can even increase temporarily, because the delete metadata itself must also be stored.

To actually reclaim disk space, you typically need to run a major compaction. During a major compaction, Hive rewrites the data files, merges the delta/delete_delta information, and drops data that is no longer visible to queries. Only after that process completes will you generally see a significant reduction in HDFS usage.

One additional point: new inserts do not "overwrite" the deleted rows inside the existing files. HDFS files cannot be modified in place, so Hive writes new data files instead. Cleanup and consolidation happen during compaction, not during the DELETE itself.

You may want to check:
- Whether the table is ACID/transactional.
- The contents of the delta_* and delete_delta_* directories.
- Whether an automatic major compaction has already been queued or run, or whether a manual major compaction is appropriate for your environment.
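To make the "deleted on read, reclaimed on compaction" behavior concrete, here is a toy Python model. It is a sketch under loudly simplified assumptions, not Hive's file format: the class `ToyAcidTable` and its methods are hypothetical, rows are identified by a single integer (real ACID tables use a composite row identifier), and "stored rows plus delete markers" stands in for bytes on HDFS. The row counts mirror the numbers in the question and are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAcidTable:
    """Hypothetical, simplified model of a transactional (ACID) table.

    Data files map a row id to a row; delete deltas hold only the ids of
    deleted rows. Existing files are never modified in place.
    """
    data_files: list = field(default_factory=list)     # each: {row_id: row}
    delete_deltas: list = field(default_factory=list)  # each: set of row_ids
    next_id: int = 0

    def insert(self, rows):
        # New rows always land in a NEW data file.
        new_file = {}
        for row in rows:
            new_file[self.next_id] = row
            self.next_id += 1
        self.data_files.append(new_file)

    def delete(self, predicate):
        # DELETE only records the ids of matching visible rows in a new delete delta.
        ids = {rid for rid, row in self.visible_rows().items() if predicate(row)}
        self.delete_deltas.append(ids)

    def visible_rows(self):
        # Readers apply every delete marker while scanning the data files.
        deleted = set().union(*self.delete_deltas)
        return {rid: row
                for f in self.data_files
                for rid, row in f.items()
                if rid not in deleted}

    def stored_entries(self):
        # Crude stand-in for space on disk: stored rows plus delete markers.
        return (sum(len(f) for f in self.data_files)
                + sum(len(d) for d in self.delete_deltas))

    def major_compact(self):
        # Rewrite only the visible rows into one new file; drop old files and markers.
        self.data_files = [self.visible_rows()]
        self.delete_deltas = []


table = ToyAcidTable()
table.insert(range(463_000))           # illustrative row count
table.delete(lambda v: v >= 13_000)    # delete 450,000 rows
print(len(table.visible_rows()), table.stored_entries())  # 13000 913000
table.major_compact()
print(len(table.visible_rows()), table.stored_entries())  # 13000 13000
```

After the delete, queries see only 13,000 rows, yet the "storage" has grown because the markers were added on top of the untouched data. Only the compaction step shrinks it. In Hive itself, a manual major compaction can be requested with `ALTER TABLE <table_name> COMPACT 'major';`, and `SHOW COMPACTIONS;` lists queued, running, and completed compactions.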
06-24-2024 11:27 AM
@Mike_CHU44 Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
02-08-2024 06:35 AM
Solved: Since we are running Cloudera Runtime 7.1.7, in which DAS (Data Analytics Studio) is not deprecated, we have to keep using it. The linked 7.1.9 release notes cover its deprecation: https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/runtime-release-notes/topics/rt-pvc-deprecated-das.html
02-08-2024 04:02 AM
2 Kudos
The application asks for a container, runs part of its work in that container, and then releases it back. That is why you are seeing 28 vcores. Say your job asks for 4 containers, each with 7 vcores. Because of your 15-vcore limit, only two containers (14 vcores) can run at first. When one container is released, the job takes another 7-vcore container, so the total number of vcores it has been allocated so far is 21 (3 × 7), even though no more than 14 are in use at any one time. Once the fourth container is granted, that total reaches 28 (4 × 7).
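The arithmetic above can be checked with a small simulation. This is a hypothetical, simplified model, not how the YARN scheduler is implemented: `simulate` and its parameters are illustrative names, all containers are the same size, a single vcore limit applies to the whole job, and containers are granted in order as soon as capacity frees up. It reports two numbers: the peak vcores in use at once and the cumulative vcores allocated over the job's life.

```python
import heapq


def simulate(durations, vcores_each, vcore_limit):
    """Hypothetical sketch: grant equal-sized containers in order, never
    exceeding vcore_limit at once, and reuse capacity as containers finish.

    Returns (peak vcores in use at once, cumulative vcores allocated).
    """
    if vcores_each > vcore_limit:
        raise ValueError("a single container exceeds the vcore limit")
    running = []  # min-heap of finish times for running containers
    clock = 0.0
    in_use = peak = cumulative = 0
    for duration in durations:
        # Release every container that has already finished.
        while running and running[0] <= clock:
            heapq.heappop(running)
            in_use -= vcores_each
        # If there is still no room, wait for the next container to finish.
        while in_use + vcores_each > vcore_limit:
            clock = heapq.heappop(running)
            in_use -= vcores_each
        heapq.heappush(running, clock + duration)
        in_use += vcores_each
        cumulative += vcores_each
        peak = max(peak, in_use)
    return peak, cumulative


# 4 containers of 7 vcores each under a 15-vcore limit
print(simulate([10, 10, 10, 10], vcores_each=7, vcore_limit=15))  # (14, 28)
```

Reporting both numbers is the point of the sketch: the limit caps the peak (14 here), while a total counted across the job's lifetime can exceed the limit (28 here).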