Member since 08-18-2017
146 Posts
19 Kudos Received
17 Solutions
My Accepted Solutions
| Views | Posted |
|---|---|
| 5655 | 05-09-2024 02:50 PM |
| 11013 | 09-13-2022 10:50 AM |
| 4293 | 07-25-2022 12:18 AM |
| 5598 | 06-24-2019 01:56 PM |
06-01-2026 09:47 PM
This behavior is expected if the table is transactional (ACID-enabled). A DELETE operation does not immediately rewrite the underlying HDFS data files. Instead, Hive records the identifiers of the deleted rows in a delete_delta directory, and query engines apply those delete markers when reading the table. As a result, the original data files remain in place, and the HDFS size often stays the same immediately after a large delete.

If you deleted 450,000+ rows and only have ~13,000 rows remaining, it's normal that the table directory still occupies roughly the same amount of space. In some cases, storage consumption can even increase temporarily, because the delete metadata itself must be stored.

To actually reclaim disk space, you typically need to run a major compaction. During a major compaction, Hive rewrites the data files, merges the delta/delete_delta information, and writes a new base containing only the rows that are still visible to queries. Only after that process completes, and the cleaner has removed the obsolete files, will you generally see a significant reduction in HDFS usage.

One additional point: new inserts do not "overwrite" the deleted rows inside the existing files. HDFS files are immutable, so Hive creates new data files rather than modifying existing ones in place. The cleanup and consolidation happen during compaction, not during the DELETE itself.

You may want to check:
- Whether the table is ACID/transactional.
- The contents of the delta_* and delete_delta_* directories.
- When the next automatic major compaction is expected to run, or whether a manual major compaction is appropriate for your environment.
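The merge-on-read behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Hive's actual ORC/ACID file format: `ToyAcidTable`, its methods, and the "stored entries" proxy for disk usage are hypothetical names invented for illustration. It shows why storage does not shrink (and can even grow) after a DELETE, and why only compaction reclaims it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAcidTable:
    """Hypothetical toy model: immutable data files plus delete markers (not Hive's real format)."""
    data_files: list = field(default_factory=list)    # each file is a list of (row_id, value)
    delete_markers: set = field(default_factory=set)  # row ids recorded in "delete_delta" files
    next_row_id: int = 0

    def insert(self, values):
        # Inserts never modify existing files; they always add a new file.
        new_file = []
        for v in values:
            new_file.append((self.next_row_id, v))
            self.next_row_id += 1
        self.data_files.append(new_file)

    def delete(self, predicate):
        # DELETE only records row ids; the data files are left untouched.
        for f in self.data_files:
            for row_id, v in f:
                if predicate(v):
                    self.delete_markers.add(row_id)

    def read(self):
        # Readers apply the delete markers on the fly (merge-on-read).
        return [v for f in self.data_files for row_id, v in f
                if row_id not in self.delete_markers]

    def stored_entries(self):
        # Rough storage proxy: rows kept in data files plus recorded delete markers.
        return sum(len(f) for f in self.data_files) + len(self.delete_markers)

    def major_compact(self):
        # Rewrite a single new "base" with only visible rows, then drop old files and markers.
        visible = [(row_id, v) for f in self.data_files for row_id, v in f
                   if row_id not in self.delete_markers]
        self.data_files = [visible] if visible else []
        self.delete_markers = set()


table = ToyAcidTable()
table.insert(range(1000))                          # 1000 rows written to one data file
table.delete(lambda v: v >= 30)                    # 970 delete markers; data file untouched
print(len(table.read()), table.stored_entries())   # 30 1970
table.major_compact()
print(len(table.read()), table.stored_entries())   # 30 30
```

In this model the visible row count drops to 30 right after the DELETE, but the storage proxy grows from 1000 to 1970 until `major_compact` rewrites the data.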
05-10-2024 10:39 AM (1 Kudo)
Thanks! Because I have NULL values in my data set as well, I used coalesce, and it worked! Your query was the basis, though, so thanks again! Here is the query by @nramanaiah that worked for me, since I have null records in the dataset:
select Currency, coalesce(spend_a,0) + coalesce(spend_b,0) + coalesce(spend_c,0) + coalesce(spend_d,0) as total_spend from test
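The reason coalesce is needed: in SQL arithmetic, adding NULL to anything yields NULL, so any row with a missing spend column gets a NULL total. The sketch below demonstrates this with Python's built-in sqlite3 module standing in for Hive (both follow the standard NULL-propagation rule and both provide coalesce). The table and column names come from the query above; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (Currency TEXT, spend_a REAL, spend_b REAL, spend_c REAL, spend_d REAL)"
)
# Made-up sample rows; Python None is stored as SQL NULL.
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?, ?)",
    [("USD", 10, 20, None, 5), ("EUR", None, None, None, None), ("GBP", 1, 2, 3, 4)],
)

# Without coalesce, any NULL operand makes the whole sum NULL.
plain = conn.execute(
    "SELECT Currency, spend_a + spend_b + spend_c + spend_d AS total_spend "
    "FROM test ORDER BY Currency"
).fetchall()

# With coalesce, each NULL is replaced by 0 before adding (the corrected query above).
fixed = conn.execute(
    "SELECT Currency, coalesce(spend_a,0) + coalesce(spend_b,0) + coalesce(spend_c,0) "
    "+ coalesce(spend_d,0) AS total_spend FROM test ORDER BY Currency"
).fetchall()

print(plain)  # [('EUR', None), ('GBP', 10.0), ('USD', None)]
print(fixed)  # [('EUR', 0), ('GBP', 10.0), ('USD', 35.0)]
```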
04-23-2024 06:59 AM
If it's a Tez application, the AM (ApplicationMaster) logs show how much memory is currently allocated to the application and how many free resources are available in the queue at that specific time. For example:
2024-04-22 23:27:20,636 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:843776, vCores:206> Free: <memory:2048, vCores:306> pendingRequests: 0 delayedContainers: 205 heartbeats: 101 lastPreemptionHeartbeat: 100
2024-04-22 23:27:30,660 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:155648, vCores:38> Free: <memory:495616, vCores:356> pendingRequests: 0 delayedContainers: 38 heartbeats: 151 lastPreemptionHeartbeat: 150
These allocation details are logged frequently in the Tez AM logs.
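To follow these values over time, the allocation lines can be extracted from a saved AM log with a regular expression. This is a minimal sketch that assumes every line follows the format of the two sample lines above (other Tez versions may log differently); `parse_allocations` is a hypothetical helper name. YARN reports memory in MB.

```python
import re

# Matches the timestamp plus the "Allocated: <...> Free: <...>" part of a
# YarnTaskSchedulerService line, in the format of the samples above.
ALLOC_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"Allocated: <memory:(?P<alloc_mb>\d+), vCores:(?P<alloc_vc>\d+)> "
    r"Free: <memory:(?P<free_mb>\d+), vCores:(?P<free_vc>\d+)>"
)


def parse_allocations(lines):
    """Return (timestamp, allocated_mb, allocated_vcores, free_mb, free_vcores) per matching line."""
    rows = []
    for line in lines:
        m = ALLOC_RE.search(line)
        if m:  # lines without allocation info are skipped
            rows.append((m["ts"], int(m["alloc_mb"]), int(m["alloc_vc"]),
                         int(m["free_mb"]), int(m["free_vc"])))
    return rows


# The two sample lines quoted above.
SAMPLE = [
    "2024-04-22 23:27:20,636 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: "
    "Allocated: <memory:843776, vCores:206> Free: <memory:2048, vCores:306> pendingRequests: 0 "
    "delayedContainers: 205 heartbeats: 101 lastPreemptionHeartbeat: 100",
    "2024-04-22 23:27:30,660 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: "
    "Allocated: <memory:155648, vCores:38> Free: <memory:495616, vCores:356> pendingRequests: 0 "
    "delayedContainers: 38 heartbeats: 151 lastPreemptionHeartbeat: 150",
]

for row in parse_allocations(SAMPLE):
    print(row)
```

For a real log, pass an open file object instead, e.g. `with open("tez_am.log") as f: parse_allocations(f)` (the file name is only an example).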
10-03-2022 07:16 AM
@nramanaiah I have been able to run further testing and can confirm that my partitions are purging as expected! Thanks again for the assistance!
08-16-2022 11:02 PM
@ho_ddeok, have any of the replies helped resolve your issue? If so, can you please mark the appropriate reply as the solution? It will make it easier for others to find the answer in the future.
08-01-2022 10:41 PM
@Hafiz Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
07-29-2022 01:11 AM
Not yet.
06-24-2019 01:56 PM (1 Kudo)
These WARN logs should not cause any issues.

1) However, if you want to remove these configs, you can use the syntax below to delete a config from the specific component's config_type. (To find out which file needs to be updated, search for the config in the filter box under Ambari -> Hive -> Configs.)

/var/lib/ambari-server/resources/scripts/configs.py -u <<username>> -p <<password>> -n <<clustername>> -l <<ambari-server-host>> -t <<ambari-server-port>> -a <<action>> -c <<config_type>> -k <<config-key>>

e.g.,

/var/lib/ambari-server/resources/scripts/configs.py -u admin -p <<dummy>> -n cluster1 -l ambari-server-host -t 8080 -a delete -c hive-site -k hive.mapred.strict
/var/lib/ambari-server/resources/scripts/configs.py -u admin -p <<dummy>> -n cluster1 -l ambari-server-host -t 8080 -a delete -c hive-site -k hive.mapred.supports.subdirectories

This is the reference for configs.py: https://cwiki.apache.org/confluence/display/AMBARI/Modify+configurations#Modifyconfigurations-Editconfigurationusingconfigs.py

2) To remove the log4j warning, go to Ambari -> Hive Configs -> Advanced hive-log4j and comment out the following line:

log4j.appender.DRFA.MaxFileSize

After the above two changes, restart the Hive services, and all three warnings should go away. If this answer helps resolve the issue, please accept it as the solution; it might also help other members of the community.
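If several keys need to be deleted, the step 1 commands can be generated by a short script instead of being typed one by one. This is a sketch under assumptions: it reuses only the configs.py options shown above (-u, -p, -n, -l, -t, -a, -c, -k), invokes the script directly as in the examples, and reads the password from a hypothetical AMBARI_PASSWORD environment variable. The helper names are also hypothetical. By default it only prints the commands (with the password masked); pass `dry_run=False` to actually run them.

```python
import os
import shlex
import subprocess

# Script path taken from the commands above.
CONFIGS_PY = "/var/lib/ambari-server/resources/scripts/configs.py"


def build_delete_cmd(user, password, cluster, host, port, config_type, key):
    """Return the configs.py argument list that deletes one key from config_type."""
    return [
        CONFIGS_PY,
        "-u", user, "-p", password,
        "-n", cluster, "-l", host, "-t", str(port),
        "-a", "delete", "-c", config_type, "-k", key,
    ]


def masked(cmd):
    """Copy of cmd with the value following -p replaced, so it is safe to print."""
    return ["****" if i > 0 and cmd[i - 1] == "-p" else arg for i, arg in enumerate(cmd)]


def delete_keys(keys, *, user, cluster, host, port, config_type="hive-site", dry_run=True):
    # Hypothetical environment variable; "<<dummy>>" mirrors the placeholder used above.
    password = os.environ.get("AMBARI_PASSWORD", "<<dummy>>")
    cmds = [build_delete_cmd(user, password, cluster, host, port, config_type, k) for k in keys]
    for cmd in cmds:
        if dry_run:
            print(shlex.join(masked(cmd)))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return cmds


if __name__ == "__main__":
    delete_keys(
        ["hive.mapred.strict", "hive.mapred.supports.subdirectories"],
        user="admin", cluster="cluster1", host="ambari-server-host", port=8080,
    )
```

Keeping `dry_run=True` as the default lets you review the exact commands before anything is changed on the cluster.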