An ACID table has delta and base folders in its HDFS directory. What data is in the base folder? Can the base be cleared? If so, how can it be cleared automatically?
Labels: Apache Hive
Created 12-28-2018 01:19 AM
Created 12-28-2018 03:35 AM
Two types of compaction run on ACID tables:
1. Minor compaction: a ‘minor’ compaction takes the existing delta files and rewrites them into a single delta file per bucket. This compaction does not use many resources.
2. Major compaction: a ‘major’ compaction takes one or more delta files (as a minor compaction does) plus the base file for the bucket, and rewrites them into a new base file per bucket.
So the base folder holds the table's data as rewritten by the most recent major compaction; it should not be deleted by hand, because it is replaced by the next major compaction. Obsolete delta files are cleaned up after a minor or major compaction (and, after a major compaction, the previous base as well, since the new base supersedes it). Hive initiates all of these tasks in the background based on the hive-site.xml settings. Refer to this link for more details.
Take a look at this thread to understand how to trigger Hive compactions manually.
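As a minimal sketch, assuming a hypothetical unpartitioned transactional table named `web_logs` and a metastore with compactor worker threads enabled (`hive.compactor.worker.threads` greater than 0), a compaction can be requested by hand and then monitored like this:

```sql
-- web_logs is a hypothetical ACID (transactional) table used only for illustration.

-- Option 1: merge the table's delta directories into one delta per bucket.
ALTER TABLE web_logs COMPACT 'minor';

-- Option 2: rewrite the current base plus all deltas into a new base per bucket.
-- ALTER TABLE web_logs COMPACT 'major';

-- List queued, running and finished compaction requests with their state.
SHOW COMPACTIONS;
```

The `ALTER TABLE ... COMPACT` statement only enqueues a request; the compactor does the work in the background, so the old delta (or base) directories stay in HDFS until the request has passed the "ready for cleaning" state and the cleaner has run.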
Created 01-02-2019 01:07 AM
What parameters control the thresholds that trigger these compactions?
Created 01-02-2019 03:30 AM
The parameters below control when compactions are triggered.
| Configuration Parameter | Description |
| --- | --- |
| hive.compactor.delta.num.threshold | Number of delta directories in a partition that triggers an automatic minor compaction. The default value is 10. |
| hive.compactor.delta.pct.threshold | Size of the delta files, as a fraction of the corresponding base files, that triggers an automatic major compaction. The default value is 0.1, which is 10 percent. |
| hive.compactor.abortedtxn.threshold | Number of aborted transactions on a single partition that triggers an automatic major compaction. The default value is 1000. |
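A rough worked illustration of how the two size/count thresholds interact with their default values. The sizes are made up, and this simplified reading assumes the partition already has a base and ignores the aborted-transaction check: the delta-to-base size ratio is compared against `hive.compactor.delta.pct.threshold` first, and only if that does not fire is the number of delta directories compared against `hive.compactor.delta.num.threshold`.

```latex
% S_\Delta = total size of delta files, S_B = size of base files, N_\Delta = number of delta directories
\frac{S_\Delta}{S_B} = \frac{300\ \text{MB}}{2000\ \text{MB}} = 0.15 > 0.1
  \;\Rightarrow\; \text{major compaction}

\frac{S_\Delta}{S_B} = \frac{150\ \text{MB}}{2000\ \text{MB}} = 0.075 \le 0.1,
  \quad N_\Delta = 12 > 10
  \;\Rightarrow\; \text{minor compaction}
```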
For the full list of Hive compaction parameters, refer to the link below:
Created 01-02-2019 08:34 AM
The delta files are compacted according to this configuration, but the base files are not. How can I set that up?
Created 01-03-2019 01:51 AM
Here are the properties:
hive.compactor.delta.pct.threshold
Default: 0.1 (set on the metastore). Percentage (fractional) size of the delta files relative to the base that will trigger a major compaction. 1 = 100%, so the default 0.1 = 10%.
hive.compactor.abortedtxn.threshold
Default: 1000 (set on the metastore). Number of aborted transactions involving a given table or partition that will trigger a major compaction.
Setting compaction properties through TBLPROPERTIES:
CREATE TABLE table_name (id int, name string)
CLUSTERED BY (id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ("transactional"="true",
  "compactor.mapreduce.map.memory.mb"="2048",                   -- specify compaction map job properties
  "compactorthreshold.hive.compactor.delta.num.threshold"="4",  -- trigger minor compaction if there are more than 4 delta directories
  "compactorthreshold.hive.compactor.delta.pct.threshold"="0.5" -- trigger major compaction if the ratio of the size of delta files
                                                                -- to the size of base files is greater than 50%
);
ALTER TABLE table_name COMPACT 'minor' WITH OVERWRITE TBLPROPERTIES ("compactor.mapreduce.map.memory.mb"="3072"); -- specify compaction map job properties
ALTER TABLE table_name COMPACT 'major' WITH OVERWRITE TBLPROPERTIES ("tblprops.orc.compress.size"="8192");      -- change any other Hive table properties
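For a table that already exists, the same per-table threshold overrides can presumably be added with `ALTER TABLE ... SET TBLPROPERTIES` rather than at `CREATE TABLE` time. This is a sketch: `table_name` and the values 4 and 0.5 are simply the ones used in the example above, and it is worth confirming on your Hive version that the automatic initiator picks up the new values.

```sql
-- Override the metastore-wide compaction thresholds for this one table only.
ALTER TABLE table_name SET TBLPROPERTIES (
  "compactorthreshold.hive.compactor.delta.num.threshold"="4",
  "compactorthreshold.hive.compactor.delta.pct.threshold"="0.5"
);

-- Verify which properties are now set on the table.
SHOW TBLPROPERTIES table_name;
```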
A major compaction can be triggered on a specific partition with the following command:
ALTER TABLE <table-name> PARTITION (<partition-key>='<value>', <nested-partition-key>='<value>', ...) COMPACT 'major';
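With the placeholders filled in for a hypothetical table `events` partitioned by `year` and then `month`, the command looks like the sketch below. On a partitioned table Hive expects the full partition spec, with every partition key given as `key=value`; string-typed keys take quoted values.

```sql
-- events: hypothetical ACID table declared with PARTITIONED BY (year INT, month INT).
-- Request a major compaction of the single partition year=2018/month=12.
ALTER TABLE events PARTITION (year=2018, month=12) COMPACT 'major';
```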
More details on this page: https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
