Created 12-28-2018 01:19 AM
Created 12-28-2018 03:35 AM
Two types of compaction happen in ACID tables:
1. Minor compaction: A ‘minor’ compaction takes all the delta files and rewrites them into a single delta file per bucket. This compaction does not take many resources.
2. Major compaction: A ‘major’ compaction takes one or more delta files (same as minor compaction) and the base file for the bucket and rewrites them into a new base file per bucket.
Delta files are cleared out after a minor or major compaction completes. Hive initiates all of these tasks in the background based on the hive-site.xml configs. Refer to this link for more details.
Take a look at this thread to understand how to initiate Hive compactions manually.
Created 01-02-2019 01:07 AM
What parameters control the thresholds that trigger these compactions?
Created 01-02-2019 03:30 AM
The parameters below control when compactions are triggered.
Configuration Parameter | Description |
hive.compactor.delta.num.threshold | Specifies the number of delta directories in a partition that triggers an automatic minor compaction. The default value is 10. |
hive.compactor.delta.pct.threshold | Specifies the size of the delta files, as a fraction of the corresponding base files, that triggers an automatic major compaction. The default value is 0.1, which is 10 percent. |
hive.compactor.abortedtxn.threshold | Specifies the number of aborted transactions on a single partition that trigger an automatic major compaction. |
For all the Hive compaction parameters, refer to the link below:
Created 01-02-2019 08:34 AM
With this configuration the delta files get compacted, but files such as the base file do not. How can I set it up so that the base file is compacted as well?
Created 01-03-2019 01:51 AM
Here are the properties:
hive.compactor.delta.pct.threshold
Default: 0.1 (set on the Metastore). Percentage (fractional) size of the delta files relative to the base that will trigger a major compaction. 1 = 100%, so the default 0.1 = 10%.
hive.compactor.abortedtxn.threshold
Default: 1000 (set on the Metastore). Number of aborted transactions involving a given table or partition that will trigger a major compaction.
Setting compaction properties through TBLPROPERTIES:
CREATE TABLE table_name ( id int, name string )
CLUSTERED BY (id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ("transactional"="true",
"compactor.mapreduce.map.memory.mb"="2048", -- specify compaction map job properties
"compactorthreshold.hive.compactor.delta.num.threshold"="4", -- trigger minor compaction if there are more than 4 delta directories
"compactorthreshold.hive.compactor.delta.pct.threshold"="0.5" -- trigger major compaction if the ratio of size of delta files to
-- size of base files is greater than 50%
);
ALTER TABLE table_name COMPACT 'minor'
WITH OVERWRITE TBLPROPERTIES ("compactor.mapreduce.map.memory.mb"="3072"); -- specify compaction map job properties
ALTER TABLE table_name COMPACT 'major'
WITH OVERWRITE TBLPROPERTIES ("tblprops.orc.compress.size"="8192"); -- change any other Hive table properties
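For a table that already exists, the same per-table thresholds can probably be applied without recreating it. The following is a minimal sketch, assuming a Hive version that supports per-table compaction properties (the Hive Transactions page lists them as available from Hive 1.3.0 and 2.1.0); the threshold values are illustrative only, not recommendations:

```sql
-- table_name is the example table from the CREATE TABLE statement above;
-- the threshold values here are illustrative, not tuned recommendations.
ALTER TABLE table_name SET TBLPROPERTIES (
  "compactorthreshold.hive.compactor.delta.num.threshold"="4",
  "compactorthreshold.hive.compactor.delta.pct.threshold"="0.5"
);

-- List the table properties to confirm the overrides were stored.
SHOW TBLPROPERTIES table_name;
```

Properties set this way override the Metastore-wide defaults for that table only, so other tables keep using the values from hive-site.xml.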
A major compaction can also be triggered manually with the command below:
alter table <table-name> partition(<partition-name>,<nested-partition-name>,..) compact 'major';
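As a concrete sketch of the command above, assume a hypothetical table named sales_acid that is partitioned by a sale_date column (both names are made up for illustration). The ALTER TABLE statement only queues the request, so SHOW COMPACTIONS is a reasonable way to check whether it has been picked up:

```sql
-- sales_acid and sale_date are hypothetical names used only for illustration.
ALTER TABLE sales_acid PARTITION (sale_date='2019-01-01') COMPACT 'major';

-- The statement above enqueues the compaction and returns immediately;
-- this lists queued, running, and recently completed compaction requests.
SHOW COMPACTIONS;
```

Because a major compaction rewrites the base file together with the deltas, this is the operation to request when the base file itself needs to be rewritten.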
More details on this page: https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions