An ACID table has delta and base folders in its HDFS directory. What data does the base folder hold? Can the base be cleared? If it can, how can it be cleared automatically?

Contributor
 
Accepted solution

Master Guru
@Jack

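There are two types of compaction for ACID tables:

1. Minor compaction: takes all the delta files and rewrites them into a single delta file. This compaction is relatively cheap in terms of resources.

2. Major compaction: takes one or more delta files (as a minor compaction does) together with the base file for each bucket, and rewrites them into a new base file per bucket.

The base folder holds the table's data as of the last major compaction, so it should not be deleted by hand; instead, each major compaction replaces it with a new base that also contains the merged deltas. The obsolete delta folders (and, after a major compaction, the old base) are cleared out in the background once the compaction finishes and no running query still needs them. All of these tasks are initiated by Hive in the background based on the hive-site.xml settings (in particular, hive.compactor.initiator.on=true and hive.compactor.worker.threads greater than 0 on the metastore). Refer to this link for more details.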
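To make the effect on the HDFS folders concrete, here is a small simulation of how a bucket's directory listing changes under each compaction type. It is a simplified model, not Hive code: the helper names are hypothetical, the `base_<writeId>` / `delta_<min>_<max>` naming with 7-digit padding is an assumption (newer Hive versions add suffixes such as a statement id, and also create `delete_delta` folders, which are ignored here), and the result shows the listing after the obsolete folders have been cleaned up.

```python
# Simplified model of an ACID bucket directory listing (assumed naming:
# base_<writeId> and delta_<minWriteId>_<maxWriteId>, zero-padded to 7 digits).


def delta_name(lo: int, hi: int) -> str:
    """Name of a delta directory covering write ids lo..hi."""
    return f"delta_{lo:07d}_{hi:07d}"


def base_name(hi: int) -> str:
    """Name of a base directory holding all data up to write id hi."""
    return f"base_{hi:07d}"


def parse_range(name: str) -> tuple[int, int]:
    """Return the (min, max) write-id range covered by a base or delta directory."""
    parts = name.split("_")
    if parts[0] == "base":
        return 0, int(parts[1])
    return int(parts[1]), int(parts[2])


def minor_compact(dirs: list[str]) -> list[str]:
    """Merge all delta directories into a single delta; any base is left untouched."""
    deltas = [d for d in dirs if d.startswith("delta_")]
    if len(deltas) < 2:
        return dirs
    lo = min(parse_range(d)[0] for d in deltas)
    hi = max(parse_range(d)[1] for d in deltas)
    others = [d for d in dirs if not d.startswith("delta_")]
    return others + [delta_name(lo, hi)]


def major_compact(dirs: list[str]) -> list[str]:
    """Rewrite the base and all deltas into one new base covering every write id."""
    if not dirs:
        return dirs
    hi = max(parse_range(d)[1] for d in dirs)
    return [base_name(hi)]
```

For example, three single-transaction deltas `delta_0000001_0000001` to `delta_0000003_0000003` become `delta_0000001_0000003` after a minor compaction, or `base_0000003` after a major one. This is why the base cannot simply be "cleared": it is where the compacted data lives.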

See this thread to learn how to trigger Hive compactions manually.

Contributor

Which parameters control the thresholds that trigger these compactions?

Master Guru
@Jack

The following parameters control when compactions are triggered:



hive.compactor.delta.num.threshold
Number of delta directories in a partition that triggers an automatic minor compaction. The default value is 10.

hive.compactor.delta.pct.threshold
Size of the delta files relative to the corresponding base files, as a fraction, that triggers an automatic major compaction. The default value is 0.1, which is 10 percent.

hive.compactor.abortedtxn.threshold
Number of aborted transactions on a single partition that triggers an automatic major compaction. The default value is 1000.
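The way these thresholds combine can be sketched as a small decision function. This is a simplified model rather than Hive's actual Initiator logic: `PartitionStats` and `compaction_to_request` are hypothetical names, the check order (aborted transactions, then size ratio, then delta count) and the strict "greater than" comparisons are assumptions, and edge cases such as a partition that has no base yet are ignored.

```python
from dataclasses import dataclass
from typing import Optional

# Defaults as quoted in this thread (assumed to match the deployed Hive version).
DELTA_NUM_THRESHOLD = 10      # hive.compactor.delta.num.threshold
DELTA_PCT_THRESHOLD = 0.1     # hive.compactor.delta.pct.threshold
ABORTED_TXN_THRESHOLD = 1000  # hive.compactor.abortedtxn.threshold


@dataclass
class PartitionStats:
    """Hypothetical summary of one table or partition's ACID directory state."""
    num_deltas: int
    delta_bytes: int
    base_bytes: int
    aborted_txns: int


def compaction_to_request(stats: PartitionStats,
                          num_threshold: int = DELTA_NUM_THRESHOLD,
                          pct_threshold: float = DELTA_PCT_THRESHOLD,
                          aborted_threshold: int = ABORTED_TXN_THRESHOLD) -> Optional[str]:
    """Return 'major', 'minor', or None according to the three thresholds."""
    # Too many aborted transactions: rewrite everything.
    if stats.aborted_txns > aborted_threshold:
        return "major"
    # Deltas have grown large relative to the base: fold them into a new base.
    if stats.base_bytes > 0 and stats.delta_bytes / stats.base_bytes > pct_threshold:
        return "major"
    # Many small deltas: merge them into one delta.
    if stats.num_deltas > num_threshold:
        return "minor"
    return None
```

With the defaults, a partition with a 1,000 MB base and 150 MB of deltas has a ratio of 0.15, which exceeds 0.1, so the model asks for a major compaction; with only 50 MB of deltas but 11 delta folders, it asks for a minor one.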

For the full list of Hive compaction parameters, see:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_data-access/content/understanding-admini...

Contributor

The delta files get compacted according to this configuration, but files like base are not compacted. How can I set that up?

Master Guru
@Jack

Here are the properties:

hive.compactor.delta.pct.threshold

Default: 0.1
Location: Metastore
Percentage (fractional) size of the delta files relative to the base that will trigger a major compaction. 1 = 100%, so the default 0.1 = 10%. 

hive.compactor.abortedtxn.threshold

Default: 1000
Location: Metastore
Number of aborted transactions involving a given table or partition that will trigger a major compaction.

Compaction properties can also be set per table through TBLPROPERTIES:

```sql
CREATE TABLE table_name (id int, name string)
CLUSTERED BY (id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES (
  "transactional"="true",
  "compactor.mapreduce.map.memory.mb"="2048",                   -- specify compaction map job properties
  "compactorthreshold.hive.compactor.delta.num.threshold"="4",  -- trigger minor compaction if there are more than 4 delta directories
  "compactorthreshold.hive.compactor.delta.pct.threshold"="0.5" -- trigger major compaction if the ratio of size of delta files to
                                                                -- size of base files is greater than 50%
);

ALTER TABLE table_name COMPACT 'minor'
  WITH OVERWRITE TBLPROPERTIES ("compactor.mapreduce.map.memory.mb"="3072");  -- specify compaction map job properties
ALTER TABLE table_name COMPACT 'major'
  WITH OVERWRITE TBLPROPERTIES ("tblprops.orc.compress.size"="8192");         -- change any other Hive table properties
```
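The `compactorthreshold.` prefix in the CREATE TABLE example overrides the global hive-site.xml value for that one table. The lookup can be sketched as below; this is an illustrative model of the override rule only, and `effective_threshold` and `GLOBAL_DEFAULTS` are hypothetical names, with the global values taken from the defaults quoted earlier in this thread.

```python
from collections.abc import Mapping

# Per-table override prefix, as used in the TBLPROPERTIES example.
THRESHOLD_PREFIX = "compactorthreshold."

# Global values as quoted in this thread (assumed hive-site.xml defaults).
GLOBAL_DEFAULTS: Mapping[str, str] = {
    "hive.compactor.delta.num.threshold": "10",
    "hive.compactor.delta.pct.threshold": "0.1",
}


def effective_threshold(name: str,
                        tblproperties: Mapping[str, str],
                        global_conf: Mapping[str, str] = GLOBAL_DEFAULTS) -> float:
    """Return the table-level override if one is set, else the global value."""
    override = tblproperties.get(THRESHOLD_PREFIX + name)
    value = override if override is not None else global_conf[name]
    return float(value)
```

For the example table, `effective_threshold("hive.compactor.delta.num.threshold", props)` yields 4.0 instead of the global 10, while a table without the override falls back to the global setting.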

A major compaction can also be triggered manually on a partition with the following command:

```sql
ALTER TABLE <table-name> PARTITION (<partition-col>='<value>', <nested-partition-col>='<value>', ...) COMPACT 'major';
```
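When many partitions need compacting, it can help to generate these statements from a script. The helper below is a hypothetical sketch: it only builds the statement text (which you could then pass to a Hive client, for example `beeline -e`), it assumes simple string partition values, and it refuses values containing quotes rather than guessing at an escaping scheme.

```python
def compact_statement(table: str, partition: dict[str, str], kind: str = "major") -> str:
    """Build an ALTER TABLE ... COMPACT statement for one table or partition."""
    if kind not in ("major", "minor"):
        raise ValueError(f"unsupported compaction type: {kind!r}")
    if any("'" in value for value in partition.values()):
        raise ValueError("partition values containing quotes are not handled")
    # Partition columns keep the order in which they were given (outer to nested).
    spec = ", ".join(f"{col}='{value}'" for col, value in partition.items())
    clause = f" PARTITION ({spec})" if partition else ""
    return f"ALTER TABLE {table}{clause} COMPACT '{kind}';"
```

For example, `compact_statement("sales", {"year": "2018", "month": "06"})` produces `ALTER TABLE sales PARTITION (year='2018', month='06') COMPACT 'major';`, where `sales`, `year` and `month` are made-up names.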

More details on this page: https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions