Support Questions

yamingdong666 · ‎12-28-2018

Shu_ashu · ‎12-28-2018

@Jack

Two types of compaction happen in ACID tables:

1. Minor compaction: A ‘minor’ compaction takes all the delta files and rewrites them into a single delta file. This compaction does not need many resources.

2. Major compaction: A ‘major’ compaction takes one or more delta files (same as a minor compaction) and the base file for the bucket, and rewrites them into a new base file per bucket.

Delta files are cleared out after a minor or major compaction, and Hive initiates all of these tasks in the background based on the hive-site.xml configuration. Refer to this link for more details.

Take a look at this thread to understand how to initiate Hive compactions manually.


yamingdong666 · ‎01-02-2019

What parameters control the thresholds that trigger these compactions?

Shu_ashu · ‎01-02-2019

@Jack

The parameters below control when compactions are triggered.


Configuration Parameter	Description
`hive.compactor.delta.num.threshold`	Specifies the number of delta directories in a partition that triggers an automatic minor compaction. The default value is 10.
`hive.compactor.delta.pct.threshold`	Specifies the percentage size of delta files relative to the corresponding base files that triggers an automatic major compaction. The default value is 0.1, which is 10 percent.
`hive.compactor.abortedtxn.threshold`	Specifies the number of aborted transactions on a single partition that trigger an automatic major compaction.

For all the hive compaction parameters refer to the below link:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_data-access/content/understanding-admini...

yamingdong666 · ‎01-02-2019

Delta files are compacted according to this configuration, but base files are not compacted. How can I set that up?

Shu_ashu · ‎01-03-2019

@Jack

Here are the properties:

hive.compactor.delta.pct.threshold

Default: 0.1
Metastore 
Percentage (fractional) size of the delta files relative to the base that will trigger a major compaction. 1 = 100%, so the default 0.1 = 10%.

hive.compactor.abortedtxn.threshold

Default: 1000
Metastore 
Number of aborted transactions involving a given table or partition that will trigger a major compaction.

Setting compaction properties through TBLPROPERTIES:

CREATE TABLE table_name ( id int, name string )
CLUSTERED BY (id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ("transactional"="true",
  "compactor.mapreduce.map.memory.mb"="2048", -- specify compaction map job properties
  "compactorthreshold.hive.compactor.delta.num.threshold"="4", -- trigger minor compaction if there are more than 4 delta directories
  "compactorthreshold.hive.compactor.delta.pct.threshold"="0.5" -- trigger major compaction if the ratio of size of delta files to
                                                                -- size of base files is greater than 50%
);

ALTER TABLE table_name COMPACT 'minor' 
   WITH OVERWRITE TBLPROPERTIES ("compactor.mapreduce.map.memory.mb"="3072");  -- specify compaction map job properties
ALTER TABLE table_name COMPACT 'major'
   WITH OVERWRITE TBLPROPERTIES ("tblprops.orc.compress.size"="8192");         -- change any other Hive table properties
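
For a table that already exists, the same per-table thresholds can be set with ALTER TABLE ... SET TBLPROPERTIES instead of recreating the table. This is a minimal sketch, assuming a transactional table with the placeholder name table_name and a Hive version that accepts the compactorthreshold. prefix used in the CREATE TABLE example:

```sql
-- table_name is a placeholder; the property keys are the ones from the CREATE TABLE example.
ALTER TABLE table_name SET TBLPROPERTIES (
  "compactorthreshold.hive.compactor.delta.num.threshold"="4",  -- minor compaction once there are more than 4 delta directories
  "compactorthreshold.hive.compactor.delta.pct.threshold"="0.5" -- major compaction once delta size exceeds 50% of base size
);
```

The values 4 and 0.5 are only illustrative; choose them based on how often the table is written to.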

We can trigger a major compaction manually with the command below:

alter table <table-name> partition(<partition-name>,<nested-partition-name>,..) compact 'major';
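
As a concrete sketch of the command above, assuming a hypothetical table named sales that is partitioned by a column ds: the ALTER TABLE statement only queues the compaction request, so SHOW COMPACTIONS is a handy way to follow its progress. A major compaction rewrites the base and delta files into a new base, and the old base and delta directories are cleaned up afterwards.

```sql
-- Hypothetical table and partition value; replace with your own.
ALTER TABLE sales PARTITION (ds='2018-12-28') COMPACT 'major';

-- List queued and running compaction requests with their current state
-- (for example initiated, working, or ready for cleaning).
SHOW COMPACTIONS;
```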

More details on this page: https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions

Cloudera Community

The acid table will have folders of delta and base in the HDFS directory. What data is the base folder? Can the base be cleared? If it can be cleared, how can it be automatically cleared?
