Created 06-01-2021 07:43 AM
I manually ran minor and major compaction. Queries on the table now read from the new delta and base folders, but the old delta and base folders are not being removed, and they consume Hive storage. Moreover, no running job refers to this table.
Queries:
create table compaction_check.check_file(name string, age int);
insert into compaction_check.check_file values('user 1', 10),('user 2', 20),('user 3', 30);
insert into compaction_check.check_file values('user 4', 10),('user 5', 30),('user 6', 15);
ALTER TABLE compaction_check.check_file COMPACT 'major';
Check input files:
select distinct INPUT__FILE__NAME FROM compaction_check.check_file;
output:
Actual table directory :
Why are the stale base and delta folders not removed?
Created 06-01-2021 10:45 AM
Hello,
Could you please provide the output of
SHOW TBLPROPERTIES compaction_check.check_file;
Also, which version of CDH/HDP or CDP are you using?
Thank you,
Alex
Created 06-01-2021 09:21 PM
Hi @aakulov,
I am using Cloudera 7.1.3 and HIVE version: 3.1.3
Table TBLPROPERTIES Output:
Thanks in advance.
Created 06-02-2021 07:45 AM
Hello again,
Based on the output, it looks like the table itself is using the default compaction settings. That's good. The next place to check is the global Hive parameters and their values. Note that some of these settings are turned off by default (e.g. hive.compactor.initiator.on = false). See documentation below:
Hope this helps,
Alex
Created 06-03-2021 02:56 AM
Hello @aakulov,
In my Hive Configuration,
hive.compactor.initiator.on = Enabled
Here is the screenshot of Hive and hiveServer2 configurations:
Now, is there anything else that I have to check or configure to remove the older base and delta folders?
Thank you again, Alex.
Created 06-03-2021 06:49 AM
OK, after you execute COMPACT on your table, can you also run:
SHOW COMPACTIONS;
to see what state the compaction operation ends up in.
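As a minimal sketch of that check, using the table from this thread: the states I would expect SHOW COMPACTIONS to report in Hive 3 are initiated, working, ready for cleaning, succeeded, failed and attempted, though the exact set may vary by version. An entry that stays in "ready for cleaning" would mean the new base was written but the Cleaner has not yet deleted the obsolete directories, which would match the symptom described here.

```sql
-- Request a major compaction for the table from this thread.
ALTER TABLE compaction_check.check_file COMPACT 'major';

-- List the compaction queue and history; look for the row belonging to
-- compaction_check.check_file and note its state column.
SHOW COMPACTIONS;
```

ALTER TABLE ... COMPACT only queues the request, so it may take a little while before the state moves past initiated or working.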
Another place to look is in the HMS logs. Search for your table name and see what compaction events have and have not occurred for your table. Please provide pertinent log lines here.
Also check to see if you have this parameter in hive-site.xml:
hive.metastore.housekeeping.threads.on=true
This is responsible for timing out stale transactions on the table. If it's not on, stale transactions are never cleaned up and, as a consequence, Hive does not remove the old delta files. If you make a change to this value, you'll need to restart the stale service.
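A rough way to look at this from a Beeline session is sketched below. It assumes SET with no value simply prints the current setting, and that the session shows the HiveServer2 view of the property, which may differ from what the Metastore itself is running with; the Metastore's hive-site.xml (or its configuration page in Cloudera Manager) remains the authoritative place to check.

```sql
-- Print the current values as seen by this session (no value is assigned).
SET hive.metastore.housekeeping.threads.on;
SET hive.compactor.initiator.on;

-- List transactions known to the Metastore; a long-lived open transaction
-- can prevent the Cleaner from removing old base and delta directories.
SHOW TRANSACTIONS;

-- List current locks, to confirm nothing still holds the table.
SHOW LOCKS;
```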
Created 06-09-2021 09:17 AM
Were you able to have Hive clean up the obsolete files? Just checking in.