I have manually run minor and major compaction. The table data now refers to the new delta and base folders, but the old delta and base folders are not being removed, and they consume Hive storage. Moreover, no running job refers to this table. Queries:
create table compaction_check.check_file(name string, age int);
insert into compaction_check.check_file values('user 1', 10),('user 2', 20),('user 3', 30);
insert into compaction_check.check_file values('user 4', 10),('user 5', 30),('user 6', 15);
ALTER TABLE compaction_check.check_file COMPACT 'major';
Check input files:
select distinct INPUT__FILE__NAME FROM compaction_check.check_file;
Actual table directory:
Why are the stale base and delta folders not removed?
Based on the output, it looks like the table itself is using the default compaction settings. That's good. The next place to check is the global Hive parameters and their values. Note that some of these settings are turned off by default (e.g. hive.compactor.initiator.on = false). See documentation below:
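As a rough sketch of how to check those global values from a Beeline or Hive CLI session: `SET <name>;` with no value prints the current setting. The properties listed here are standard Hive ACID/compactor settings, but which ones matter on your cluster is an assumption on my part. Also keep in mind that the session shows what HiveServer2 sees; the compactor threads run in the metastore, so its hive-site.xml should be checked as well.

```sql
-- Print the current value of each setting (no value after the name = read only).
SET hive.compactor.initiator.on;    -- the setting named above; off by default
SET hive.compactor.worker.threads;  -- number of compactor worker threads on the metastore
SET hive.txn.manager;               -- transaction manager in use
SET hive.support.concurrency;       -- whether concurrency support is enabled
```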
OK. After you execute COMPACT on your table, can you also run:
This shows what state the compaction operation ends up in.
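One statement that reports compaction state is Hive's `SHOW COMPACTIONS`. A minimal sketch, assuming that is the kind of check meant here and reusing the table from the question:

```sql
-- Request the compaction, then list queued, running, and recent compactions.
ALTER TABLE compaction_check.check_file COMPACT 'major';
SHOW COMPACTIONS;
-- Find the row for compaction_check.check_file and read its state column.
```

If the row for the table stays in a state such as "ready for cleaning", the compaction itself has finished but the cleaner has not yet removed the old base and delta directories.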
Another place to look is in the HMS logs. Search for your table name and see what compaction events have and have not occurred for your table. Please provide pertinent log lines here.
Also check to see if you have this parameter in hive-site.xml:
This is responsible for timing out stale transactions on the table. If it's not on, stale transactions are never cleaned up and, as a consequence, Hive does not remove the old delta files. If you change this value, you'll need to restart the service for the change to take effect.
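To see whether stale transactions are actually present, a quick check (a sketch only; I am assuming nothing about your cluster beyond ACID being enabled) is to list the transactions and locks Hive currently knows about:

```sql
-- List transactions with their state, user, and host.
SHOW TRANSACTIONS;
-- List locks currently held or waited on.
SHOW LOCKS;
```

Transactions that remain open long after the client that started them has gone would fit the explanation above.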