
Hive: cleanup not working after compaction

Explorer

I have manually run both minor and major compaction. Queries on the table now read from the new delta and base folders, but the old delta and base folders are not removed and keep consuming Hive storage. No running job refers to this table.
Queries: 

 

create table compaction_check.check_file(name string, age int);
insert into compaction_check.check_file values('user 1', 10),('user 2', 20),('user 3', 30);
insert into compaction_check.check_file values('user 4', 10),('user 5', 30),('user 6', 15);
ALTER TABLE compaction_check.check_file COMPACT 'major';

 

 Check input files:

 

select distinct INPUT__FILE__NAME FROM compaction_check.check_file;

 

output: Screenshot 2021-06-01 at 20.19.12.png

 Actual table directory :
Screenshot 2021-06-01 at 20.18.29.png

 

Why are the stale base and delta folders not removed?

 



 

6 REPLIES

Super Collaborator

Hello,

 

Could you please provide the output of 

SHOW TBLPROPERTIES compaction_check.check_file;

Also, which version of CDH/HDP or CDP are you using?

 

 

Thank you,

Alex

Explorer

Hi @aakulov

I am using Cloudera 7.1.3 with Hive version 3.1.3.

Table TBLPROPERTIES Output: 

Screenshot 2021-06-02 at 9.27.06 AM.png

 


Thanks in advance.

Super Collaborator

Hello again,

 

Based on the output, it looks like the table itself is using the default compaction settings. That's good. The next place to check is the global Hive parameters and their values. Note that some of these settings are turned off by default (e.g. hive.compactor.initiator.on = false). See the documentation below:

https://docs.cloudera.com/cdp-private-cloud-base/7.1.3/managing-hive/topics/hive-compact-properties....
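
One quick way to see what these parameters resolve to is to print them from a Beeline session. This is only a sketch: `SET <name>;` prints the value visible to the current HiveServer2 session, which may differ from what the service actually running the compactor uses, so the values shown in Cloudera Manager remain the authority. The worker-threads property is an assumption about what else is worth checking; it is not mentioned above.

```sql
-- Print the current value of each property as seen by this session.
SET hive.compactor.initiator.on;
SET hive.compactor.worker.threads;
```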

 

Hope this helps,

Alex

Explorer

Hello @aakulov,
In my Hive Configuration, 

hive.compactor.initiator.on = Enabled

Here is the screenshot of Hive and hiveServer2 configurations:
Screenshot 2021-06-03 at 3.32.45 PM.png

 

Screenshot 2021-06-03 at 3.33.24 PM.png

 

Now, is there anything else that I have to check or configure to remove the older base and delta folders?

Thank you again, Alex.


Super Collaborator

OK, after you execute COMPACT on your table, can you also run:

SHOW COMPACTIONS;

to see what state the compaction operation ends up in?
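
For reference, the sequence below is a sketch of what that check might look like on the table from the question. The state names in the comments are the ones Apache Hive documents for SHOW COMPACTIONS; the exact set can vary by version.

```sql
-- Request a major compaction, then inspect the compaction queue.
ALTER TABLE compaction_check.check_file COMPACT 'major';
SHOW COMPACTIONS;
-- Typical values in the State column:
--   initiated          : queued, waiting for a worker
--   working            : compaction is running
--   ready for cleaning : compacted, old files not yet removed
--   succeeded / failed : finished
```

An entry that stays in "ready for cleaning" would suggest that the compaction itself worked and that the cleanup step is what is stuck.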

 

Another place to look is in the HMS logs. Search for your table name and see what compaction events have and have not occurred for your table. Please provide pertinent log lines here.

 

Also check to see if you have this parameter in hive-site.xml:

hive.metastore.housekeeping.threads.on = true

This is responsible for timing out stale transactions on the table. If it is not on, stale transactions are never cleaned up and, as a consequence, Hive does not remove the old delta files. If you change this value, you will need to restart the affected service.
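
To look for the stale transactions mentioned above, listing the transactions the metastore knows about can help. A minimal sketch: the transaction ID in the commented-out line is a hypothetical placeholder, and a transaction should only be aborted if it is known to be dead.

```sql
-- List open and aborted transactions known to the metastore.
SHOW TRANSACTIONS;
-- If a transaction is known to be stale, it can be aborted by ID.
-- 1234 is a hypothetical placeholder, not a real ID from this cluster.
-- ABORT TRANSACTIONS 1234;
```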

Super Collaborator

Were you able to have Hive clean up the obsolete files? Just checking in.
