Schedule Major Compaction Using Cron Job

Hi All,

As far as I know, major compaction runs every 7 days by default. However, I want to schedule it outside business hours, and the best solution for that seems to be a cron job.

To schedule major compaction with cron, do we have to schedule the compaction for each table manually, with some delay between them, or is there another method that can run the job on all tables in HBase at a given time?

1 ACCEPTED SOLUTION

You can also use a standard cron implementation on Linux, e.g.:

echo "major_compact 'FOO'" | hbase shell -n

You could schedule the above to run on a specific node at your off-peak time. Be sure to monitor the output so that you can react to any possible failures.
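
To cover several tables from one cron entry, the one-liner above can be wrapped in a small script. This is only a sketch under assumptions: the function name compact_commands, the table names FOO and BAR, and the script and log paths in the comments are placeholders, not something from this thread.

```shell
#!/usr/bin/env bash
# Sketch: print one major_compact command per table name so the whole
# batch can be piped into a single non-interactive HBase shell session.
# compact_commands and the table names are hypothetical placeholders.

compact_commands() {
  local table
  for table in "$@"; do
    printf "major_compact '%s'\n" "$table"
  done
}

# On a node with the HBase client installed, the script would end with:
#   compact_commands FOO BAR | hbase shell -n
#
# Example crontab entry (hypothetical paths), Sundays at 02:00,
# appending stdout and stderr to a log file for later review:
#   0 2 * * 0 /path/to/major_compact.sh >> /path/to/major_compact.log 2>&1
```

The table list is passed explicitly here, so a new table has to be added to the script by hand. Keeping the output in a log file makes it easier to notice a failed run.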

3 REPLIES

You can write a job and schedule it from Oozie or Azkaban.

Hi Josh, thank you for the input. I came across one more method, where compaction can be done in off-peak hours using hbase.offpeak.start.hour. However, my understanding is that with this parameter major compaction will run every day.

So is there any way I can use the hbase.offpeak.start.hour parameter and schedule major compaction for all tables once a week?