Schedule Major Compaction Using Cron Job

Rising Star

Hi All,

As far as I know, major compaction runs every 7 days by default. However, I want to schedule it during off-business hours, and for that the best solution would be to set up a cron job.

To schedule major compaction with a cron job, do we have to schedule it manually on every table, with some delay between them, or is there another method that can run the job at a given time on all the tables in HBase?
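For reference, the 7-day default comes from the `hbase.hregion.majorcompaction` property in `hbase-site.xml`, which is expressed in milliseconds (604800000 in recent HBase releases). Setting it to 0 disables time-based major compactions, which is the usual companion to scheduling them yourself. HBase also applies a jitter (`hbase.hregion.majorcompaction.jitter`), so the actual interval is not exactly 7 days. The helper below, whose name is hypothetical, only shows the days-to-milliseconds arithmetic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a major-compaction interval in days to the
# millisecond value expected by hbase.hregion.majorcompaction.
days_to_ms() {
  echo $(( $1 * 24 * 60 * 60 * 1000 ))
}

days_to_ms 7   # prints 604800000
```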

1 ACCEPTED SOLUTION

Super Guru

You can also use a standard Linux cron implementation, e.g.:

echo "major_compact 'FOO'" | hbase shell -n

You could schedule the above to run on a specific node at your off-peak time. Be sure to monitor the output so that you can react to any possible failures.
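To cover the original question about compacting all tables, the one-liner above can be extended into a small script for cron to call. This is a sketch under stated assumptions, not an official HBase tool: the function names `compact_commands` and `run_major_compaction` are hypothetical, and the no-argument branch assumes your HBase shell returns the table names from `list` as a Ruby array (recent versions print `=> [...]` after listing) so it can be iterated with `each`. All commands go through a single `hbase shell -n` session, which avoids starting one shell per table; with `-n`, the shell should exit with a non-zero status when a command fails, so the wrapper can report it.

```shell
#!/usr/bin/env bash
# Sketch of a cron-friendly major-compaction helper (function names are hypothetical).

# Print one HBase shell command per table name given. With no arguments, print a
# JRuby loop that major-compacts every table returned by the shell's `list`.
compact_commands() {
  if [ "$#" -eq 0 ]; then
    echo 'list.each { |t| major_compact t }'
    return
  fi
  local table
  for table in "$@"; do
    printf "major_compact '%s'\n" "$table"
  done
}

# Pipe the commands into one non-interactive HBase shell session and write a
# timestamped status line, so failures show up in the cron log.
run_major_compaction() {
  if compact_commands "$@" | hbase shell -n; then
    echo "$(date '+%F %T') major compaction requested for: ${*:-all tables}"
  else
    echo "$(date '+%F %T') hbase shell reported an error" >&2
    return 1
  fi
}
```

For example, a script that defines these functions could call `run_major_compaction FOO BAR >> /var/log/hbase-compact.log 2>&1` (hypothetical log path), or `run_major_compaction` with no arguments to compact every listed table. Keep in mind that `major_compact` only queues the request; the log confirms that the request was accepted, not that the compaction has finished.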


4 Replies


You can write a job and schedule it from Oozie or Azkaban.

Rising Star

Hi Josh, thank you for the input. I came across another method for running compaction during off-peak hours, using hbase.offpeak.start.hour. However, from this parameter I understand that it will run major compaction every day.

So is there any way I can use the hbase.offpeak.start.hour parameter and still schedule major compaction for all the tables once a week?

New Contributor

Yes, what elserj said is correct, but in your crontab entry please add the following before the command:

. $HOME/.bashrc;

For example:

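09 15 * * 1 . $HOME/.bashrc; /path/to/compact.sh > /home/user/logfile.log 2>&1

This runs the script every Monday at 15:09 and overwrites /home/user/logfile.log with its output on each run.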
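The crontab fields are minute, hour, day of month, month, and day of week, with `*` meaning "every". The hypothetical helper below builds a weekly entry in the same shape as the example above, but appends to the log with `>>` so earlier runs are kept. Sourcing `~/.bashrc` matters because cron starts jobs with a minimal environment, so `hbase` may not be on the PATH otherwise. Note that on some distributions the default `~/.bashrc` returns early for non-interactive shells, so setting PATH inside the script itself may be more reliable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a weekly crontab entry.
# Arguments: minute, hour, day-of-week (0 = Sunday), script path, log path.
weekly_cron_entry() {
  local minute=$1 hour=$2 weekday=$3 script=$4 log=$5
  # $HOME is kept literal so cron's shell expands it at run time;
  # ">>" appends each run's output, "2>&1" captures errors too.
  printf '%s %s * * %s . $HOME/.bashrc; %s >> %s 2>&1\n' \
    "$minute" "$hour" "$weekday" "$script" "$log"
}

# Sunday at 02:00; the script and log paths are placeholders.
weekly_cron_entry 0 2 0 /path/to/compact.sh /home/user/compact.log
```

The printed line can be added with `crontab -e`, or appended non-interactively with `(crontab -l 2>/dev/null; weekly_cron_entry 0 2 0 /path/to/compact.sh /home/user/compact.log) | crontab -`.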