Oozie not cleaning up old jobs from Oozie database

Hi,

I have the following properties set in my oozie-site.xml (using the safety valve in Cloudera Manager):

 

oozie.services.ext - org.apache.oozie.service.PurgeService

oozie.service.PurgeService.older.than - 15

oozie.service.PurgeService.coord.older.than - 7

oozie.service.PurgeService.bundle.older.than - 7

oozie.service.PurgeService.purge.interval - 60

 

However, I still see old jobs, either KILLED or completed, dating back as far as September 2014.

 

To give an example,

I have a Coordinator that is currently in the RUNNING state. When I list its instances in the Oozie Web Console (click the Coordinator tab, then click my Coordinator), the pop-up shows that the oldest materialised workflow job (coordinator action) dates from September 2014. I assume the property responsible for cleaning this up is oozie.service.PurgeService.older.than, which I have set to 15 days. So what am I missing here?

1 ACCEPTED SOLUTION

Rising Star

As a workaround, you can split up your long-running Coordinators. For example, instead of making your Coordinator run for years or forever, make it run for, say, 6 months, and have an identical Coordinator scheduled to start exactly when that one ends. This will allow Oozie to clean up the old child Workflows from that Coordinator every 6 months.
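
To make the back-to-back scheduling concrete, here is a minimal sketch that computes consecutive 6-month windows in which each Coordinator starts exactly when the previous one ends. The first start date (2015-01-01) and the helper names are hypothetical and chosen only for illustration; the timestamps are printed in the UTC form used for a Coordinator's start and end attributes.

```python
from datetime import datetime, timezone


def add_months(moment, months):
    """Shift a datetime by whole months (the day must exist in the target month)."""
    index = moment.year * 12 + (moment.month - 1) + months
    return moment.replace(year=index // 12, month=index % 12 + 1)


def coordinator_windows(first_start, months_per_window, count):
    """Return (start, end) pairs; each window begins exactly when the previous one ends."""
    windows = []
    start = first_start
    for _ in range(count):
        end = add_months(start, months_per_window)
        windows.append((start, end))
        start = end
    return windows


def oozie_time(moment):
    """Format a UTC datetime like 2015-01-01T00:00Z."""
    return moment.strftime("%Y-%m-%dT%H:%MZ")


# Hypothetical first start date; four windows of six months each.
windows = coordinator_windows(datetime(2015, 1, 1, tzinfo=timezone.utc), 6, 4)
for start, end in windows:
    print(f"start={oozie_time(start)} end={oozie_time(end)}")
```

Each printed pair could then be used as the start and end of one Coordinator submission, so that a finished Coordinator and its child Workflows become eligible for purging while the next one takes over.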

Otherwise, you can schedule a cron job to manually delete old jobs from the database. However, please be careful about this. When deleting a workflow job from the WF_JOBS table, you'll also need to delete the workflow actions that belong to it from the WF_ACTIONS table, as well as the coordinator action that it belongs to from the COORD_ACTIONS table. If you miss something, it will likely cause problems.
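
The deletion order can be sketched as follows. This is only an illustration against an in-memory SQLite stand-in, not the real Oozie database: the tables are cut down to a few columns, the column names (wf_id, external_id, end_time, status) and the sample rows are assumptions, and it assumes coordinator actions are stored in a COORD_ACTIONS table. Check the actual schema of your Oozie version, and take a backup, before running anything similar against MySQL.

```python
import sqlite3

# In-memory stand-in with simplified, assumed table layouts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE WF_JOBS (id TEXT PRIMARY KEY, status TEXT, end_time TEXT);
CREATE TABLE WF_ACTIONS (id TEXT PRIMARY KEY, wf_id TEXT);
CREATE TABLE COORD_ACTIONS (id TEXT PRIMARY KEY, external_id TEXT);
INSERT INTO WF_JOBS VALUES
    ('wf-old', 'SUCCEEDED', '2014-09-10'),
    ('wf-new', 'SUCCEEDED', '2015-06-01'),
    ('wf-running', 'RUNNING', NULL);
INSERT INTO WF_ACTIONS VALUES
    ('wf-old@a1', 'wf-old'), ('wf-old@a2', 'wf-old'), ('wf-new@a1', 'wf-new');
INSERT INTO COORD_ACTIONS VALUES ('coord@1', 'wf-old'), ('coord@2', 'wf-new');
""")


def purge_workflows(conn, cutoff):
    """Delete finished workflows that ended before cutoff, plus their dependent rows."""
    with conn:  # one transaction: commit on success, roll back on any error
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM WF_JOBS "
            "WHERE status IN ('SUCCEEDED', 'KILLED', 'FAILED') AND end_time < ?",
            (cutoff,))]
        params = [(wf_id,) for wf_id in ids]
        # Remove dependent rows first, then the workflow job itself.
        conn.executemany("DELETE FROM COORD_ACTIONS WHERE external_id = ?", params)
        conn.executemany("DELETE FROM WF_ACTIONS WHERE wf_id = ?", params)
        conn.executemany("DELETE FROM WF_JOBS WHERE id = ?", params)
    return ids


purged = purge_workflows(conn, "2015-01-01")
print(purged)
```

Doing all three deletes in a single transaction means a failure partway through leaves no orphaned rows behind, which is the kind of inconsistency the warning above is about.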
Software Engineer | Cloudera, Inc. | http://cloudera.com

5 REPLIES

Rising Star

Hi,

 

By default, Oozie will not purge child jobs if the parent is not eligible to be purged. In your case, because the Coordinator job is still running, none of the child Workflow jobs will be purged.

 

Which version of CDH are you using? Starting with CDH 5.2.0, you can configure Oozie to delete the child jobs even if the parent job is still running. To do that, set oozie.service.PurgeService.purge.old.coord.action=true in oozie-site.xml.
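
For reference, a small sketch that renders this setting as a Hadoop-style property element, the form in which it would typically be pasted into the oozie-site.xml safety valve. The helper name property_element is hypothetical; only the property name and value come from the text above.

```python
import xml.etree.ElementTree as ET


def property_element(name, value):
    """Build a <property> element with <name> and <value> children."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


snippet = ET.tostring(
    property_element("oozie.service.PurgeService.purge.old.coord.action", "true"),
    encoding="unicode",
)
print(snippet)
```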

 

Also, starting with CM 5.4, the Oozie Configuration page has controls for these configs, so you don't need the safety-valve anymore here.

Software Engineer | Cloudera, Inc. | http://cloudera.com

avatar

Thanks, rkanter 🙂
We are using CDH 4.5.0 along with Cloudera Manager 5.2.0. Since it is not possible to upgrade at the moment, I guess I will have to resort to manually cleaning up the database tables. The database has grown very large and we are seeing query latency (observed via SHOW PROCESSLIST in MySQL). Are there any alternatives to a CDH/CM upgrade or manual purging for keeping the ever-growing database size in check?

avatar

Thanks Robert!

New Contributor

Hi @Robert K

My PostgreSQL database still holds some old bundles with status KILLED, dating back to 2016.

I can see log lines like 'STARTED Purge to purge Workflow Jobs older than [30] days, Coordinator Jobs older than [7] days, and Bundle jobs older than [7] days', but I never see a line like 'ENDED Purge deleted ...'.

How can I check that (with Oozie 4.2)?

Thank you very much.