New Contributor
Posts: 7
Registered: ‎09-19-2013

Oozie scheduling question

Hi,

I am currently using cron and my own driver code to merge sequence files for the previous hour. For example, if I have 6 files and 3 of them are below the block size, I merge the small files together to get as close to the block size as possible. Cron kicks off at 10 past the hour and passes in the previous hour's directory, e.g. /data//2014/01/14/09 if the current hour is 10. The job merges these files and replaces the small files with the merged ones.
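
For reference, the logic described above could be sketched roughly as follows. This is only an illustration and not the actual driver code: the 128 MB block size, the function names `previous_hour_dir` and `plan_merges`, and the first-fit grouping heuristic are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumed block size for illustration; check dfs.blocksize on the real cluster.
BLOCK_SIZE = 128 * 1024 * 1024


def previous_hour_dir(base_dir, now):
    """Return base_dir/YYYY/MM/DD/HH for the hour before `now`."""
    prev = now - timedelta(hours=1)
    return f"{base_dir.rstrip('/')}/{prev:%Y/%m/%d/%H}"


def plan_merges(file_sizes, block_size=BLOCK_SIZE):
    """Group files smaller than block_size into merge sets.

    file_sizes maps file name -> size in bytes. Files at or above
    block_size are left alone. Small files are packed with a
    first-fit-decreasing heuristic so each group stays at or under
    block_size. Groups containing a single file are dropped, since
    there is nothing to merge.
    """
    small = sorted(
        ((name, size) for name, size in file_sizes.items() if size < block_size),
        key=lambda item: item[1],
        reverse=True,
    )
    bins = []  # each bin is [total_bytes, [file names]]
    for name, size in small:
        for b in bins:
            if b[0] + size <= block_size:
                b[0] += size
                b[1].append(name)
                break
        else:
            # No existing bin has room, so start a new one.
            bins.append([size, [name]])
    return [names for _, names in bins if len(names) > 1]
```

Called at 10 past the hour, `previous_hour_dir(base, datetime.now())` gives the directory for the hour that just finished, and `plan_merges` takes the file sizes listed from that directory and returns the groups of small files to merge.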

I have been reading the Oozie documentation, and I am having a hard time expressing this setup in it. Do I need to create a dataset for the previous hour? And can the output dataset be the same as the input dataset, since the merged files replace the small files in place?

Any pointers would be much appreciated.

Thanks

Dave

Posts: 1,826
Kudos: 406
Solutions: 292
Registered: ‎07-31-2013

Re: Oozie scheduling question

Hue has a cron-style scheduler for Oozie, which should help you out. Check out the video at http://gethue.tumblr.com/post/78593185931/hadoop-tutorial-schedule-your-hadoop-jobs-intuitively