Oozie scheduling question

Hi,

I am currently using cron and my own driver code to merge sequence files for the previous hour. For example, if I have 6 files and 3 of them are below the block size, I merge the small files together to get as close to the block size as possible. Cron kicks off at 10 past the hour and passes in the previous hour's directory, e.g. /data//2014/01/14/09 if the current hour is 10. The job merges these files and replaces the small files with the merged ones.
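
For illustration, the hourly logic described above could be sketched roughly as follows. This is not the poster's driver code: the function names, the `/data` root (the path in the post appears to have a component elided), the 128 MB block size, and the first-fit grouping strategy are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumed HDFS block size; the post does not state the actual value.
BLOCK_SIZE = 128 * 1024 * 1024


def previous_hour_path(now, root="/data"):
    """Return the hourly directory for the hour before `now`.

    `root` is a hypothetical placeholder for the real dataset root.
    """
    prev = now - timedelta(hours=1)
    return f"{root}/{prev:%Y/%m/%d/%H}"


def plan_merges(file_sizes, block_size=BLOCK_SIZE):
    """Group files smaller than `block_size` into merge batches.

    `file_sizes` maps file name -> size in bytes. Files are placed
    largest-first into the first batch that still has room, so each
    batch total stays at or below `block_size`. Batches that end up
    with a single file are dropped, since there is nothing to merge.
    """
    small = sorted(
        ((size, name) for name, size in file_sizes.items() if size < block_size),
        reverse=True,
    )
    batches = []  # each entry is [running_total, [file names]]
    for size, name in small:
        for batch in batches:
            if batch[0] + size <= block_size:
                batch[0] += size
                batch[1].append(name)
                break
        else:
            batches.append([size, [name]])
    return [names for _, names in batches if len(names) > 1]
```

A run at 10:10 on 2014-01-14 would call `previous_hour_path(datetime(2014, 1, 14, 10, 10))` and then pass the sizes of the files in that directory to `plan_merges`. Using `timedelta` rather than subtracting 1 from the hour keeps the midnight rollover correct.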

I have been reading the Oozie documentation, and I am having a hard time expressing this in Oozie. Do I need to create a dataset for the previous hour? And can the output dataset be the same as the input?

Any pointers much appreciated.

Thanks

Dave

Re: Oozie scheduling question

Hue has a cron-style scheduler for Oozie, which should help you out. Check out the video at http://gethue.tumblr.com/post/78593185931/hadoop-tutorial-schedule-your-hadoop-jobs-intuitively