Created 12-03-2015 07:55 PM
For example, I have a coordinator with 2 actions, A and B. The coordinator runs hourly. Once per day I want to run action A followed by action B. The other 23 times of the day I just want to run action B.
Is there an elegant way to orchestrate that with a single oozie coordinator?
Created 12-04-2015 01:09 AM
Here is how I would do it, but if I am missing any requirement, please feel free to add more details (without revealing any secret sauce of your logic 😉).
Assumptions first -
Note: The timings can obviously be adjusted, but the assumption here is that the execution time of the two-step job is fixed and mutually exclusive with the other 23 executions of the single-step job. The two steps could be any action supported by Oozie, such as Hive, Pig, Email, or SSH. The workflow definitions for both jobs will therefore contain a duplicate Step B action.
Coordinator Definitions - The exact time of execution and frequency can be controlled by specifying the values of validity and frequency.
For Job 1,
See section 4.4.1, "The coord:days(int n) and coord:endOfDays(int n) EL functions", at http://oozie.apache.org/docs/4.2.0/CoordinatorFunctionalSpec.html
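A minimal sketch of what the Job 1 coordinator could look like, assuming the two-step run happens at 00:00 UTC. The app name, the start and end dates, and the workflow path are hypothetical placeholders, not values from the original answer.

```xml
<!-- Job 1 (sketch): runs the Step A + Step B workflow once a day at 00:00 UTC -->
<coordinator-app name="job1-daily-a-then-b"
                 frequency="${coord:days(1)}"
                 start="2015-12-05T00:00Z" end="2016-12-05T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- hypothetical path to the workflow that contains Step A followed by Step B -->
      <app-path>${nameNode}/apps/job1-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```

The start time fixes the hour of the daily run, so moving the two-step run to another hour only means changing the start value.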
For Job 2,
See section 4.4.3. Cron syntax in coordinator frequency at - http://oozie.apache.org/docs/4.2.0/CoordinatorFunctionalSpec.html
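A matching sketch for Job 2, assuming Job 1 owns the 00:00 slot. The cron expression below fires at minute 0 of hours 1 through 23, so the midnight run is skipped. Again, the app name, dates, and workflow path are hypothetical.

```xml
<!-- Job 2 (sketch): runs the Step B only workflow at 01:00, 02:00, ... 23:00 UTC -->
<coordinator-app name="job2-hourly-b-only"
                 frequency="0 1-23 * * *"
                 start="2015-12-05T01:00Z" end="2016-12-05T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- hypothetical path to the workflow that contains Step B only -->
      <app-path>${nameNode}/apps/job2-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```

Keeping both coordinators on the same timezone makes it easier to check that the skipped hour in Job 2 is the same hour that Job 1 runs.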
Hope this helps.
Created 12-03-2015 08:10 PM
There is no clean way to do this within the same oozie job.
If the time when Step A and B have to be executed together is fixed, then IMHO it would be a better approach to set up two different oozie jobs: one with both steps that runs once a day, and another with Step B only that runs the other 23 times.
Created 12-03-2015 10:48 PM
Thanks, @bsaini. I'd like the job to execute once an hour, every hour of the day. If I set up two different jobs, how could I specify that one of the jobs (running Step B only) skips one of its runs?
Created 12-04-2015 01:09 AM
My response to your comment was longer than what's allowed for comments, so I am adding it as a new answer.
Created 12-03-2015 08:57 PM
Moved the question to "Governance and Lifecycle" track.
Created 12-04-2015 02:16 PM
Terrific advice! I'll let you know how it goes in a couple of weeks.
Created 12-04-2015 02:21 PM
@bsaini This is very nice! Wikify 🙂
Created 12-17-2015 09:27 AM
I implemented something similar to that. I wanted to run a data load every hour but load a dimension table from a database every 12 hours. I couldn't use two coordinators, since the load would fail if the dimension table was being loaded at the same time, so doing it in the same workflow was better.
Instead of having a coordinator that starts two workflows I have a parameter in the coordinator that is given to the workflow which contains the hour like this:
2015121503 ( 2015-12-15-03 )
<property>
  <name>hour</name>
  <value>${coord:formatTime(coord:nominalTime(), 'yyyyMMddHH')}</value>
</property>
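For context, here is a rough sketch of where that property would sit in an hourly coordinator, assuming it is passed through the workflow configuration. The app name and the ${startTime}, ${endTime}, and ${workflowAppPath} parameters are hypothetical and would come from your own job properties.

```xml
<!-- Sketch: hourly coordinator that hands the nominal hour to the workflow -->
<coordinator-app name="hourly-load"
                 frequency="${coord:hours(1)}"
                 start="${startTime}" end="${endTime}"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- hypothetical parameter pointing at the workflow application -->
      <app-path>${workflowAppPath}</app-path>
      <configuration>
        <property>
          <name>hour</name>
          <!-- e.g. 2015121503 for the 03:00 run on 2015-12-15 -->
          <value>${coord:formatTime(coord:nominalTime(), 'yyyyMMddHH')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```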
I then use a decision node in the workflow to run the sqoop action only every 12 hours (in this case) and run the load alone in all other cases. When the sqoop action finishes, it continues with the load action.
<start to="decision"/>
<decision name="decision">
  <switch>
    <case to="sqoop">${(hour % 100) % 12 == 0}</case>
    <default to="load"/>
  </switch>
</decision>
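To show how the pieces connect, here is a sketch of a complete workflow around that decision node. It assumes the load step is a Hive action, which the post does not actually say, and the workflow name plus the ${jobTracker}, ${nameNode}, ${sqoopCommand}, and ${loadScript} parameters are hypothetical placeholders.

```xml
<workflow-app name="hourly-load-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="decision"/>

  <!-- hour is yyyyMMddHH; hour % 100 keeps HH, so this matches the 00 and 12 runs -->
  <decision name="decision">
    <switch>
      <case to="sqoop">${(hour % 100) % 12 == 0}</case>
      <default to="load"/>
    </switch>
  </decision>

  <!-- Dimension table import; on success it falls through to the load -->
  <action name="sqoop">
    <sqoop xmlns="uri:oozie:sqoop-action:0.4">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>${sqoopCommand}</command>
    </sqoop>
    <ok to="load"/>
    <error to="fail"/>
  </action>

  <!-- Hourly data load; assumed here to be a Hive script -->
  <action name="load">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>${loadScript}</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>Workflow failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Because the sqoop action's ok transition points at load, the two never run at the same time, which is the reason for keeping everything in one workflow.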