Can I have an oozie coordinator that runs once per hour trigger a particular action only once per day?

Rising Star

For example, I have a coordinator with 2 actions, A and B. The coordinator runs hourly. Once per day I want to run action A followed by action B. The other 23 times of the day I just want to run action B.

Is there an elegant way to orchestrate that with a single oozie coordinator?

1 ACCEPTED SOLUTION

Here is how I would do it, but if I am missing any requirement, please feel free to add more details (without revealing any secret sauce of your logic 😉).

Assumptions first -

  • Job 1 - Executes Step A and Step B at 00:01 every morning (two-step job)
  • Job 2 - Executes Step B every hour between 01:01 and 23:01 throughout the day (single-step job)

Note: The timings can obviously be adjusted, but the assumption here is that the execution time of the two-step job is fixed and is mutually exclusive with the other 23 executions of the single-step job. The two steps could be any action supported by Oozie, such as Hive, Pig, Email, or SSH. The workflow definitions will therefore contain a duplicate Step B action in both jobs.

Coordinator Definitions - The exact time of execution and frequency can be controlled by specifying the values of validity and frequency.

For Job 1,

  • Validity = 00:00 hours of the day when you want the job to start executing.
  • Frequency = ${coord:days(1)}, i.e. coord:days(int n) with n = 1 for a daily run

See section 4.4.1, "The coord:days(int n) and coord:endOfDays(int n) EL functions", at http://oozie.apache.org/docs/4.2.0/CoordinatorFunctionalSpec.html
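A minimal sketch of what the Job 1 coordinator could look like, assuming a daily run. The application name, the start and end dates, the timezone, and the ${workflowAppUri} property are illustrative placeholders rather than values from this thread; the workflow it points to would contain Step A followed by Step B.

```xml
<!-- Job 1: runs the two-step workflow (Step A, then Step B) once per day. -->
<!-- Name, dates, timezone and workflowAppUri are hypothetical placeholders. -->
<coordinator-app name="job1-step-a-then-b"
                 frequency="${coord:days(1)}"
                 start="2016-01-01T00:00Z" end="2016-12-31T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <!-- Workflow that defines Step A followed by Step B -->
            <app-path>${workflowAppUri}/two-step</app-path>
        </workflow>
    </action>
</coordinator-app>
```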

For Job 2,

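  • Validity = 01:00 hours of the same day as Job 1
  • Frequency = frequency="0 1-23 * * *"
  • Note: instead of using a fixed frequency we are using cron-type syntax, which is super cool. The minute field has to be a fixed value (0 here, or 1 for the :01 times assumed above); a * in that position would trigger the job every minute of hours 1-23.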

See section 4.4.3, "Cron syntax in coordinator frequency", at http://oozie.apache.org/docs/4.2.0/CoordinatorFunctionalSpec.html
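A comparable sketch for Job 2, under the same assumptions (the name, dates, timezone, and ${workflowAppUri} are placeholders). The minute field is pinned to 0 here so that the job fires once per hour from 01:00 to 23:00; a * in the minute position would match every minute of those hours.

```xml
<!-- Job 2: runs the single-step workflow (Step B only) hourly, skipping hour 00. -->
<!-- Name, dates, timezone and workflowAppUri are hypothetical placeholders. -->
<coordinator-app name="job2-step-b-only"
                 frequency="0 1-23 * * *"
                 start="2016-01-01T01:00Z" end="2016-12-31T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <!-- Workflow that defines Step B only -->
            <app-path>${workflowAppUri}/single-step</app-path>
        </workflow>
    </action>
</coordinator-app>
```

Because hour 0 is excluded from the cron expression, the midnight slot is left to Job 1 and the two jobs never run in the same hour.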

Hope this helps.

8 REPLIES

There is no clean way to do this within the same oozie job.

If the time when Step A and Step B have to be executed together is fixed, then IMHO it would be a better approach to set up two different Oozie jobs: one with both steps that runs once a day, and another with Step B only that runs 23 times.

Rising Star

Thanks, @bsaini. I'd like the job to execute once an hour, every hour of the day. If I set up two different jobs, how could I specify that one of the jobs (running Step B only) skip one of its runs?

My response to your comment was longer than what's allowed for comments, so I am adding it as a new answer.

Moved the question to "Governance and Lifecycle" track.

Rising Star

Terrific advice! I'll let you know how it goes in a couple of weeks.

Master Mentor

@bsaini This is very nice! Wikify 🙂

Master Guru

I implemented something similar: I wanted to run a data load every hour but load a dimension table from a database only every 12 hours. I couldn't use two coordinators, since the load would fail if the dimension table were being loaded at the same time, so doing it in the same workflow was better.

Instead of having a coordinator that starts two workflows, I have a parameter in the coordinator that is passed to the workflow and contains the hour, like this:

2015121503 ( 2015-12-15-03 )

<property>
     <name>hour</name>
     <value>${coord:formatTime(coord:nominalTime(), 'yyyyMMddHH')}</value>
</property>
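For context, here is a sketch of where that property would typically sit: inside the workflow configuration of an hourly coordinator. The coordinator name, dates, timezone, and ${workflowAppUri} are assumptions for illustration and do not come from the original post.

```xml
<!-- Hourly coordinator that passes the formatted nominal time to the workflow. -->
<!-- Name, dates, timezone and workflowAppUri are hypothetical placeholders. -->
<coordinator-app name="hourly-load"
                 frequency="${coord:hours(1)}"
                 start="2015-12-15T00:00Z" end="2016-12-15T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <!-- Available in the workflow as ${hour}, e.g. 2015121503 -->
                <property>
                    <name>hour</name>
                    <value>${coord:formatTime(coord:nominalTime(), 'yyyyMMddHH')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```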

I then use a decision node in the workflow to run the Sqoop action only every 12 hours (in this case) and to run the load alone in all other cases. The Sqoop action then continues with the load action.

<start to="decision"/>

<decision name="decision">
    <switch>
        <!-- hour looks like 2015121503; % 100 extracts the hour of day (03) -->
        <case to="sqoop">
            ${(hour % 100) % 12 == 0}
        </case>
        <default to="load"/>
    </switch>
</decision>
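The same pattern could be adapted to the original question with a single hourly coordinator. This is only a sketch: it assumes the hour property shown earlier is passed to the workflow, and the node names once-a-day, action-a, and action-b are hypothetical. Action A would run only when the hour of day is 00 and would then transition to action B; every other run goes straight to action B.

```xml
<start to="once-a-day"/>

<decision name="once-a-day">
    <switch>
        <!-- hour % 100 is the hour of day; 0 selects the midnight run -->
        <case to="action-a">
            ${hour % 100 == 0}
        </case>
        <!-- The other 23 runs skip action A -->
        <default to="action-b"/>
    </switch>
</decision>

<!-- action-a (not shown) would finish with <ok to="action-b"/> -->
```

With this layout Step B is defined once, so there is no duplicate action to keep in sync across two workflows.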