NiFi Run Schedule API

Explorer

I have a question on the Run Schedule item under the Scheduling section of a processor. What are the trigger words for it?

For instance, if I want to run a processor once a day, I type 24 hour. Does 1 Day work as well? I looked at the documentation, but there doesn't seem to be a list of words that would work for when you want them to trigger.

1 ACCEPTED SOLUTION

Super Mentor

@Michael Rivera

NiFi is designed to accept many trigger words: sec, second, secs, seconds, min, minutes, mins, hr, hrs, day, days, etc.

The largest unit it will accept as of NiFi 1.x is week or weeks (or wk or wks).

If you enter an invalid trigger word, the processor will let you know it is invalid. For example, trying to use month or year will produce the error below:

11736-screen-shot-2017-01-25-at-115140-am.png
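
To make the "number followed by a unit word" format concrete, here is a minimal sketch of such a parser. It is not NiFi's own code: the function name parse_run_schedule and the unit table are hypothetical, and the table lists only the unit words mentioned in this thread (NiFi itself accepts more spellings).

```python
import re

# Illustrative unit table (an assumption): only unit words mentioned in this
# thread are listed. NiFi's own parser accepts additional spellings.
UNIT_SECONDS = {
    "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "min": 60, "mins": 60, "minutes": 60,
    "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
    "day": 86400, "days": 86400,
    "wk": 604800, "wks": 604800, "week": 604800, "weeks": 604800,
}


def parse_run_schedule(text):
    """Return the interval in seconds, or raise ValueError if it is invalid."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"not a '<number> <unit>' duration: {text!r}")
    amount = int(match.group(1))
    unit = match.group(2).lower()
    if unit not in UNIT_SECONDS:
        # Units such as "month" or "year" end up here, like the error above.
        raise ValueError(f"unknown time unit: {unit!r}")
    return amount * UNIT_SECONDS[unit]
```

Under these assumptions, parse_run_schedule("1 day") and parse_run_schedule("24 hour") give the same interval, while "1 month" is rejected.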

Keep in mind that by using the "timer driven" scheduling strategy you are not setting a specific execution time. You are setting an execution interval, where the first interval is scheduled when the processor starts. The second execution will occur one configured "run schedule" interval later. If you stop and then start the processor again, the interval starts over.
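
The effect of restarting can be illustrated with a small sketch. This only models the behavior described above, it does not call NiFi; the function name timer_driven_runs and the dates are made up for illustration.

```python
from datetime import datetime, timedelta


def timer_driven_runs(started_at, interval, count):
    """List the first `count` trigger times for a processor started at `started_at`."""
    return [started_at + i * interval for i in range(count)]


one_day = timedelta(days=1)

# Processor started at 09:17 with a run schedule of 1 day.
before_restart = timer_driven_runs(datetime(2017, 1, 25, 9, 17), one_day, 3)

# Stopped and started again at 14:30: the interval is now anchored to 14:30.
after_restart = timer_driven_runs(datetime(2017, 1, 27, 14, 30), one_day, 3)

for run in before_restart + after_restart:
    print(run)
```

The trigger times follow whenever the processor was last started, which is why a fixed time of day needs the CRON driven strategy instead.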

The "CRON Driven" scheduling strategy allows you to configure exact time(s) for execution.

Thanks,

Matt

4 REPLIES

Master Guru

You can execute a processor on a schedule using the CRON driven scheduling strategy, which takes a cron-style expression.

more info here

https://docs.hortonworks.com/HDPDocuments/HDF1/HDF-1.2/bk_UserGuide/content/scheduling-tab.html

  • CRON driven: When using the CRON driven scheduling mode, the Processor is scheduled to run periodically, similar to the Timer driven scheduling mode. However, the CRON driven mode provides significantly more flexibility at the expense of increasing the complexity of the configuration. This value is made up of six required fields and an optional seventh (Year), each separated by a space. These fields include:
    1. Seconds
    2. Minutes
    3. Hours
    4. Day of Month
    5. Month
    6. Day of Week
    7. Year

    The value for each of these fields should be a number, range, or increment. Range here refers to a syntax of <number>-<number>. For example, the Seconds field could be set to 0-30, meaning that the Processor should only be scheduled if the time is 0 to 30 seconds after the minute. Additionally, a value of * indicates that all values are valid for this field. Multiple values can also be entered using a , as a separator: 0,5,10,15,30. An increment is written as <start value>/<increment>. For example, setting a value of 0/10 for the Seconds field means that valid values are 0, 10, 20, 30, 40, and 50. However, if we change this to 5/10, valid values become 5, 15, 25, 35, 45, and 55.

    For the Month field, valid values are 1 (January) through 12 (December).

    For the Day of Week field, valid values are 1 (Sunday) through 7 (Saturday). Additionally, a value of L may be appended to one of these values to indicate the last occurrence of this day in the month. For example, 1L can be used to indicate the last Sunday of the month.
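
The number, range, list, and increment syntax described above can be sketched as a small field expander. This is an illustration of the rules as quoted, not NiFi's or Quartz's parser: expand_field is a hypothetical helper, and it deliberately ignores the L suffix and any other special characters.

```python
def expand_field(expr, low, high):
    """Expand one CRON field into the sorted list of values it matches.

    `low` and `high` are the inclusive bounds of the field, for example
    0 and 59 for Seconds. Only *, lists, ranges and increments are handled.
    """
    values = set()
    for part in expr.split(","):
        if part == "*":
            # All values are valid for this field.
            values.update(range(low, high + 1))
        elif "/" in part:
            # <start value>/<increment>
            start, step = part.split("/")
            first = low if start == "*" else int(start)
            values.update(range(first, high + 1, int(step)))
        elif "-" in part:
            # <number>-<number>
            first, last = (int(piece) for piece in part.split("-"))
            values.update(range(first, last + 1))
        else:
            values.add(int(part))
    out_of_range = sorted(v for v in values if not low <= v <= high)
    if out_of_range:
        raise ValueError(f"values outside {low}-{high}: {out_of_range}")
    return sorted(values)


print(expand_field("0/10", 0, 59))  # [0, 10, 20, 30, 40, 50]
print(expand_field("5/10", 0, 59))  # [5, 15, 25, 35, 45, 55]
```

The two printed lists match the 0/10 and 5/10 examples in the quoted documentation.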

Next, the Scheduling Tab provides a configuration option named 'Concurrent tasks.' This controls how many threads the Processor will use. Said a different way, this controls how many FlowFiles should be processed by this Processor at the same time. Increasing this value will typically allow the Processor to handle more data in the same amount of time. However, it does this by using system resources that then are not usable by other Processors. This essentially provides a relative weighting of Processors - it controls how much of the system's resources should be allocated to this Processor instead of other Processors. This field is available for most Processors. There are, however, some types of Processors that can only be scheduled with a single Concurrent task.

The "Run schedule" dictates how often the Processor should be scheduled to run. The valid values for this field depend on the selected Scheduling Strategy (see above). If using the Event driven Scheduling Strategy, this field is not available. When using the Timer driven Scheduling Strategy, this value is a time duration specified by a number followed by a time unit. For example, 1 second or 5 mins. The default value of 0 sec means that the Processor should run as often as possible as long as it has data to process. This is true for any time duration of 0, regardless of the time unit (e.g., 0 sec, 0 mins, 0 days). For an explanation of values that are applicable for the CRON driven Scheduling Strategy, see the description of the CRON driven Scheduling Strategy itself.

The right-hand side of the tab contains a slider for choosing the 'Run duration.' This controls how long the Processor should be scheduled to run each time that it is triggered. The left-hand side of the slider is marked 'Lower latency' while the right-hand side is marked 'Higher throughput.' When a Processor finishes running, it must update the repository in order to transfer the FlowFiles to the next Connection. Updating the repository is expensive, so the more work that can be done at once before updating the repository, the more work the Processor can handle (Higher throughput). However, this means that the next Processor cannot start processing those FlowFiles until the previous Processor updates this repository, so the latency will be longer (the time required to process the FlowFile from beginning to end will be longer). The slider therefore provides a spectrum from which the DFM can choose to favor Lower Latency or Higher Throughput.
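
The latency versus throughput trade-off can be shown with a toy model. Everything here is an assumption made for illustration: the function batch_tradeoff is hypothetical, each FlowFile is assumed to cost a fixed work_ms, each repository update a fixed commit_ms, and the numbers are invented rather than measured from NiFi.

```python
def batch_tradeoff(run_duration_ms, work_ms, commit_ms):
    """Toy model: process FlowFiles for `run_duration_ms`, then commit once.

    Returns (batch size, FlowFiles per second, milliseconds until the batch
    is visible to the next Processor).
    """
    # At least one FlowFile is processed per trigger, even at 0 ms.
    batch = max(1, int(run_duration_ms // work_ms))
    cycle_ms = batch * work_ms + commit_ms
    throughput_per_sec = batch * 1000 / cycle_ms
    return batch, throughput_per_sec, cycle_ms


# Hypothetical costs: 1 ms of work per FlowFile, 10 ms per repository update.
for run_duration in (0, 25, 100):
    batch, per_sec, latency = batch_tradeoff(run_duration, work_ms=1, commit_ms=10)
    print(f"{run_duration:>3} ms run duration: batch={batch}, "
          f"{per_sec:.0f} FlowFiles/s, {latency} ms before handoff")
```

In this model a longer run duration amortizes the commit cost over more FlowFiles (higher throughput) but delays the handoff to the next Processor (higher latency).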

Explorer

Perfect! Thanks!

Master Guru

@Michael Rivera If this has answered your question, please close out the thread by accepting the answer. Thank you.