
# Processor Run Duration:

Some processors support configuring a run duration. This setting tells a processor to keep using the same task (thread) to work on as many FlowFiles, or batches of FlowFiles, from an incoming queue as it can within the configured time. This is ideal for processors whose individual tasks complete very quickly and whose FlowFile volume is large.

[Screenshot: two processors receiving the same feed of FlowFiles, one with a run duration configured and one without]


In the above example, the exact same feed of FlowFiles was passed to both processors, which are configured to perform the same attribute update.

Both processed the same number of FlowFiles in the past 5 minutes; however, the processor configured with a run duration consumed less overall CPU time to do so.

Not all processors support setting a run duration. The nature of the processor's function, the methods it uses, and/or the client library it uses may not support this capability. You will not be able to set a run duration on such processors.

How this works:

  • The processor has a thread assigned to its task. The processor grabs the highest-priority FlowFile (or batch of FlowFiles) from the “active queue” of the incoming connection. If processing of the FlowFile(s) does not exceed the configured run duration, another FlowFile (or batch of FlowFiles) is pulled from the active queue. This continues, all under that same thread, until the run duration has been reached or the “active queue” is empty. At that time the session is completed and all outbound FlowFiles are committed at once to the appropriate relationship.
  • Since no FlowFiles are committed until the entire run completes, some latency is introduced on the FlowFiles. Your configured run duration dictates how much latency will occur at a minimum.
  • If the execution of the processor against a single FlowFile takes longer than the configured run duration, there is no added benefit to adjusting this configuration.
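
The loop described above can be sketched as a small simulation. This is not NiFi code: the function names (`run_task`, `count_tasks`), the fixed per-FlowFile cost, and the one-FlowFile-at-a-time pull (rather than batches) are all assumptions made for illustration. It only shows how a run duration reduces the number of tasks needed to drain a queue.

```python
from collections import deque


def run_task(active_queue, run_duration_ms, cost_ms_per_flowfile):
    """Simulate one task (one thread assignment) of a hypothetical processor.

    FlowFiles are pulled from the active queue until the run duration is
    reached or the queue is empty, then everything is committed at once.
    """
    elapsed_ms = 0.0
    session_outputs = []  # held in the session until the single commit
    while active_queue:
        flowfile = active_queue.popleft()  # assumed: queue is already priority-ordered
        session_outputs.append(flowfile)
        elapsed_ms += cost_ms_per_flowfile  # assumed: constant cost per FlowFile
        if elapsed_ms >= run_duration_ms:
            break
    return session_outputs  # one commit for the whole run


def count_tasks(num_flowfiles, run_duration_ms, cost_ms_per_flowfile):
    """Count how many tasks are needed to drain the queue."""
    queue = deque(range(num_flowfiles))
    tasks = 0
    while queue:
        run_task(queue, run_duration_ms, cost_ms_per_flowfile)
        tasks += 1
    return tasks


if __name__ == "__main__":
    print(count_tasks(1000, 0, 0.5))   # no run duration, fast processor
    print(count_tasks(1000, 25, 0.5))  # 25 ms run duration, fast processor
    print(count_tasks(1000, 25, 40))   # 25 ms run duration, slow processor
```

In this model, a fast processor (0.5 ms per FlowFile) needs far fewer tasks with a 25 ms run duration, while a processor that takes 40 ms per FlowFile needs the same number of tasks either way, which matches the last bullet above.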

What this means for heap usage:

  • Since the processor is only working on incoming FlowFiles in the “active queue”, there is no added heap pressure from them. (FlowFiles in the “active queue” are already in heap space.)
  • The FlowFiles being generated (if any, depending on processor function) are all held in heap until the final commit. This may introduce some additional heap pressure compared with not using a run duration, since all those new FlowFiles will be held in heap until they are committed to an output relationship at the end of the run duration.
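
To get a rough feel for the second point, the following sketch counts how many newly generated FlowFiles would be waiting for the final commit in a single run. The function name, the constant per-FlowFile cost, and the `outputs_per_input` parameter are assumptions for illustration only; real heap usage depends on the processor and on the attributes each FlowFile carries.

```python
def peak_uncommitted_outputs(queued, run_duration_ms, cost_ms_per_flowfile,
                             outputs_per_input=1):
    """Estimate how many generated FlowFiles are held until the commit.

    Hypothetical model: each input FlowFile costs a fixed amount of time
    and produces a fixed number of new FlowFiles, none of which are
    committed until the run ends.
    """
    elapsed_ms = 0.0
    processed = 0
    while processed < queued:
        processed += 1
        elapsed_ms += cost_ms_per_flowfile
        if elapsed_ms >= run_duration_ms:
            break  # run duration reached: the session commits here
    return processed * outputs_per_input


if __name__ == "__main__":
    for run_duration_ms in (0, 25, 100):
        held = peak_uncommitted_outputs(10_000, run_duration_ms, 0.5, 2)
        print(f"run duration {run_duration_ms} ms -> {held} FlowFiles held before commit")
```

Under these assumptions, the number of FlowFiles held before the commit grows in proportion to the run duration, so a longer run duration trades more heap held per run for fewer tasks.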
Last update:
‎08-17-2019 06:27 AM