I'm seeing processors with very little input generate a tsunami of tasks (thousands within a couple of seconds) when Run Schedule is set to 0ms (Run Duration also set to 0). My understanding is that 0ms in Run Schedule should be interpreted as "always on" / "continuous", like an HTTP request handler or similar listener: always ready to handle requests individually and immediately when they arrive.
I have a case with an UpdateAttribute processor, also with Run Schedule set to 0ms. It generates no tasks when there are no incoming FlowFiles, and only as many tasks as the incoming FlowFiles require (in this case very few, say 10 tasks for 10 incoming test FlowFiles). But if the FlowFiles are penalized (in this instance for a full day, 1d, coming from PutHDFS), then task generation goes crazy (thousands per second) without any actual work being done (no FlowFiles move through). Why the high number of task generations? It seems to affect the whole cluster.
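My mental model of what might be happening, as a standalone Java sketch. This is a hypothetical simulation, not actual NiFi internals: the class name, the queue representation, and the penalty check are my assumptions. The idea is that a penalized FlowFile stays in the queue but is not handed out to the processor, so each task finds nothing to do and, with a 0ms Run Schedule and no yield, is immediately rescheduled:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PenaltySpinSketch {

    // Simulate a 0 ms Run Schedule: the scheduler re-triggers the processor
    // immediately whenever the previous task returns without yielding.
    static long simulateTasks(long wallClockMillis) {
        Deque<Long> queue = new ArrayDeque<>();
        long now = System.currentTimeMillis();
        // 12 FlowFiles penalized for a full day (stored as penalty-expiry timestamps)
        for (int i = 0; i < 12; i++) {
            queue.add(now + 86_400_000L);
        }

        long deadline = System.nanoTime() + wallClockMillis * 1_000_000L;
        long tasks = 0;
        while (System.nanoTime() < deadline) {
            tasks++; // one scheduled task (one onTrigger-style invocation)
            Long penaltyExpiry = queue.peek();
            if (penaltyExpiry == null || penaltyExpiry > System.currentTimeMillis()) {
                // Nothing workable in the queue: the task returns at once and,
                // with a 0 ms schedule and no yield, is re-triggered immediately.
                continue;
            }
            queue.poll(); // a released FlowFile would be processed here
        }
        return tasks;
    }

    public static void main(String[] args) {
        long tasks = simulateTasks(100); // ~100 ms of simulated scheduling
        System.out.println("tasks triggered in ~100 ms: " + tasks);
    }
}
```

Running this, the task counter climbs into the thousands within a fraction of a second while zero FlowFiles are processed, which matches what I see in the UI.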
We notice similar runaway task generation on other processors, such as an HTTP request handler.
This behaviour is unexpected to me and doesn't seem like a robust reaction. Is it an intended/expected outcome or a bug?
Here are a couple of screenshots showing the case. Look at the UpdateAttribute processor in the middle. Input to it is stopped, and 12 penalized FlowFiles are waiting in its input queue.
Next, UpdateAttribute is started, and within a couple of seconds thousands of tasks are generated. The FlowFiles are penalized for a full day in this case, so they don't flow through, but task generation goes crazy while waiting for the penalized FlowFiles to be released. Is this really intended behaviour?
If only non-penalized FlowFiles arrive as input, then only the needed tasks are generated and no runaway wasted tasks are created.
PS. Yield Duration for UpdateAttribute is the default 1s, and Penalty Duration is 30s.