Created 02-01-2023 01:10 AM
This is an issue I've been observing for quite a while now. I unfortunately don't have any concrete reproduction steps, but from what I can tell, this only seems to happen after rolling restarts of the cluster.
Just a few days ago, I performed a rolling restart of one of our NiFi clusters (running NiFi 1.16.3 on Ubuntu 18.04 LTS). Afterwards, a couple of our flows stopped processing data, even though all of the components involved appeared as started in the UI. The effect seems to be scoped to process groups: if one component within a process group is affected, all components within that group and its subgroups are affected as well.
Stopping and then starting the affected components restores them to a functional state. This is of course not a long-term solution, so I'm curious to know if anyone else has encountered this problem before or could share any suggestions regarding troubleshooting or steps towards a solution.
Created 02-02-2023 11:48 AM
@edaley
When your NiFi cluster is in this state:
1. How many active threads are indicated in the UI?
2. On which processors are these threads active? (NiFi Summary UI --> processors)
The number in parentheses is the number of active threads on that component.
3. Does either of the above stats change, or do the same processors continue to show active threads with no progress of FlowFiles through those processors?
4. Does the number of active threads match the size of your thread pool (global menu --> GENERAL --> "Maximum Timer Driven Thread Count")? This setting is per node: the default is 10, so with 3 nodes the UI could show up to 30 active threads.
5. Have you used nifi.sh (./nifi.sh dump <dumpfilename>) to produce several thread dumps to see if any of those threads are changing between dumps? A change indicates the thread is processing. No change points at a potentially hung thread, or a thread waiting on another long-running or hung thread.
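If comparing several dumps by eye gets tedious, something like the following Python sketch could help flag threads whose stack did not change between dumps. It is only a rough illustration, not a NiFi tool: the function names (parse_dump, unchanged_threads) are made up for this example, and it assumes each thread in the dump file starts with a quoted thread name on its own line, followed by its stack frames up to the next blank line, and that the flow's worker threads have "Timer-Driven" in their name. Adjust both assumptions if your dump files look different.

```python
import re
import sys
from pathlib import Path

# Assumption: a thread block begins with a line such as
#   "Timer-Driven Process Thread-4" Id=87 RUNNABLE
# and its stack frames follow until a blank line or the next header.
HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def parse_dump(text):
    """Map thread name -> tuple of stack-frame lines for one dump."""
    threads = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            name = match.group("name")
            threads[name] = []
        elif not line:
            name = None  # a blank line ends the current thread block
        elif name is not None:
            threads[name].append(line)
    return {n: tuple(frames) for n, frames in threads.items()}


def unchanged_threads(dump_texts, name_filter="Timer-Driven"):
    """Return names of threads whose stack is identical in every dump."""
    parsed = [parse_dump(t) for t in dump_texts]
    if len(parsed) < 2:
        return []  # nothing to compare against
    first = parsed[0]
    common = set(first)
    for other in parsed[1:]:
        common &= set(other)
    return sorted(
        n for n in common
        if name_filter in n
        and first[n]  # skip threads with no stack frames
        and all(other[n] == first[n] for other in parsed[1:])
    )


if __name__ == "__main__":
    # Usage (hypothetical file names): python compare_dumps.py dump1.txt dump2.txt dump3.txt
    dumps = [Path(p).read_text(errors="replace") for p in sys.argv[1:]]
    for thread in unchanged_threads(dumps):
        print(thread)
```

Keep in mind that idle pool threads parked waiting for work will also show up as unchanged, so check the stack frames of anything it reports to see whether the thread is actually sitting in processor code.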
Hopefully this information will help you troubleshoot and narrow down your issue.
If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.
Thank you,
Matt
Created 02-02-2023 11:55 AM
@edaley
I also recommend upgrading to the latest NiFi release to see if this issue persists. I believe there was a known bug in Apache NiFi 1.16 affecting scheduling on startup that has since been resolved.
Thank you,
Matt