After a restart, components occasionally appear started but are not processing data

Explorer

I've been observing this issue for quite a while now. Unfortunately, I don't have any concrete reproduction steps, but as far as I can tell, it only happens after rolling restarts of the cluster.

Just a few days ago, I performed a rolling restart of one of our NiFi clusters (running NiFi 1.16.3 on Ubuntu 18.04 LTS). Afterwards, a couple of our flows stopped processing data, even though all of the components involved appeared as started in the UI. The effect seems to be scoped to process groups: if one component within a process group is affected, all components within that group and its subgroups are affected.

Stopping and then starting the affected components restores them to a functional state. This is of course not a long-term solution, so I'm curious to know if anyone else has encountered this problem before or could share any suggestions regarding troubleshooting or steps towards a solution.

Master Mentor

@edaley 

When your NiFi cluster is in this state:
1. How many active threads are indicated in the UI?
2. On which processors are these threads active? (NiFi Summary UI --> Processors) The number in parentheses is the number of active threads on that component.
3. Do either of the above stats change, or do the same processors continue to show active threads with no progress of FlowFiles through them?
4. Does the number of active threads match the size of your thread pool (global menu --> GENERAL --> "Maximum Timer Driven Thread Count")? This setting is per node, so with the default of 10 and 3 nodes, you could see up to 30 active threads in the UI.
5. Have you used nifi.sh (./nifi.sh dump <dumpfilename>) to produce several thread dumps and checked whether any of those threads change between dumps? A change indicates the thread is processing. No change points to a potentially hung thread, or to a thread waiting on another long-running or hung thread.
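Comparing several thread dumps by eye gets tedious, so here is a rough Python sketch of how the comparison could be scripted. It assumes each thread in the dump starts with a header line containing the quoted thread name (for example `"Timer-Driven Process Thread-3" Id=71 RUNNABLE`) followed by its stack frames. That layout, the "Timer-Driven" name filter, and the function names `parse_dump` and `unchanged_threads` are assumptions for illustration, so adjust them to match what your dumps actually look like.

```python
import re
from typing import Dict, List

# Assumed header shape: a quoted thread name at the start of a line,
# e.g. "Timer-Driven Process Thread-3" Id=71 RUNNABLE
HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def parse_dump(text: str) -> Dict[str, str]:
    """Map each thread name to its stack trace (the lines after its header)."""
    frames_by_thread: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            # A new thread section starts here.
            current = match.group("name")
            frames_by_thread[current] = []
        elif current is not None and line.strip():
            # Non-blank line belonging to the current thread's stack.
            frames_by_thread[current].append(line.strip())
    return {name: "\n".join(frames) for name, frames in frames_by_thread.items()}


def unchanged_threads(dumps: List[str], name_filter: str = "Timer-Driven") -> List[str]:
    """Return names of threads whose non-empty stack is identical in every dump."""
    parsed = [parse_dump(d) for d in dumps]
    # Only threads present in all dumps can be compared.
    common = set(parsed[0]).intersection(*parsed[1:])
    result = []
    for name in sorted(common):
        if name_filter not in name:
            continue
        stacks = {p[name] for p in parsed}
        if len(stacks) == 1 and parsed[0][name]:
            result.append(name)
    return result
```

To use it, read the contents of each dump file (taken a minute or so apart) into a string and pass the list to `unchanged_threads`. Keep in mind that an idle pool thread parked while waiting for work will also look unchanged, so the returned names are only candidates: check whether their stacks sit inside processor code before treating them as hung.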

Hopefully this information will help you troubleshoot and narrow down your issue.

 

If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.

Thank you,

Matt

Master Mentor

@edaley 
I also recommend upgrading to the latest NiFi release to see if this issue persists. I believe there was a known scheduling-on-startup bug in Apache NiFi 1.16 that has since been resolved.

Thank you,

Matt