
NiFi was suspended for a few minutes, then resumed its operation

New Contributor

Hello everyone,

I apologize for my limited English proficiency. I am currently using NiFi version 1.14.

My issue is that NiFi frequently stops: the processors become unresponsive and inactive, while the UI operations continue to work normally. After a few minutes, or sometimes after I restart, these flows start working again. I have checked the logs, but I haven't found any errors. I have configured the Maximum Timer Driven Thread Count to 32 (my system has only 8 cores). Unfortunately, due to certain constraints, I am unable to upgrade to a newer version at the moment.

I would greatly appreciate any suggestions or assistance you could provide. Your help would mean a lot to me. Thank you in advance for your response.

Best regards,

[Attachment: Screenshot 2023-08-14 at 17.54.05.png]

1 ACCEPTED SOLUTION

Master Mentor

@Tenda 
Since you say you can freely navigate the NiFi UI while in this "stuck" state, NiFi is not actually stuck, as both the UI and the processor components operate within the same JVM. What you circled indicates that at that exact moment (the last time the browser refreshed) there were 24 active threads out of the 32 configured in the Max Timer Driven Thread pool settings. Milliseconds later it could still be 24 active threads, but consumed by different components. NiFi processors show a small number in the upper right corner when they have active threads, so step one is determining which processors are holding these 24 threads for a long time. Then look at those processors and at thread dumps to figure out why those threads are long running. Typically we see this when connections to unstable external services are made, or with network issues, local NiFi repository I/O, NiFi CPU utilization, long or very frequent GC pauses, or even OOMs. It sounds like you have already ruled out a few of these.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,

Matt

View solution in original post

4 REPLIES

Community Manager

Welcome to the community @Tenda. While you wait for a more knowledgeable community member to respond, allow me to suggest reading this post in case it helps get you closer.


Cy Jervis, Manager, Community Program
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Master Mentor

@Tenda 

Which processors become unresponsive?
Do you mean that a processor indicates it is currently executing a thread (small number shown in the upper right corner), yet all the stats on the processor for In, Out, and Tasks show 0 for the last 5 minutes?

If the Tasks counts keep updating, then tasks/threads are executing and completing.
If Tasks shows 0 (or a very low number) for the last 5 minutes and you see an active thread number in the upper right corner of the processor, it may be caused by a few reasons:

  1. Your CPU load average is high due to CPU-intensive processors executing at the same time. (You would expect lag in the UI if the CPU were saturated.)
  2. You have processors configured with too many concurrent tasks, leading to other processors not getting allocated a thread often enough. (If the core load average is consistently low, you could increase the size of your max timer driven thread pool beyond 32.)
  3. Java heap garbage collection (GC). GC happens when your JVM heap usage reaches ~80% utilization. If your heap is too small, you could be experiencing lots of back-to-back GC. All GC, whether partial or full, is a stop-the-world event, meaning the JVM will do nothing else while GC is happening. If your heap is set too large, a stop-the-world GC may take much longer to complete.
  4. You have processors with long running tasks or hung threads consuming threads from your available max timer driven thread pool, limiting the threads available to other components. Only the examination of a series of NiFi JVM thread dumps collected minutes apart will tell you whether you have a long running task (the thread dumps show the thread changing, indicating slow progress is being made) or a potentially hung thread (the thread dumps all show the same consistent output for that thread); see the sketch after this list. When you have a processor in this state and you "terminate" the thread on the processor, does the terminated thread (shown as a small number in parentheses, e.g. "(1)") ever go away? If not, that terminated thread never completed. While "terminate" releases the FlowFiles associated with that thread back to the inbound connection queue and gives the user back full control of the processor, the only way to "kill" a truly hung thread is by restarting the entire NiFi JVM, which you said you do once in a while.
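
To make point 4 concrete, here is a minimal sketch (my own illustration, not part of the reply above) of how two thread dumps captured a few minutes apart could be compared, for example dumps written with "bin/nifi.sh dump dump1.txt" and later "bin/nifi.sh dump dump2.txt". It flags threads whose stack traces are identical in both dumps, which is a hint that they may be hung rather than slowly progressing. The file names are placeholders and the parsing assumes the usual Java thread-dump layout (a quoted thread name followed by indented "at ..." frames):

    # Hypothetical helper: compare two NiFi thread dumps taken minutes apart and
    # list threads whose stack traces did not change between the two dumps.
    import re
    import sys


    def parse_dump(path):
        """Return {thread_name: stack_trace_text} for one thread-dump file."""
        threads = {}
        name = None
        frames = []
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                header = re.match(r'^"(.+?)"', line)
                if header:
                    if name is not None:
                        threads[name] = "\n".join(frames)
                    name = header.group(1)
                    frames = []
                elif name is not None and line.strip().startswith("at "):
                    frames.append(line.strip())
        if name is not None:
            threads[name] = "\n".join(frames)
        return threads


    def unchanged_threads(dump_a, dump_b):
        """Thread names present in both dumps with identical, non-empty stacks."""
        a, b = parse_dump(dump_a), parse_dump(dump_b)
        return sorted(
            name for name in a.keys() & b.keys()
            if a[name] and a[name] == b[name]
        )


    if __name__ == "__main__":
        # Usage (file names are placeholders): python compare_dumps.py dump1.txt dump2.txt
        for thread_name in unchanged_threads(sys.argv[1], sys.argv[2]):
            print("possibly hung or long-running:", thread_name)

Threads that keep the same stack across several dumps (typically named along the lines of "Timer-Driven Process Thread-N") are the ones worth mapping back to the processors that show active thread counts.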

Hope this information helps you drill deeper into your issue and identify what is impacting you.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,

Matt

New Contributor

All processors are hanging, and the highlighted circle indicates that NiFi will remain stuck in that state permanently (if fortunate, it might resume operation after a few minutes, but then it hangs again). As you suggested, I will review the GC configuration. Another crucial piece of information is that the issue I am encountering occurs after I perform Kafka consumption: in this pipeline, I apply record-level transformations using JoltTransformJSON, UpdateRecord, and ConvertJSONToSQL. The volume of records is quite substantial, yet monitoring CPU and heap reveals that NiFi is only utilizing around 26% CPU and 63% RAM. Nonetheless, I am sincerely grateful for your assistance.

Master Mentor

@Tenda 
Since you say you can freely navigate the NiFi UI while in this "stuck" state, NiFi is not actually stuck, as both the UI and the processor components operate within the same JVM. What you circled indicates that at that exact moment (the last time the browser refreshed) there were 24 active threads out of the 32 configured in the Max Timer Driven Thread pool settings. Milliseconds later it could still be 24 active threads, but consumed by different components. NiFi processors show a small number in the upper right corner when they have active threads, so step one is determining which processors are holding these 24 threads for a long time. Then look at those processors and at thread dumps to figure out why those threads are long running. Typically we see this when connections to unstable external services are made, or with network issues, local NiFi repository I/O, NiFi CPU utilization, long or very frequent GC pauses, or even OOMs. It sounds like you have already ruled out a few of these.
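
As a side note from me (not part of the reply above): one way to see which components are holding the active threads between browser refreshes is to poll the NiFi REST API for processor status. The sketch below is only a rough illustration under a few assumptions: an unsecured NiFi 1.x instance at a placeholder URL, and the status endpoint and "activeThreadCount" field as I recall them from the 1.x API, so verify them against your own instance (a secured cluster would also need authentication, which is omitted here):

    # Rough sketch: poll the NiFi status API and print which components report
    # active threads right now; repeated samples show whether the same
    # components keep the 24-of-32 threads busy.
    import json
    import time
    import urllib.request

    NIFI_URL = "http://localhost:8080"  # placeholder, adjust to your instance


    def fetch_status():
        # Assumed 1.x endpoint; confirm the path on your version.
        url = NIFI_URL + "/nifi-api/flow/process-groups/root/status?recursive=true"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)


    def busy_components(node, results=None):
        """Walk the status JSON and collect anything reporting active threads.

        Both processors and process groups may appear in the output, since the
        walk only looks for a "name" next to a non-zero "activeThreadCount".
        """
        if results is None:
            results = []
        if isinstance(node, dict):
            if node.get("activeThreadCount", 0) and "name" in node:
                results.append((node["name"], node["activeThreadCount"]))
            for value in node.values():
                busy_components(value, results)
        elif isinstance(node, list):
            for item in node:
                busy_components(item, results)
        return results


    if __name__ == "__main__":
        # Take a few samples a minute apart; components that appear every time
        # are the ones to inspect further in the thread dumps.
        for _ in range(3):
            for name, count in sorted(set(busy_components(fetch_status()))):
                print(f"{count} active thread(s): {name}")
            print("---")
            time.sleep(60)

If the same processors top this list sample after sample, those are the components whose threads are worth examining in the dumps.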

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,

Matt