NiFi FlowFiles Stuck in Round Robin Queues

avatar
New Contributor

Hello NiFi Community!

 

I have a three-node NiFi cluster that I use to ingest data from multiple source systems. From time to time, FlowFiles get stuck in round robin queues and just sit there idle.

 

databoi_0-1681781564526.png

 

I've also tried adding funnels just to test whether the FlowFiles can still proceed downstream.

databoi_1-1681781602622.png

 

I have monitored disk usage on the cluster, and it never exceeds 30% utilization.

 

Does anyone know what's causing this issue? I checked the nifi-app logs but found nothing useful.

 

Thank you!

 

 

4 REPLIES 4

avatar
New Contributor

Also note that all of the stuck FlowFiles are on the same node. Restarting the node typically solves the issue, but I want to prevent it from happening again.

avatar
Community Manager

@databoi, Welcome to our community! To help you get the best possible answer, I have tagged in our NiFi experts @cotopaul @SAMSAL @MattWho  who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager



avatar

hi @databoi,


It would help if you could also provide your NiFi version, as each version has its own quirks.


What you have experienced so far can have plenty of root causes and it is not easy to debug 😞 I assume that this always happens on a single node only, right? Something similar happened to me as well and it was not easy to fix ... or at least it was not for me.

 

My problem was mostly related to how I configured the NiFi Cluster. I have been told that there are some best practices when it comes to configuring NiFi, especially on a bare metal machine:

  • the repositories (content, flowfile and provenance) should be stored on separate drives with high I/O throughput, as NiFi uses these repositories heavily to persist data. (nifi.properties)
  • assign no more than 40% of your node's RAM to the JVM heap. (bootstrap.conf)
  • make sure that the open files and max user processes limits are set higher than the OS defaults.
  • set an appropriate number of threads. (2 - 4 times the number of cores of your server)
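
As a rough illustration of the 40% heap and 2 - 4x threads rules of thumb above, here is a minimal Python sketch. The function name `suggest_nifi_sizing` and its output keys are hypothetical, made up for this example; the numbers are only starting points, not official NiFi recommendations.

```python
import math


def suggest_nifi_sizing(total_ram_gb: float, cpu_cores: int) -> dict:
    """Apply the rules of thumb from this post to one node.

    Hypothetical helper: heap capped at 40% of RAM (rounded down to whole GB),
    thread count somewhere between 2x and 4x the number of cores.
    """
    if total_ram_gb <= 0 or cpu_cores <= 0:
        raise ValueError("total_ram_gb and cpu_cores must be positive")
    return {
        # Upper bound for the JVM heap (the -Xmx value set in bootstrap.conf).
        "max_heap_gb": math.floor(total_ram_gb * 40 / 100),
        # Lower and upper end of the suggested thread count range.
        "min_threads": 2 * cpu_cores,
        "max_threads": 4 * cpu_cores,
    }


if __name__ == "__main__":
    # Example node: 32 GB of RAM and 8 cores.
    print(suggest_nifi_sizing(32, 8))
```

For a node with 32 GB of RAM and 8 cores this suggests a heap of at most 12 GB and between 16 and 32 threads. Treat the result as a first guess and adjust it based on what you observe under load.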

 

There were three problems on my side and the solution was as follows:

- I moved the repositories to a different drive (an SSD) with high I/O, so NiFi could read and write content faster.

- I increased the open files and max user processes limits to 50000 and 10000 respectively, and I will increase them again in a couple of days.

- My third problem was related to the disk hardware: the disk was dying and had started to malfunction, causing stop-the-world delays. I replaced it and everything went back to normal.
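
To see whether the open files and max user processes limits are below the values mentioned above, a small Python sketch like the following may help. It uses the standard `resource` module (Unix only); the helper names `current_limits` and `below_target` are made up for this example, and the targets are simply the numbers from this post. It reports the limits of the process that runs it, so it should be run as the same user and in the same kind of session as the NiFi service, otherwise the values may differ from what NiFi actually gets.

```python
import resource


def current_limits() -> dict:
    """Return the (soft, hard) limits of the current process."""
    return {
        "open_files": resource.getrlimit(resource.RLIMIT_NOFILE),
        "max_user_processes": resource.getrlimit(resource.RLIMIT_NPROC),
    }


def below_target(limits: dict, targets: dict) -> list:
    """List the limit names whose soft limit is lower than the target."""
    low = []
    for name, target in targets.items():
        soft, _hard = limits[name]
        # RLIM_INFINITY means "unlimited", which is always high enough.
        if soft != resource.RLIM_INFINITY and soft < target:
            low.append(name)
    return low


if __name__ == "__main__":
    # Targets taken from this post; they are not official recommendations.
    targets = {"open_files": 50000, "max_user_processes": 10000}
    limits = current_limits()
    print("current limits:", limits)
    print("below target:", below_target(limits, targets))
```

If either name shows up in the second line of output, the limit for that user is lower than the values that worked in my case.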

 

You should also pay attention to the JVM memory of that particular node. In addition, you could enable debug logging and generate some dumps for further analysis (./nifi.sh dump > <name of your dump file>). You could also check the processes on the affected node: NiFi may have become a zombie process, or other zombie processes may be affecting overall performance.

I do hope that something in this message leads you to your root cause. In any case, I strongly recommend that you also consider the opinions of other community members with far more experience than me.

avatar
Super Mentor

@databoi 

I see from your images that you are using Apache NiFi 1.11.4, which is around the time that the Load Balanced connection capability was introduced. Many bugs in load balanced connections were subsequently identified and fixed in later releases. I strongly encourage you to upgrade to the latest NiFi release and see if your issue persists.

 

If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.

Thank you,

Matt