FlowFiles stuck in NiFi queue with Load Balance Strategy RoundRobin

Contributor

Hi All

When I enable the RoundRobin load balance strategy on a connection (the queue between two processors), FlowFiles get stuck in it for minutes.

Some investigation showed that FlowFiles get stuck only on the primary node.

There are no warnings or errors in the logs.

I can provide any additional information that is needed.

Any help would be appreciated.

Thanks

1 ACCEPTED SOLUTION

Super Mentor

@ilyal 

 

What version of NiFi are you running?  Is it Apache NiFi 1.8.x?

 

There are numerous bugs with the new load-balanced connections feature. A good number of these known bugs were addressed between NiFi 1.9.0 and NiFi 1.9.2:

https://issues.apache.org/jira/browse/NIFI-5745

https://issues.apache.org/jira/browse/NIFI-5919

https://issues.apache.org/jira/browse/NIFI-5663

https://issues.apache.org/jira/browse/NIFI-5771

https://issues.apache.org/jira/browse/NIFI-6017

 

Some additional bugs are fixed in NiFi 1.10.0:

https://issues.apache.org/jira/browse/NIFI-6353

https://issues.apache.org/jira/browse/NIFI-6760

https://issues.apache.org/jira/browse/NIFI-6517

https://issues.apache.org/jira/browse/NIFI-6736

https://issues.apache.org/jira/browse/NIFI-6285

https://issues.apache.org/jira/browse/NIFI-6759

 

As a first step, I strongly recommend upgrading to Apache NiFi 1.10 (releasing soon).

 

Hope this helps,
Matt


13 REPLIES

Contributor

Thank you, Matt. I will try it and update the question.

New Contributor

I upgraded to 1.10, and now I see FlowFiles stuck with the Load Balance Strategy set to Partition By Attribute.

Contributor

Hi, I see this issue with some processors, but others do not have it, for example UpdateAttribute. If you add this processor, it will work.

New Contributor

The issue seems to be in the queues rather than in the processor. I tried different processors and the issue remains the same. For example, below is the queue that feeds an UpdateAttribute processor.

 

Capture1.PNG

Super Mentor

@venu413 

If you open the NiFi Summary UI (NiFi UI --> Global menu --> Summary), select the Connections tab, locate the connection with the 54 queued FlowFiles, and then click the cluster connection summary icon at the far right, are all 54 queued FlowFiles on the same node?

Is anything being logged in the nifi-app.log on the node where these FlowFiles are queued?

Do you see any errors in nifi-app.log during startup if you restart this node?
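
To narrow down the log check above, a minimal Python sketch like the following could filter nifi-app.log for WARN and ERROR lines that mention load balancing. It assumes the default log layout, where the level appears between spaces after the timestamp; the function names and the "balanc" keyword are illustrative choices, not anything NiFi defines.

```python
from pathlib import Path


def find_load_balance_issues(lines, levels=("WARN", "ERROR"), keyword="balanc"):
    """Return log lines at the given levels that mention the keyword."""
    hits = []
    for line in lines:
        # Match the level surrounded by spaces so it is not found inside a word.
        has_level = any(f" {level} " in line for level in levels)
        # Case-insensitive keyword match ("balanc" covers balance/balanced/balancing).
        if has_level and keyword in line.lower():
            hits.append(line.rstrip("\n"))
    return hits


def scan_log(path):
    """Scan a log file on disk; the path is supplied by the caller."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return find_load_balance_issues(handle)
```

Run `scan_log(...)` against the nifi-app.log on the node that holds the queued FlowFiles; an empty result only means nothing matched this simple filter, not that the node is healthy.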

 

In your nifi.properties file, what values are configured for these properties:
nifi.cluster.load.balance.comms.timeout=
nifi.cluster.load.balance.connections.per.node=
nifi.cluster.load.balance.host=
nifi.cluster.load.balance.max.thread.count=
nifi.cluster.load.balance.port=
nifi.cluster.node.address=

I recommend that both "nifi.cluster.node.address=" and "nifi.cluster.load.balance.host=" be configured uniquely per node in your cluster, set to the resolvable hostname of the given node. So if your node has a hostname of node1.mycompany.com, then this hostname should be used in both of these properties in the NiFi running on that host. A restart is needed any time you edit the nifi.properties file.
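
As a rough way to check this recommendation on a single node, the Python sketch below parses the text of a nifi.properties file and reports when either hostname property is empty, does not resolve, or differs from the other. The helper names (`parse_properties`, `check_hosts`) are hypothetical, and the parser handles only simple `key=value` lines. It cannot verify uniqueness across nodes; that requires comparing the files from every node.

```python
import socket

# The two properties that should both hold this node's resolvable hostname.
HOST_KEYS = (
    "nifi.cluster.node.address",
    "nifi.cluster.load.balance.host",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_hosts(props, resolve=socket.gethostbyname):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    values = {}
    for key in HOST_KEYS:
        value = props.get(key, "")
        values[key] = value
        if not value:
            problems.append(f"{key} is not set")
            continue
        try:
            resolve(value)  # raises OSError (socket.gaierror) if unresolvable
        except OSError:
            problems.append(f"{key}={value} does not resolve")
    address, lb_host = (values[key] for key in HOST_KEYS)
    if address and lb_host and address != lb_host:
        problems.append("node address and load balance host differ")
    return problems
```

For example, read the file with `open(path).read()`, pass the text to `parse_properties`, and print whatever `check_hosts` returns on each node.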

New Contributor

@MattWho Here are the values of the properties you asked about:

nifi.cluster.load.balance.host=nifi-dev-0.nifi-dev
nifi.cluster.load.balance.port=6342
nifi.cluster.load.balance.connections.per.node=10
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=300 sec

nifi.cluster.node.address=nifi-dev-0.nifi-dev

 

The issue still appears.
We are running a Kubernetes cluster using the NiFi 1.11.1 image, and the issue still happens there.

Super Mentor

@JatinSab 

 

Apache NiFi 1.11.1 specifically includes a fix (https://jira.apache.org/jira/browse/NIFI-7059) that introduced a bug with load-balanced connections. That bug is addressed in 1.11.2 and is covered in Jira https://jira.apache.org/jira/browse/NIFI-7117.

 

Thanks,
Matt

 

New Contributor

Thanks, @MattWho.
Joe asked me to build using PR# 4045 and test it.
I'm working on it now to verify whether it fixes the issue.
I will let you know.

Thanks for your help.