Created 08-12-2025 02:54 AM
Hey, community!
We have a NiFi 1.18.0 cluster (still on that version, yes, sorry) and the following issue is bothering me.
It is a simple flow where we read data from Kafka with the ConsumeKafka processor and then process it with EvaluateJsonPath.
From time to time we see that the queue between the processors is stuck with data, and nothing happens until EvaluateJsonPath or the whole canvas is restarted (right click -> stop -> start):
Connection settings:
Queue threshold exceeded only on 3rd cluster node:
Additional configuration data:
So, as soon as I stop and start the canvas, everything works fine again. Why does this happen? How can I find the reason?
Created 08-12-2025 05:29 AM
@asand3r
Here are my observations from what you have shared:
Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.
Thank you,
Matt
Created 08-18-2025 06:29 AM
Thanks, @MattWho, for pointing me to load balancing. The 3rd node really did have network connection issues at that time, so that may be the cause. It works fine for now, so I cannot run the test steps you suggested.
But I don't fully get your point about LB after ConsumeKafka.
If load balancing is enabled on the queue between ConsumeKafka and EvaluateJsonPath, I can see in Data Provenance that the data is distributed across all cluster nodes (see the screenshot below), but if I disable it, only one node appears there:
Is my configuration with RoundRobin LB wrong?
Created 08-26-2025 05:45 AM
The question is how many partitions does the target Kafka topic have?
If it only has 1 partition, then only one node in the ConsumeKafka consumer group is going to consume all the messages. Since you are saying that when LB on the connection is disabled the queue shows all FlowFiles on one node, that tells me you have just one partition.
For optimal throughput you would want some multiple of the number of nodes as the partition count.
With 3 NiFi nodes, you would want 3, 6, 9, etc. partitions. With 3 partitions and 1 concurrent task set on your ConsumeKafka, you will have 3 consumers in the consumer group. Each node will consume from one of those partitions. If you have 6 partitions and 2 concurrent tasks set on the ConsumeKafka processor, you will have 6 consumers (3 nodes x 2 concurrent tasks) in your consumer group. So each node's ConsumeKafka will be able to concurrently pull from 2 partitions.
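The arithmetic above can be sketched in a few lines of Python. This is only a toy model, not NiFi or Kafka code: the function name `consumer_group_summary` and its return keys are made up for illustration, and it assumes the partitions are spread as evenly as possible over the consumers in the group.

```python
def consumer_group_summary(partitions: int, nodes: int, concurrent_tasks: int) -> dict:
    """Toy model of the consumer group formed by ConsumeKafka on a NiFi cluster."""
    if min(partitions, nodes, concurrent_tasks) < 1:
        raise ValueError("all arguments must be positive")
    # One consumer per concurrent task on every node.
    consumers = nodes * concurrent_tasks
    # A partition is read by at most one consumer in a group,
    # so consumers beyond the partition count have nothing to read.
    active = min(partitions, consumers)
    # Assumption: partitions are spread as evenly as possible over active consumers.
    base, extra = divmod(partitions, active)
    return {
        "consumers": consumers,
        "active_consumers": active,
        "idle_consumers": consumers - active,
        "max_partitions_per_consumer": base + (1 if extra else 0),
    }


# 3 NiFi nodes with 1, 3 and 6 partitions (hypothetical settings).
for partitions, tasks in [(1, 1), (3, 1), (6, 2)]:
    print(partitions, tasks, consumer_group_summary(partitions, nodes=3, concurrent_tasks=tasks))
```

With 1 partition and 3 nodes the model reports 2 idle consumers, which matches the symptom of all FlowFiles landing on a single node.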
So while your LB allows for redistribution of FlowFiles being consumed by one node, it is not the optimal setup here. An LB connection will not change how the processor functions; it only operates against the FlowFiles output from the feeding processor. LB connection settings are most commonly used on the downstream connection of processors that are typically scheduled on the primary node only (ListFile, ListSFTP, etc.).
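To make the distinction concrete, here is a rough Python sketch of what round-robin redistribution does to FlowFiles that were all consumed on one node. It is a simplified illustration, not NiFi's actual load-balancing implementation; the function name and node names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def round_robin_queue_counts(flowfile_count: int, nodes: list[str]) -> Counter:
    """Spread FlowFiles over the nodes in turn and count how many each node receives."""
    targets = cycle(nodes)
    return Counter(next(targets) for _ in range(flowfile_count))


nodes = ["node1", "node2", "node3"]

# Single partition: only node1's consumer reads, so all 9 FlowFiles start there.
consumed = Counter({"node1": 9})

# The LB connection then moves FlowFiles between nodes after they were consumed.
redistributed = round_robin_queue_counts(sum(consumed.values()), nodes)

print("consumed by:", dict(consumed))
print("queued on:  ", dict(redistributed))
```

The consuming side is unchanged by the redistribution: one node still does all the reading from Kafka, and the other nodes only receive FlowFiles over the network afterwards.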
Thank you,
Matt