I'm using QueryDatabaseTable on a 3-node HDF/NiFi cluster. Once the processor starts, it simultaneously produces three flowfiles containing identical copies of the records, so every record is fetched three times. I suspect that each node runs the query on its own at the same time, without coordination between the nodes.
To test this, I changed the processor's configuration on the SCHEDULING tab, switching the Execution value from "All nodes" to "Primary node". After applying this change the issue was resolved, and only one copy of each record is fetched.
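To illustrate the suspicion above, here is a toy model in Python. It is not NiFi code: the table, the `id` column, and the `max_id` state key are all made up for illustration, and it assumes every node reads the stored maximum value before any node has updated it.

```python
from collections import Counter

# Hypothetical source table with an increasing id column.
TABLE = [{"id": i} for i in range(1, 6)]

def run_cycle(node_count, state):
    """Simulate one scheduling cycle in which all nodes fire at the same time."""
    # Assumption: every node reads the stored max value before any node updates it.
    seen_max = [state["max_id"] for _ in range(node_count)]
    fetched = []
    for last_max in seen_max:
        # Each node runs the same incremental query: id > last seen max.
        rows = [r for r in TABLE if r["id"] > last_max]
        fetched.extend(rows)
        if rows:
            state["max_id"] = max(state["max_id"], max(r["id"] for r in rows))
    return fetched

def copies_per_record(node_count):
    """Count how many times each id is fetched in a single cycle."""
    state = {"max_id": 0}
    return Counter(r["id"] for r in run_cycle(node_count, state))

all_nodes = copies_per_record(3)     # Execution = "All nodes" on 3 nodes
primary_only = copies_per_record(1)  # Execution = "Primary node"
print(all_nodes)
print(primary_only)
```

In this model each record comes back once per node that ran the query, which matches the three copies I saw with "All nodes" and the single copy with "Primary node".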
Is this a bug in NiFi, or is this normal behaviour? What if I need all nodes to participate in fetching records from the database, so that the primary node is not overloaded?
That's by design: you want to run this processor on the primary node only and distribute the load further down the line. Here's an article describing a similar approach: https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html
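As a rough sketch of the "plan on one node, fan out afterwards" idea, the Python below splits a table into non-overlapping page queries in one place and hands them out round-robin. It is only an illustration, not how NiFi implements it: `my_table`, the `id` ordering column, the page size, and the node names are hypothetical, and the LIMIT/OFFSET syntax depends on your database. In NiFi the planning step would run on the primary node, and the fan-out would happen in the flow, for example through a Remote Process Group as in the linked article.

```python
from itertools import cycle

def plan_pages(row_count, page_size):
    """Runs in one place only: build non-overlapping page queries."""
    # my_table and id are hypothetical names; adjust the SQL to your database.
    return [
        f"SELECT * FROM my_table ORDER BY id LIMIT {page_size} OFFSET {offset}"
        for offset in range(0, row_count, page_size)
    ]

def distribute(tasks, nodes):
    """Hand the planned queries out to the nodes round-robin."""
    assignment = {node: [] for node in nodes}
    for node, task in zip(cycle(nodes), tasks):
        assignment[node].append(task)
    return assignment

pages = plan_pages(row_count=10, page_size=2)
assignment = distribute(pages, ["node1", "node2", "node3"])
for node, tasks in assignment.items():
    print(node, len(tasks))
```

Because each page query is generated once and assigned to exactly one node, every node does part of the fetching without any record being read twice.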
That doesn't make sense to me. QueryDatabaseTable keeps its state in cluster scope, and I thought that is what makes it possible to fetch data in parallel.
Or does the state exist for something else?