I have a clustered NiFi setup, and we run the GetMongo processor on the primary node only so that duplicate data is not fetched. This seems to be working fine. However, once I have this data, I want the downstream processors in the chain to run on all nodes of the cluster, so that the fetched data is processed in parallel. This is not happening. My questions are below, assuming GetMongo has fetched 30000 records and they are in the queue:
1) How do I check whether a processor is running on a single node or on all nodes? The processor's execution is set to all nodes, but while it is running it displays 1 in the top right corner.
2) If one processor is set to run only on the primary node, do all other processors in the flow also run only on the primary node?
In the screenshot above, my GetMongo processor runs on the primary node. How do I make sure that the ExecuteScript processor runs in parallel on all 3 NiFi nodes? As of now, when I check View Status History on the ExecuteScript processor, I see data flowing only through the primary node.
Prior to NiFi 1.8.0, you'll want to connect your GetMongo processor to a Remote Process Group (RPG) that points back at the same cluster. Then, in a separate flow, you'll want an Input Port connected to your ExecuteScript processor. In the RPG configuration you specify that Input Port, and the flow files will then be distributed across the cluster (the RPG sends flow files to the Input Port on all nodes). However, the distribution is not load-balanced; see the RPG documentation for more details.
As of NiFi 1.8.0, there is a powerful new feature called Load-Balanced Connections, which gives you more control over how the flow files are distributed among the nodes and removes the need for the RPG -> Input Port setup. See this blog for more details and examples.
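To get a feel for what distributing the queued flow files means for the 30000 records in the question, here is a rough Python simulation of two distribution strategies, round robin and partitioning by an attribute. This is only a conceptual sketch and not NiFi's actual implementation: the node names, the `customer` attribute, and the CRC32-based hashing are all assumptions made for illustration.

```python
from collections import Counter
from itertools import cycle
import zlib

# Hypothetical node names for a 3-node cluster.
NODES = ["node-1", "node-2", "node-3"]


def round_robin(flow_files, nodes):
    """Count how many flow files each node receives when they are handed out in turn."""
    assignment = Counter()
    node_cycle = cycle(nodes)
    for _ in flow_files:
        assignment[next(node_cycle)] += 1
    return assignment


def node_for_value(value, nodes):
    """Pick a node from an attribute value using a stable hash (CRC32 is an assumption)."""
    return nodes[zlib.crc32(str(value).encode("utf-8")) % len(nodes)]


def partition_by_attribute(flow_files, nodes, attribute):
    """Count flow files per node when equal attribute values always go to the same node."""
    assignment = Counter()
    for flow_file in flow_files:
        assignment[node_for_value(flow_file.get(attribute, ""), nodes)] += 1
    return assignment


# 30000 simulated records, each with a hypothetical "customer" attribute.
flow_files = [{"id": i, "customer": f"c{i % 7}"} for i in range(30000)]

print("round robin:", dict(round_robin(flow_files, NODES)))
print("by attribute:", dict(partition_by_attribute(flow_files, NODES, "customer")))
```

Round robin splits the queue evenly, which is usually what you want for parallel ExecuteScript work. Partitioning by attribute keeps related flow files on the same node, at the cost of a possibly uneven split.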