Distributing data read from GetMongo in a nifi cluster

Hi,

I have a clustered NiFi setup, and we run the GetMongo processor with Execution set to "Primary node" so that duplicate data is not fetched. That part works fine. However, once the data has been fetched, I want the downstream processors in the flow to run across the whole cluster, i.e. to process the fetched data in parallel, and somehow that is not happening. So, assuming GetMongo has fetched 30,000 records and they are sitting in the queue, my questions are:

1) How do I check whether a processor is running on a single node or on all nodes? Its Execution setting is "All nodes", but while the processor is running it displays 1 in the top right corner.

2) If one processor is set to run only on the primary node, do all the other processors in the flow also run only on the primary node?

Example:

[Screenshot: nifi.png]

In the screenshot above, my GetMongo is running on the primary node. How do I make sure that the ExecuteScript processor runs in parallel on all 3 NiFi nodes? As of now, if I check View status history on the ExecuteScript processor, I see data flowing only through the primary node.


Re: Distributing data read from GetMongo in a nifi cluster

Prior to NiFi 1.8.0, you'll want to connect your GetMongo processor to a Remote Process Group (RPG) that points back at the same cluster. Then, in a separate flow, add an Input Port connected to your ExecuteScript processor. In the RPG configuration, specify that Input Port; flow files will then be distributed across the cluster (the RPG sends flow files to the Input Port on each node). Note, however, that this distribution is not load-balanced; see the RPG documentation for more details.

As of NiFi 1.8.0, there is a powerful new feature called Load-Balanced Connections, which gives you more control over how flow files are distributed among the nodes and removes the need for the RPG -> Input Port setup: you simply set a load-balance strategy on the connection between GetMongo and ExecuteScript. See this blog for more details and examples.
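In the UI this is just the connection's Settings > Load Balance Strategy dropdown, but you can also set it programmatically by PUT-ing the connection entity to the NiFi REST API. As a rough sketch (not from the blog above; the connection id, client id, and revision number below are made-up placeholders, and the field names reflect the 1.8.0+ connection entity), the request body would look something like this:

```python
import json

def load_balance_payload(connection_id, client_id, version,
                         strategy="ROUND_ROBIN",
                         compression="DO_NOT_COMPRESS"):
    """Build a PUT body for /nifi-api/connections/{connection_id}.

    strategy: ROUND_ROBIN, PARTITION_BY_ATTRIBUTE, or SINGLE_NODE
    compression: DO_NOT_COMPRESS, COMPRESS_ATTRIBUTES_ONLY,
                 or COMPRESS_ATTRIBUTES_AND_CONTENT
    """
    return {
        # NiFi's optimistic-locking revision; must match the current
        # revision of the connection or the update is rejected.
        "revision": {"clientId": client_id, "version": version},
        "component": {
            "id": connection_id,
            "loadBalanceStrategy": strategy,
            "loadBalanceCompression": compression,
        },
    }

# Hypothetical connection id and revision for illustration only.
payload = load_balance_payload(
    "0a1b2c3d-0123-1000-abcd-ef0123456789", "my-client", 4)
print(json.dumps(payload, indent=2))
```

A real call would then be along the lines of `requests.put(f"{nifi_url}/nifi-api/connections/{conn_id}", json=payload)` against your cluster. With ROUND_ROBIN on that queue, the 30,000 flow files from GetMongo get spread across all 3 nodes, and each node's ExecuteScript works on its share.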
