
Query on executing NiFi in a clustered setup with Remote Process Groups

Contributor

Hi,

We are currently running NiFi as a single instance and are planning to move to a clustered setup (a 3-node cluster). Please consider the sample flow below:
ListFile -> UpdateAttribute -> RouteOnAttribute -> ExecuteStreamCommand (executes a shell script) -> FetchFile -> UpdateAttribute -> FetchFile -> PutFile

Since we are going to run in a cluster, we need to use Remote Process Groups (RPGs) to balance the load. We are going to place the RPG after the ListFile processor:
ListFile (on Primary Node) -> RPG
Input Port -> UpdateAttribute -> RouteOnAttribute -> ExecuteStreamCommand (executes a shell script) -> FetchFile -> UpdateAttribute -> FetchFile -> PutFile

My question is: if I want ExecuteStreamCommand (which triggers a shell script) to execute only on the primary node, and the rest of the processors to run on all nodes, can I simply change that processor's setting to run 'On Primary Node'? Will that have any impact on the flow?

Thanks,
R.Rohit

Accepted Solution

Master Guru

Hi @Rohit Ravishankar, yes, it will have an impact on the flow.

You are going to have a 3-node cluster and are planning to use an RPG after the ListFile processor.

Let's say M01, M02 and M03 are the 3 NiFi nodes in the cluster, and M01 is the Primary Node.

1. When ListFile runs and sends its output to the RPG, there is no guarantee that a given flow file will go to the Primary Node (M01).

2. The RPG takes care of load balancing across the NiFi cluster and distributes the flow files to the nodes accordingly.

3. If ExecuteStreamCommand is set to run on the Primary Node only, it will trigger the command only for flow files that are on the Primary Node at that time. In our example, the processor will trigger the shell script only when the flow file is on M01.

4. If the RPG distributes a flow file to M02 or M03 while ExecuteStreamCommand runs on the Primary Node only, that flow file will not trigger the shell script; it simply stays queued in the connection in front of ExecuteStreamCommand on that node.
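To make the points above concrete, here is a toy Python model (not NiFi code) of what happens to the flow files. It assumes the RPG hands flow files out in simple round-robin order; a real RPG roughly weights its distribution by the load each node reports, but in neither case does it deliberately send flow files to the primary node. Node names follow the M01/M02/M03 example; the file names and function names are hypothetical.

```python
from collections import Counter
from itertools import cycle

# Node names from the example above; M01 is the Primary Node.
NODES = ["M01", "M02", "M03"]
PRIMARY = "M01"


def distribute(flowfiles, nodes):
    """Assign each flow file to a node in round-robin order.

    Simplifying assumption: a real RPG weights distribution by reported
    node load, but the assignment never depends on which node is primary.
    """
    return {ff: node for ff, node in zip(flowfiles, cycle(nodes))}


def run_execute_stream_command(assignment, execution_node):
    """Split flow files into those the script runs on and those left queued.

    execution_node is "PRIMARY" (Primary node only) or "ALL" (All nodes),
    mirroring the processor's two execution options.
    """
    if execution_node not in ("PRIMARY", "ALL"):
        raise ValueError(f"unknown execution node setting: {execution_node}")
    executed, stuck = [], []
    for ff, node in assignment.items():
        if execution_node == "ALL" or node == PRIMARY:
            executed.append(ff)
        else:
            # Stays in the queue in front of ExecuteStreamCommand on that node.
            stuck.append(ff)
    return executed, stuck


flowfiles = [f"file_{i}.csv" for i in range(9)]
assignment = distribute(flowfiles, NODES)
print(Counter(assignment.values()))

executed, stuck = run_execute_stream_command(assignment, "PRIMARY")
print(f"Primary node only: {len(executed)} processed, {len(stuck)} stuck")
# prints "Primary node only: 3 processed, 6 stuck"

executed_all, stuck_all = run_execute_stream_command(assignment, "ALL")
print(f"All nodes: {len(executed_all)} processed, {len(stuck_all)} stuck")
# prints "All nodes: 9 processed, 0 stuck"
```

In this model, with the execution setting on Primary node only, two-thirds of the flow files never reach the shell script; with All nodes, every one does.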



Contributor

@Yash Thanks!