NiFi thread node per processor control

Expert Contributor

I'm seeing behavior in my environment where the work between processors is being handled by more than one node/thread, resulting in multiple copies of the payload being created. For instance, the ExecuteSQL processor uses three nodes (I'm not sure I'm using the correct terminology here, but the number shows in the upper right of the processor box when active), each of which generates a separate copy of the result set. Apart from the inefficiency, another potential issue is that the results aren't always exactly the same, though I use XPath to evaluate the results downstream. I'm assuming there are configuration properties that influence this behavior globally, but I'm also interested in per-processor tuning options. Can you offer any suggestions for controlling this behavior?

1 ACCEPTED SOLUTION

Master Mentor

@Sean Murphy

Each node in a NiFi cluster runs its own threads within its own copy of the processor, working on its own set of FlowFiles. Nodes in a NiFi cluster have no knowledge of which FlowFiles are being worked on by other nodes. If you are seeing multiple copies of the same output, that suggests each node in your cluster is processing the same files.

I am not sure how your dataflow is designed to ingest the data it works on, but ideally you want to design it so that the nodes do not each ingest the same data/files.
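To make the duplication concrete, here is a toy Python model (not NiFi code) of a three-node cluster in which every node runs the same query processor independently against the same source. The node names and rows are made up for illustration; `primary_only=True` stands in for restricting the ingest processor to a single node, for example by scheduling it on the primary node only.

```python
from collections import Counter

# Toy model, not NiFi code: each "node" runs its own copy of an ingest
# processor against the same source table, with no knowledge of the others.
SOURCE_ROWS = ["row-1", "row-2", "row-3"]  # hypothetical query result


def run_ingest(node_name: str) -> list[tuple[str, str]]:
    """Simulate one node executing the query and emitting one FlowFile per row."""
    return [(node_name, row) for row in SOURCE_ROWS]


def run_cluster(nodes: list[str], primary_only: bool) -> list[tuple[str, str]]:
    """Collect the FlowFiles produced cluster-wide.

    primary_only=False mimics a processor scheduled on every node;
    primary_only=True mimics one restricted to a single (primary) node.
    """
    active = nodes[:1] if primary_only else nodes
    flowfiles = []
    for node in active:
        flowfiles.extend(run_ingest(node))
    return flowfiles


nodes = ["node-a", "node-b", "node-c"]
everywhere = run_cluster(nodes, primary_only=False)
pinned = run_cluster(nodes, primary_only=True)

# Count how many times each row's content shows up downstream.
print(Counter(row for _, row in everywhere))  # each row appears 3 times
print(Counter(row for _, row in pinned))      # each row appears once
```

The point of the model is only that nothing coordinates the nodes: three independent copies of the same ingest step yield three copies of every row.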

Thanks,

Matt


3 REPLIES

Super Collaborator

You can pin ExecuteSQL to one node by scheduling it to run on the primary node only. This ensures the query runs from only one node. The success relationship can then be sent to the downstream processors through a Remote Process Group (RPG), so that the downstream processing runs on multiple nodes.

The article below is a good guide:

https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html

Another alternative could be to use the QueryDatabaseTable processor, which may work better.
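One reason QueryDatabaseTable can suit this use case is that, when its Maximum-value Columns property is set, it keeps state about the largest value it has seen and only retrieves rows beyond it on later runs. The Python/sqlite3 sketch below imitates that idea with a hypothetical `events` table and `id` column; it illustrates the concept and is not the processor's actual implementation. It would still need to be scheduled on a single (primary) node in a cluster, otherwise each node would run the same query.

```python
import sqlite3

# Toy sketch of the "maximum-value column" idea behind QueryDatabaseTable:
# remember the largest value seen in a column and only fetch newer rows.
# Table and column names are hypothetical; this is not NiFi's implementation.


def fetch_new_rows(conn: sqlite3.Connection, state: dict) -> list[tuple]:
    """Return rows whose id is above the stored maximum, then advance it."""
    last_max = state.get("max_id", 0)
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id", (last_max,)
    ).fetchall()
    if rows:
        state["max_id"] = rows[-1][0]  # ordered by id, so the last row holds the max
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, "a"), (2, "b")])

state: dict = {}  # stands in for the processor's stored state
first = fetch_new_rows(conn, state)   # [(1, 'a'), (2, 'b')]
second = fetch_new_rows(conn, state)  # [] because nothing new arrived
conn.execute("INSERT INTO events VALUES (3, 'c')")
third = fetch_new_rows(conn, state)   # [(3, 'c')]
print(first, second, third)
```

Unlike re-running a full query on every schedule, the stored maximum keeps repeated runs from re-emitting rows that were already fetched.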

Master Mentor

The RPG can be used to redistribute the data ingested on a single node (using the primary node strategy mentioned here) across every node in your NiFi cluster. This is a great way to distribute the workload while ensuring each node is working on a unique set of FlowFiles.
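The redistribution idea can be sketched as a small Python model: FlowFiles ingested on one node are handed out so that every FlowFile lands on exactly one node. Round-robin assignment, the node names and the FlowFile IDs are assumptions for illustration; a real RPG sends data over NiFi's site-to-site protocol and makes its own balancing decisions.

```python
from itertools import cycle

# Toy model of redistributing FlowFiles ingested on one node across the
# cluster, so that each FlowFile is processed by exactly one node.
# Round-robin is an assumption for illustration only.


def distribute(flowfiles: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Assign each FlowFile to one node in turn."""
    assignment: dict[str, list[str]] = {node: [] for node in nodes}
    # zip stops when the FlowFiles run out; cycle repeats the node list.
    for flowfile, node in zip(flowfiles, cycle(nodes)):
        assignment[node].append(flowfile)
    return assignment


ingested_on_primary = [f"ff-{i}" for i in range(7)]
plan = distribute(ingested_on_primary, ["node-a", "node-b", "node-c"])
for node, batch in plan.items():
    print(node, batch)
```

Whatever balancing strategy is used, the property that matters is the one described above: the work is spread across nodes, but no FlowFile is handed to more than one of them.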