
How to merge NiFi cluster FlowFiles

Explorer

Hi,

I have set up a NiFi cluster with two process groups. In the first process group every processor is set to "All nodes" execution, and in the second process group the processors are set to "Primary node" execution. Any idea how to use MergeContent on FlowFiles that are spread across the cluster?

1 ACCEPTED SOLUTION

Super Mentor
@yong lau

-

The "Execution" processor configuration has nothing to do with FlowFiles at all. It simply controls whether the configured processor will be scheduled to run on every node or only the currently elected primary node. When a processor is scheduled to run, it will work against on those FlowFiles on incoming connection queues for that specific node. So if you have a processor configured for execution "Primary node" and there are FlowFiles queued on every node, only those FlowFiles on the primary node would get processed.

-

It is the role of the dataflow designer to construct a dataflow that routes all the data to one node if a single merged FlowFile is needed. Currently this can be accomplished using the PostHTTP and ListenHTTP processors, which support sending complete FlowFiles (content plus FlowFile attributes). The PostHTTP processor can be configured to send to one specific cluster node.

So ideally you would build into your flow a path that routes the FlowFiles you need merged to a PostHTTP processor configured to "Send as FlowFile", pointing at one specific node in your cluster. On that node, a ListenHTTP processor acts as the target of that PostHTTP processor and routes the received FlowFiles to your MergeContent processor, as sketched below.
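Roughly, the key properties for that pattern look like the following sketch; the host name, port, and merge thresholds are assumptions you would adapt to your own cluster:

# Illustrative property values only; host, port, and thresholds are placeholders.

# Runs on every node: PostHTTP pushes FlowFiles to one chosen node.
posthttp_properties = {
    "URL": "http://nifi-node1.example.com:9999/contentListener",
    "Send as FlowFile": "true",  # keeps FlowFile attributes with the content
}

# ListenHTTP can be scheduled on all nodes, but only the node that PostHTTP
# targets will actually receive the data.
listenhttp_properties = {
    "Listening Port": "9999",
    "Base Path": "contentListener",
}

# ListenHTTP "success" -> MergeContent, which now sees every FlowFile on one node.
mergecontent_properties = {
    "Merge Strategy": "Bin-Packing Algorithm",
    "Minimum Number of Entries": "100",  # tune to your batching needs
}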

-

There is work in progress to make this process a lot easier. The new capability, which is in development, will allow redistribution of FlowFiles via a connection's configuration. It will offer distribution strategies such as sending all FlowFiles that match some criteria (for example, a matching FlowFile attribute) to the same node.
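Assuming a NiFi release where that capability has landed as load-balanced connections, the connection settings could be applied via the REST API along these lines (connection ID, host, and partition attribute are placeholders):

# Hypothetical sketch, assuming a NiFi release with load-balanced connections.
import requests

NIFI = "http://nifi-node1.example.com:8080/nifi-api"
CONN_ID = "replace-with-connection-uuid"

conn = requests.get(f"{NIFI}/connections/{CONN_ID}").json()

update = {
    "revision": conn["revision"],
    "component": {
        "id": CONN_ID,
        # Route every FlowFile with the same "merge.group" value to the same node,
        # so a downstream MergeContent sees the whole group together.
        # ("SINGLE_NODE" would instead send every FlowFile to one node.)
        "loadBalanceStrategy": "PARTITION_BY_ATTRIBUTE",
        "loadBalancePartitionAttribute": "merge.group",
    },
}
requests.put(f"{NIFI}/connections/{CONN_ID}", json=update)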

-

Thank you,

Matt

-

If you found that this answer addressed your original question, please take a moment to log in and click "Accept" below the answer.


3 REPLIES


Explorer

Hi Matt,

Thanks for your answer.

I'm wondering whether there is an easier way to do it without using PostHTTP and ListenHTTP.

Thanks

Super Mentor

@yong lau

-

If you don't want to use a dataflow to redistribute your to-be-merged FlowFiles to a single node, the only other option you have is to control the delivery of the source data so that everything that is going to be merged arrives on a single node.
-
You'll need to ask yourself:

How are the files you want to merge getting to your NiFi, and can that delivery be controlled so this particular flow of data goes to only one node in your cluster?

-

Thanks,

Matt