Created 08-06-2018 08:07 AM
Hi,
I have set up a NiFi cluster with two process groups. In the first process group, all processors are set to "All nodes" execution; in the second, they are set to "Primary node" execution. Any idea how to MergeContent FlowFiles that are spread across the cluster?
Created 08-06-2018 01:29 PM
-
The "Execution" processor configuration has nothing to do with FlowFiles at all. It simply controls whether the configured processor will be scheduled to run on every node or only the currently elected primary node. When a processor is scheduled to run, it will work against on those FlowFiles on incoming connection queues for that specific node. So if you have a processor configured for execution "Primary node" and there are FlowFiles queued on every node, only those FlowFiles on the primary node would get processed.
-
It is the role of the dataflow designer to construct a dataflow that routes all data to one node if creating a single FlowFile via merge is needed. Currently this can be accomplished using the PostHTTP and ListenHTTP processors, which support sending complete FlowFiles (content plus FlowFile attributes). The PostHTTP processor can be configured to send to one specific cluster node.
So ideally you would build into your flow a route that sends the FlowFiles you need merged to a PostHTTP processor configured to "Send as FlowFile", with its URL pointing at one specific node in your cluster. On that node, a ListenHTTP processor acts as the target of that PostHTTP processor and routes the received FlowFiles on to your MergeContent processor. A sketch of this follows.
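As a rough sketch (the hostname node1.example.com and port 9999 below are placeholders; substitute your own values):

PostHTTP (scheduled on all nodes, receives the FlowFiles to be merged):
  URL              = http://node1.example.com:9999/contentListener
  Send as FlowFile = true

ListenHTTP (scheduled on all nodes, but only node1 actually receives data since the URL targets it):
  Listening Port = 9999
  Base Path      = contentListener

The ListenHTTP "success" relationship then feeds your MergeContent processor, which now sees all of the FlowFiles on a single node and can bin them together.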
-
There is work in progress to make this much easier. The new capability, which is in development, will allow redistribution of FlowFiles via a connection's configuration. There will be distribution strategies to choose from, such as sending all FlowFiles with matching criteria (for example, a matching FlowFile attribute) to the same node.
-
Thank you,
Matt
-
If you found this Answer addressed your original question, please take a moment to login and click "Accept" below the answer.
Created 08-07-2018 03:09 AM
Hi Matt,
Thanks for your answer.
I'm wondering if there is any easier way to do it without using PostHTTP and ListenHTTP.
Thanks
Created 08-07-2018 11:03 AM
-
If you don't want to use a dataflow to redistribute your to-be-merged FlowFiles to a single node, the only other option you have is to control the delivery of the source data itself, so that everything destined for this merge arrives on one node.
-
You'll need to ask yourself:
How are the files you want to merge getting to your NiFi? Can that delivery be controlled so this particular flow of data goes to only one node in your cluster? (For example, if an upstream system pushes the files, it could be pointed at just one node; or the ingest processor for this data could run with "Primary node" execution so it all enters the flow on the same node.)
-
Thanks,
Matt