How to merge data from 3 FlowFiles as a UNION statement

Hello Folks,

I want to combine 3 FlowFiles using a UNION statement. I’ve already transformed the 3 FlowFiles to have the same schema and would like to merge them into a single FlowFile whose content is structured like this:

FlowFile1 Content
UNION ALL
FlowFile2 Content
UNION ALL
FlowFile3 Content

I’d appreciate your help with this.

1 ACCEPTED SOLUTION

@HoangNguyen 

There isn't an existing processor included with Apache NiFi capable of performing a UNION ALL against the contents of multiple FlowFiles. JoinEnrichment is the only processor that can modify the contents of one FlowFile using the contents of another, but it handles only two FlowFiles (the original FlowFile and the enrichment FlowFile) in a single execution.

The other record-oriented processors all perform actions against the individual records in a FlowFile.

You may need to develop your own custom processor for such a task: something like the MergeRecord processor, which bins like FlowFiles, but one that then performs a UNION ALL on those binned FlowFiles.
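To make the "bin, then UNION ALL" idea concrete, here is a minimal sketch in plain Python. It is not NiFi code and does not use the NiFi processor API. The function name `union_all_bins`, the `(bin_key, content)` input shape, and the `min_entries` parameter are all assumptions made up for illustration. A real custom processor would need to implement the same logic in Java against the NiFi API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def union_all_bins(
    flowfiles: Iterable[Tuple[str, str]],
    min_entries: int = 3,
) -> Dict[str, str]:
    """Group (bin_key, content) pairs by key and join each full bin.

    bin_key stands in for whatever makes FlowFiles "alike" (for example a
    schema name); this is a hypothetical stand-in, not a NiFi attribute.
    Bins with fewer than min_entries members are left out of the result.
    """
    bins: Dict[str, List[str]] = defaultdict(list)
    for bin_key, content in flowfiles:
        # Strip surrounding whitespace so the separator lines stay clean.
        bins[bin_key].append(content.strip())

    return {
        bin_key: "\nUNION ALL\n".join(contents)
        for bin_key, contents in bins.items()
        if len(contents) >= min_entries
    }


if __name__ == "__main__":
    incoming = [
        ("orders", "FlowFile1 Content"),
        ("orders", "FlowFile2 Content"),
        ("orders", "FlowFile3 Content"),
    ]
    print(union_all_bins(incoming)["orders"])
```

With the three inputs above, the merged content has the layout asked for in the question: each FlowFile's content separated by a `UNION ALL` line.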

You could also raise a Jira in Apache NiFi (https://issues.apache.org/jira/browse/NIFI) asking for a processor that can perform such an operation; someone may attempt to build it if there is enough Apache community interest.

You could also explore the professional services Cloudera offers its customers, which could help with building custom processors for the Cloudera Flow Management offerings based on Apache NiFi.

Thank you,
Matt
