Created 05-10-2023 07:16 AM
Hey there,
For the sake of simplicity, I'll try to break my problem down to its core. I have flowfiles whose content is plain text (without any schema), representing data read from files on the filesystem. Via site-to-site I send them to a remote NiFi instance and store them there. During that ETL pipeline I need to query some additional metadata, sometimes via SQL queries, sometimes via logic derived from the data itself.
Most of this metadata is simply added as attributes, as desired. The ExecuteSQL processor, however, overwrites the flowfile content. This is why I need to duplicate/fork the flowfiles at some point: one flow keeps the original content, while the other queries the additional data, which gets written to the flowfile content.
How can I merge each pair of corresponding flowfiles back together? I know the ForkEnrichment and JoinEnrichment pattern, but it only merges content, and the content needs to follow some kind of schema, because the Record Reader and Writer are Controller Services like AvroReader/Writer etc. Also, I can't find any guide on how to write a ScriptedReader/Writer.
Another approach might be to fork the flowfiles again with ForkEnrichment. I would then parse the content of the enrichment flowfiles into attributes, empty the content, and merge the flowfiles with the MergeContent processor, setting Correlation Attribute Name to enrichment.group.id, which should be the same for each pair of flowfiles leaving ForkEnrichment via the enrichment and original relationships. This doesn't work as expected. Any hints?
It is hard to explain my problem and what I have already tried in simple words. What I actually want is to merge two flowfiles that were forked at some point: keep the content from the original relationship and add the attributes from the enrichment relationship. Is this really such an edge case that I can't find any approaches for it? I'm desperate.
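To make the intended outcome concrete, here is a minimal sketch in plain Python. It is not NiFi code and uses no NiFi API: the `FlowFile` class, the `merge_pairs` function, and the `sql.owner` attribute are hypothetical names made up for illustration. It only models the pairing logic I am after, assuming both flowfiles of a pair carry the same enrichment.group.id.

```python
from dataclasses import dataclass, field
from typing import Dict, List

GROUP_ATTR = "enrichment.group.id"  # correlation attribute mentioned above


@dataclass
class FlowFile:
    """Hypothetical stand-in for a NiFi flowfile: content plus attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


def merge_pairs(originals: List[FlowFile], enrichments: List[FlowFile]) -> List[FlowFile]:
    """Keep each original's content and add its enrichment partner's attributes."""
    # Index the enrichment flowfiles by their group id for direct lookup.
    by_group = {e.attributes[GROUP_ATTR]: e for e in enrichments}
    merged = []
    for orig in originals:
        partner = by_group.get(orig.attributes[GROUP_ATTR])
        if partner is None:
            continue  # no enrichment arrived for this group; skipped in this sketch
        attrs = dict(partner.attributes)
        attrs.update(orig.attributes)  # assumption: the original wins on conflicts
        merged.append(FlowFile(content=orig.content, attributes=attrs))
    return merged


# Example pair: the enrichment side has had its content emptied and
# its query result moved into an attribute (hypothetical name).
original = FlowFile(b"raw text from file", {GROUP_ATTR: "g1", "filename": "a.txt"})
enrichment = FlowFile(b"", {GROUP_ATTR: "g1", "sql.owner": "alice"})
result = merge_pairs([original], [enrichment])
```

Which side wins when both flowfiles define the same attribute is a design choice; in this sketch the original's attributes take precedence.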
Thankful for any help. Best regards
Chris
Created 05-10-2023 09:44 AM
Welcome to the community @wchris. While you are waiting for someone with more experience than me to reply, I thought I would drop a couple of links in hopes of getting you closer. I'll also tag in @SAMSAL in case he has any insights.
We hope that you will find a satisfactory solution to your question.
Created 06-14-2023 02:33 PM
Does ExecuteSQL erase some of the attributes that could be used to associate the FlowFiles further downstream?