Let's suppose I have 3 flowfiles, abc.xc, abc.cv and abc.mn, and I have to paste together the content of these incoming flowfiles. There are millions of flowfiles in the queue, and only those that share the same prefix should be pasted together.
To paste the content of these files, all 3 must be present; otherwise they should wait and remain in the queue.
Content of abc.xc
Content of abc.cv
Content of abc.mn
Earlier I achieved this with Spark Streaming by joining files on the filename prefix. But I don't want to maintain two different systems, so I wanted to do it in NiFi.
Any suggestion would be much appreciated.
NiFi is inherently designed to handle messages independently. This is also what allows it to perform so well at scale. As a result, operations where multiple messages need to be combined, such as this one, are not typical things you would want to do in NiFi.
In conclusion, there may be a trick to achieve this (though I cannot think of one right now) but it is fundamentally not a good fit with NiFi. Indeed, something like Spark is more suited for working with messages in a broader context.
If you really want to try things like this in NiFi, the MergeContent processor is usually the starting point. But even if you are able to merge the messages properly this way, you would still need to pivot them afterwards to form the desired columns.
First, give the flowfiles unique names that share a prefix, especially the 3 flowfiles in question. Then you can use RouteOnAttribute to route these specific flowfiles to a separate parallel path, and use MergeContent once all the files have arrived. This is the easier way, assuming all 3 files arrive one after another and no two copies of the same flowfile arrive at the same time.
To guarantee that, you need some more advanced logic: use the Wait and Notify processors along with ControlRate so that only one flowfile of each kind is sent on and merged together.
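To make the grouping rule concrete, here is a minimal Python sketch of the logic outside NiFi: hold flowfiles in a bin keyed by filename prefix and release a bin only once all three extensions are present. The class name `PrefixBinner`, the newline join, and the "first arrival wins" handling of duplicates are assumptions made for illustration, not NiFi behaviour. Inside NiFi, the closest equivalent I am aware of is to derive a prefix attribute from the filename and point MergeContent's Correlation Attribute Name at it, with the minimum and maximum number of entries set to 3.

```python
from collections import defaultdict

# The three extensions from the question, in the order they are pasted.
REQUIRED_SUFFIXES = ("xc", "cv", "mn")


class PrefixBinner:
    """Hypothetical helper: groups files by prefix until a set is complete."""

    def __init__(self, required=REQUIRED_SUFFIXES):
        self.required = tuple(required)
        self.bins = defaultdict(dict)  # prefix -> {suffix: content}

    def offer(self, filename, content):
        """Add one file; return the merged content if its set is now complete."""
        prefix, _, suffix = filename.rpartition(".")
        if not prefix or suffix not in self.required:
            return None  # not part of any expected set
        current = self.bins[prefix]
        # Assumed policy: keep the first arrival and ignore later duplicates.
        current.setdefault(suffix, content)
        if len(current) < len(self.required):
            return None  # incomplete set keeps waiting in its bin
        del self.bins[prefix]
        return "\n".join(current[s] for s in self.required)
```

A bin that never completes stays in `bins` indefinitely, which mirrors the requirement that incomplete sets remain in the queue; a real flow would probably also want a timeout or expiry for such sets.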
Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.