Merging Two different files based on attribute value or filename matching

Expert Contributor

Hi,

I have a flow with two branches: one for an XML file and one for the corresponding PDF file. Currently I merge the two with MergeContent and generate a ZIP file from them, with Minimum Number of Entries set to 2.

When the input load is heavy, the merge does not always pair the corresponding files. The XML and PDF for the same invoice should be merged together, but sometimes files from different invoices get merged because many inputs arrive at once. I have already added a 10-second delay to each branch before merging the two files. Is there a mechanism to merge files based on a common attribute before MergeContent?

PradNiFi1236_0-1689792732578.png

 

Thanks in advance

1 ACCEPTED SOLUTION

Master Mentor

@PradNiFi1236 

 

The "Correlation Attribute Name" property expects the name of an attribute on the FlowFile, from which MergeContent extracts a value and then looks for matching values. The NiFi Expression Language (NEL) statement ${filename:substringBeforeLast('.')} takes the value of the filename attribute and strips off everything from the last "." onward. When that statement is entered directly in the property, the resulting string is treated as the name of a different attribute on the FlowFile, from which MergeContent expects to extract the value used to identify like FlowFiles for a bin. So, prior to MergeContent, you should use an UpdateAttribute processor with the above NEL statement to assign the filename minus its extension to a separate correlation attribute. Then use that attribute's name in MergeContent instead.
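To make the two steps concrete, here is a rough Python sketch of the idea. It is not NiFi code, only a simulation: `substring_before_last` imitates the NEL function, the attribute name `merge.key` is a made-up example, and the second invoice's filenames are hypothetical. The last part reflects my understanding that FlowFiles lacking the named attribute all land in the same group, which would explain unrelated invoices being merged; treat that as an assumption.

```python
from collections import defaultdict


def substring_before_last(value: str, sep: str) -> str:
    """Imitate NEL substringBeforeLast: text before the last `sep`,
    or the whole value when `sep` does not occur."""
    head, found, _ = value.rpartition(sep)
    return head if found else value


def bin_by_attribute(flowfiles, attribute_name):
    """Group FlowFile filenames by the value of the named attribute,
    roughly what MergeContent does with Correlation Attribute Name."""
    bins = defaultdict(list)
    for attrs in flowfiles:
        bins[attrs.get(attribute_name)].append(attrs["filename"])
    return dict(bins)


# FlowFiles modelled as attribute dictionaries. The first pair comes from
# the thread; the second pair is a hypothetical second invoice.
flowfiles = [
    {"filename": "BE01_915131_FR01111046_20230713_183211.pdf"},
    {"filename": "BE01_915132_FR01111047_20230713_183305.pdf"},
    {"filename": "BE01_915131_FR01111046_20230713_183211.xml"},
    {"filename": "BE01_915132_FR01111047_20230713_183305.xml"},
]

# Step 1 (UpdateAttribute): store the filename without its extension in a
# separate attribute. "merge.key" is an arbitrary, hypothetical name.
for attrs in flowfiles:
    attrs["merge.key"] = substring_before_last(attrs["filename"], ".")

# Step 2 (MergeContent): correlate on the attribute *name* "merge.key".
bins = bin_by_attribute(flowfiles, "merge.key")
for key, members in bins.items():
    print(key, members)

# Assumed failure mode: if the property evaluates to a string that is not
# an existing attribute name, every lookup misses and all FlowFiles share
# one group.
wrong = bin_by_attribute(flowfiles, "BE01_915131_FR01111046_20230713_183211")
print(len(wrong))
```

Each bin ends up holding exactly one .pdf and one .xml with the same base name, which is the pairing the ZIP output needs.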

 

If you found this helped you, please take a moment to login and accept it as a solution.

Thank you,

Matt

 


8 REPLIES

Super Guru

Hi @PradNiFi1236 ,

I'm not sure I fully understand your question, but based on what I read, you are trying to merge based on a common attribute. If that is the case, MergeContent allows you to do that through the "Correlation Attribute Name" property:

 

"

If specified, like FlowFiles will be binned together, where 'like FlowFiles' means FlowFiles that have the same value for this Attribute. If not specified, FlowFiles are bundled by the order in which they are pulled from the queue.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)

This Property is only considered if the <Merge Strategy> Property has a value of "Bin-Packing Algorithm".

"

Hope that helps.

 

Expert Contributor

Yes @SAMSAL, thanks for the reply. I want to merge the XML file and the PDF file for each invoice number.

I tried merging with "Correlation Attribute Name" using the ZIP merge format, but it still merges files from different invoice numbers. Could you please suggest anything?

Note: my filenames are identical before the extensions (.xml and .pdf), for example:
1) BE01_915131_FR01111046_20230713_183211.pdf
2) BE01_915131_FR01111046_20230713_183211.xml

 

 

Super Guru

@PradNiFi1236,

Is the invoice number unique? If the filenames match between the two formats, have you tried correlating on the filename instead? You would first have to derive the filename without its extension into another attribute, and then use that new attribute as the Correlation Attribute. Also try resetting "Minimum Number of Entries" to 1. If none of that helps, please post a screenshot of the MergeContent processor configuration, along with the configurations of the other critical processors.

Thanks

Expert Contributor

Yes @SAMSAL, the invoice number is unique every time.
The input is actually a single JSON file, which is converted to XML and PDF. Afterwards we need to merge the two and send the result to the business. The merge must pair the XML and PDF that share the same invoice number.

And yes, I have also tried the filename in the Correlation Attribute Name by entering ${filename:substringBeforeLast('.')}, but no luck either.

PradNiFi1236_0-1689865876062.png

 

 

 

Expert Contributor

@MattWho wow, that worked just as I expected.

So it won't merge a file until its exact matching file arrives, right? And it won't fail either? If one of the files is missing or delayed, it seems to wait until that file comes in.

 

Thanks a lot 

Regards,

pradeep

Super Guru

@MattWho, @steven-matison, @cotopaul, can you guys help with this?

Expert Contributor

Thanks @SAMSAL  for recommending @MattWho