Created on 08-10-2016 07:57 PM - edited 08-18-2019 03:34 AM
Hi, I have 1000 files in a folder, and each file name contains a date (yyyymmdd), for example data_20160810.csv. There are 200 files per day, so 5 days of files. I want to merge those files by date, so that if it succeeds my output folder will contain 5 merged files, one for each day.
I am trying to do this with the "Correlation Attribute Name" property, but the files are still not being merged in groups.
What am I doing wrong?
Created 08-10-2016 09:32 PM
The attribute to correlate on needs to be present in the flowfile for the Merge processor to use it. If you are using FetchFile to get the file, you can add an attribute into that processor using the filename or the substring of the file name. Then it will be present in the flowfile for subsequent processors to use.
Created 08-11-2016 01:21 PM
This answer is correct, just wanted to add additional clarification...
The "Correlation Attribute Name" is not the actual value to correlate on; it's the name of an attribute that holds the value to correlate on. So as suggested, you could use an UpdateAttribute processor to create an attribute like:
correlation.id = ${filename:substring(5,13)}
Then in MergeContent put correlation.id as the value of Correlation Attribute Name.
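To make the grouping concrete, here is a minimal Python sketch of what the expression above extracts and how the files would then be grouped. It is an illustration only, not NiFi code: the helper names `correlation_id` and `group_by_date` and the `_001`-style file name suffixes are hypothetical, and it assumes every file name starts with the 5-character prefix `data_`.

```python
from collections import defaultdict


def correlation_id(filename: str) -> str:
    # Mirrors ${filename:substring(5,13)}: characters from index 5
    # up to, but not including, index 13 (the yyyymmdd part).
    return filename[5:13]


def group_by_date(filenames):
    # Collect file names under their extracted date, the way a
    # per-date merge would group them.
    groups = defaultdict(list)
    for name in filenames:
        groups[correlation_id(name)].append(name)
    return dict(groups)


# Hypothetical file names; only the "data_yyyymmdd" prefix comes from the question.
files = [
    "data_20160810_001.csv",
    "data_20160810_002.csv",
    "data_20160811_001.csv",
]

for date, names in group_by_date(files).items():
    print(date, names)
```

If the prefix before the date had a different length, the two indexes in the substring call would need to change accordingly.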
Created on 08-11-2016 04:14 PM - edited 08-18-2019 03:34 AM
Hi @Bryan Bende , @emaxwell
Something still seems to be wrong. I am doing the same thing, but it is still merging everything into one file. Here are some screenshots for your reference.
I was testing this with 4 files in the source folder, 2 per date, expecting MergeContent to output 2 files, but it is merging all of them into 1 file. Here is my UpdateAttribute processor after the fetch:
When I use data provenance I can see the correct values, as below: 2 different correlation.id values across the 4 files, as expected (2 files with 20121021 and 2 with 20121020).
Here is how the MergeContent processor looks:
Data provenance on the MergeContent JOIN event:
Created 01-07-2020 11:56 PM
Hello @saikrishna_tara @bbende @emaxwell .
Thanks for the solution and it worked well for me.
I am new to NiFi and have the same problem as @saikrishna_tara. I got as far as MergeContent, and I can see that my files end up in the parent flowfiles. However, the parent flowfiles are named with the flowfile UUID rather than the actual name of the file that was processed.
I need to send the actual names of all the parent flowfiles to the outside world via an email processor.
Please let me know in case more details are required.
Regards
Nitin
Created 08-11-2016 04:29 PM
It looks like just using correlation.id instead of ${correlation.id} in the MergeContent processor does the trick.
Created 08-11-2016 05:45 PM
Yes, that's what I was trying to say about it being the name of an attribute, and not the attribute itself.
When you put ${correlation.id} the framework evaluates that first, in your case it ends up being something like 20121021, and then MergeContent goes to look for an attribute called "20121021" which doesn't exist.
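The two-step lookup described above can be modeled with a small Python sketch. This is a toy model under stated assumptions, not NiFi's implementation: `evaluate` is a hypothetical stand-in that only handles plain `${name}` references, and `correlation_value` is a hypothetical helper name.

```python
import re


def evaluate(expression: str, attributes: dict) -> str:
    # Toy stand-in for expression evaluation: replaces each ${name}
    # with the value of that attribute (empty string if missing).
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda m: attributes.get(m.group(1), ""),
        expression,
    )


def correlation_value(property_value: str, attributes: dict):
    # Step 1: the property value is evaluated first.
    attribute_name = evaluate(property_value, attributes)
    # Step 2: the result is treated as the NAME of an attribute to look up.
    return attributes.get(attribute_name)


flowfile = {"filename": "data_20121021.csv", "correlation.id": "20121021"}

# Plain attribute name: the lookup finds the date value.
print(correlation_value("correlation.id", flowfile))

# Expression: it evaluates to "20121021", and no attribute has that name.
print(correlation_value("${correlation.id}", flowfile))
```

In this model the second call finds nothing for every flowfile, so no flowfile has a distinguishing value, which is consistent with everything ending up in a single merged file.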
Created 08-25-2016 03:51 PM
How do I send files with the same header (hardcoded?) to the MergeContent processor, and send files that do not match the header to failure? Can this be done?
Created 08-25-2016 06:20 PM
You could put a RouteOnAttribute processor right before MergeContent and add a property like foo = ${header:equals("foo")}. Everything with a header of "foo" will then be routed to a relationship called "foo", and everything else goes to the "unmatched" relationship, which you can auto-terminate to drop those flowfiles or connect to your failure handling.
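The routing decision can be sketched in Python as follows. This is only an illustration of the logic, not NiFi code: the function name `route_on_header` is hypothetical, and it assumes each flowfile already carries a `header` attribute and that non-matching flowfiles go to a relationship named "unmatched".

```python
def route_on_header(attributes: dict, expected: str = "foo") -> str:
    # Mirrors a property like foo = ${header:equals("foo")}:
    # a match goes to the relationship named after the property.
    if attributes.get("header") == expected:
        return "foo"
    # Anything else, including flowfiles with no header attribute.
    return "unmatched"


flowfiles = [
    {"filename": "a.csv", "header": "foo"},
    {"filename": "b.csv", "header": "bar"},
    {"filename": "c.csv"},
]

for ff in flowfiles:
    print(ff["filename"], "->", route_on_header(ff))
```

Only the flowfiles routed to "foo" would be connected to MergeContent; the rest can be handled or discarded separately.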
Created 11-30-2018 06:59 AM
Hi,
As part of a requirement, I need to merge multiple flows using the MergeContent processor, grouped by two attributes. As suggested above, I added an UpdateAttribute processor before MergeContent, created a new attribute, and used that as the Correlation Attribute Name in MergeContent. But I am getting multiple files as output, where I expect one file per group. What configuration is needed to handle this?
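For reference, the approach described in this post (combining two attributes into one correlation value) can be sketched in Python as below. The attribute names `source` and `date`, the `|` separator, and the helper names are all hypothetical. The sketch only models the grouping key; it does not model MergeContent's bin settings, which may also affect how many output files are produced.

```python
from collections import defaultdict


def composite_key(attributes: dict, names=("source", "date")) -> str:
    # Join the two attribute values into a single correlation value.
    # The separator keeps ("ab", "c") distinct from ("a", "bc").
    return "|".join(attributes.get(n, "") for n in names)


def group_flowfiles(flowfiles):
    # Collect file names under their combined key.
    groups = defaultdict(list)
    for ff in flowfiles:
        groups[composite_key(ff)].append(ff["filename"])
    return dict(groups)


flowfiles = [
    {"filename": "f1", "source": "A", "date": "20181130"},
    {"filename": "f2", "source": "A", "date": "20181130"},
    {"filename": "f3", "source": "B", "date": "20181130"},
]

for key, names in group_flowfiles(flowfiles).items():
    print(key, names)
```

With a key built this way, one group per distinct pair of values is the expected grouping.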