How to merge files together by file attribute in NiFi?
- Labels: Apache NiFi
Created on 08-10-2016 07:57 PM - edited 08-18-2019 03:34 AM
Hi, I have 1000 files in a folder, and the file names contain a date (yyyymmdd), for example data_20160810.csv. There are 200 files per day, so 5 days of files in total. I want to merge those files by date, so that if this succeeds my output folder will contain 5 merged files, one for each day.
I am trying to do this with the "Correlation Attribute Name" property, but it is still not merging them into groups.
What am I doing wrong?
Created 08-10-2016 09:32 PM
The attribute to correlate on needs to be present in the flowfile for the MergeContent processor to use it. If you are using FetchFile to get the file, you can add an attribute into that processor using the filename or a substring of the file name. Then it will be present in the flowfile for subsequent processors to use.
Created 08-11-2016 01:21 PM
This answer is correct; I just wanted to add some clarification.
The "Correlation Attribute Name" is not the actual value to correlate on; it's the name of an attribute that holds the value to correlate on. So, as suggested, you could use an UpdateAttribute processor to create an attribute like:
correlation.id = ${filename:substring(5,13)}
Then in MergeContent put correlation.id as the value of Correlation Attribute Name.
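To see why substring(5,13) yields the date, and how MergeContent then groups flowfiles by that value, here is a minimal Python simulation. It is a sketch under assumptions, not NiFi code: it relies on NiFi's substring using 0-based, end-exclusive indices (like Java's String.substring), the function names are illustrative only, and the file names with a numeric suffix are hypothetical (200 files per day cannot all be called data_YYYYMMDD.csv). It also ignores when MergeContent decides to release a bin.

```python
from collections import defaultdict


def nifi_substring(value: str, start: int, end: int) -> str:
    """Mimic NiFi's ${attr:substring(start, end)}: 0-based, end-exclusive
    indices (like Java's String.substring), which match Python slicing."""
    return value[start:end]


def group_by_correlation_id(filenames):
    """Simulate UpdateAttribute setting correlation.id = ${filename:substring(5,13)},
    then MergeContent binning flowfiles by the value of correlation.id."""
    bins = defaultdict(list)
    for name in filenames:
        correlation_id = nifi_substring(name, 5, 13)
        bins[correlation_id].append(name)
    return dict(bins)


# Hypothetical names: 5 days x 200 files, made unique with a numeric suffix.
days = ["20160806", "20160807", "20160808", "20160809", "20160810"]
filenames = [f"data_{day}_{i:03d}.csv" for day in days for i in range(200)]

bins = group_by_correlation_id(filenames)
for day, members in sorted(bins.items()):
    print(day, len(members))
```

Each of the 5 dates ends up in its own group of 200 files, which is the one-merged-file-per-day result the question asks for. Note that "data_" is 5 characters long, so index 5 is the first digit of the date and index 13 is the first character after it.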
Created on 08-11-2016 04:14 PM - edited 08-18-2019 03:34 AM
Hi @Bryan Bende, @emaxwell,
Something still doesn't seem right. I am doing the same thing, but it is still merging everything into one file. Here are some screenshots for reference.
I was testing this with 4 files in the source folder, 2 for each date, and expected MergeContent to output 2 files, but it merges all of them into 1 file. Here is my UpdateAttribute processor after FetchFile.
When I use data provenance, I can see the values correctly: 2 different correlation.id values across the 4 files, as expected (2 files with 20121021 and 2 with 20121020).
And here is how the MergeContent processor is configured.
Data provenance on the MergeContent JOIN event:
Created 01-07-2020 11:56 PM
Hello @saikrishna_tara @bbende @emaxwell,
Thanks for the solution; it worked well for me.
I am new to NiFi and have the same problem statement as @saikrishna_tara. I am able to get as far as MergeContent, and I can see my files listed as parent flow files, but the parent names are the UUIDs of the flow files, not the actual names of the files that were processed.
I need to send the actual names of all the parent flow files to the outside world via an email processor.
Please let me know if more details are required.
Regards
Nitin
Created 08-11-2016 04:29 PM
It looks like entering correlation.id instead of ${correlation.id} in MergeContent does the trick.
Created 08-11-2016 05:45 PM
Yes, that's what I was trying to say about it being the name of an attribute, not the attribute's value.
When you put ${correlation.id}, the framework evaluates that expression first. In your case it ends up being something like 20121021, and then MergeContent looks for an attribute called "20121021", which doesn't exist.
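The two-step behaviour (evaluate the property, then use the result as an attribute name) can be simulated in a few lines of Python. This is a hedged sketch, not NiFi's implementation: evaluate_el is a tiny hypothetical stand-in that only handles plain ${name} references (replacing a missing attribute with an empty string), and the idea that flowfiles lacking the looked-up attribute all share one bin is an assumption, though it matches the single merged file reported above. The four flowfiles are hypothetical and mirror the 4-file test.

```python
import re
from collections import defaultdict


def evaluate_el(property_value, attributes):
    """Tiny stand-in for NiFi Expression Language: replaces each plain
    ${name} reference with that attribute's value, or "" if it is absent."""
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: attributes.get(m.group(1), ""),
                  property_value)


def merge_bins(flowfiles, correlation_attribute_name):
    """Sketch of how MergeContent uses Correlation Attribute Name."""
    bins = defaultdict(list)
    for ff in flowfiles:
        # 1. The property value is evaluated against the flowfile first.
        attribute_name = evaluate_el(correlation_attribute_name, ff)
        # 2. The result is treated as an attribute NAME; a missing attribute
        #    gives None, so all such flowfiles share a single bin (assumed).
        group = ff.get(attribute_name)
        bins[group].append(ff["filename"])
    return dict(bins)


# Hypothetical flowfiles mirroring the 4-file test: 2 per date.
flowfiles = [
    {"filename": "file1.csv", "correlation.id": "20121020"},
    {"filename": "file2.csv", "correlation.id": "20121020"},
    {"filename": "file3.csv", "correlation.id": "20121021"},
    {"filename": "file4.csv", "correlation.id": "20121021"},
]

correct = merge_bins(flowfiles, "correlation.id")
wrong = merge_bins(flowfiles, "${correlation.id}")
print(len(correct), "bins:", sorted(correct))
print(len(wrong), "bin:", list(wrong))
```

With the plain name the files split into one bin per date; with ${correlation.id} every lookup is for an attribute named after a date, finds nothing, and all four files land in the same bin.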
Created 08-25-2016 03:51 PM
How do I send files with the same header (hardcoded?) to the MergeContent processor, and send files that do not match the header to failure? Can this be done?
Created 08-25-2016 06:20 PM
You could put a RouteOnAttribute processor right before MergeContent and add a property like foo = ${header:equals("foo")}. Everything with a header attribute of "foo" will then be routed to a relationship called "foo", and everything else will go to the "unmatched" relationship, which you can send to your failure handling or auto-terminate to drop.
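The routing decision can be sketched in Python as below. This is an illustration under assumptions, not RouteOnAttribute's code: it models the "Route to Property name" strategy, where a flowfile goes to each relationship whose expression is true and to "unmatched" when none is, and it assumes a header attribute has already been placed on the flowfile by an earlier step that the thread does not describe. The function name is hypothetical.

```python
def route_on_attribute(flowfile, routes):
    """Sketch of RouteOnAttribute with the 'Route to Property name' strategy:
    the flowfile goes to every relationship whose predicate holds, or to
    'unmatched' when none do."""
    matched = [name for name, predicate in routes.items() if predicate(flowfile)]
    return matched or ["unmatched"]


# Python equivalent of the dynamic property  foo = ${header:equals("foo")}
routes = {"foo": lambda ff: ff.get("header") == "foo"}

for ff in [{"header": "foo"}, {"header": "bar"}, {}]:
    print(ff, "->", route_on_attribute(ff, routes))
```

Only the first flowfile reaches "foo" (and so MergeContent); the mismatched header and the missing header both end up on "unmatched", which is where failure handling would be connected.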
Created 11-30-2018 06:59 AM
Hi,
As part of a requirement, I need to merge multiple flows using the MergeContent processor, grouping them by two attributes. As suggested above, I used UpdateAttribute before MergeContent to create a new attribute and then used it as the Correlation Attribute Name in MergeContent. But I am getting multiple files as output, when I expect one file per group. What configuration is needed to handle this?
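Combining two attributes into one correlation key, as described in this question, can be sketched in Python. The attribute names source.system and business.date are hypothetical, and composite_key stands in for an UpdateAttribute property along the lines of ${source.system}_${business.date}. The sketch only models which group each flowfile belongs to; it does not model when MergeContent decides a bin is complete, so it does not by itself explain the extra output files.

```python
from collections import defaultdict


def composite_key(flowfile, attribute_names, separator="_"):
    """Join several attribute values into one correlation key, like an
    UpdateAttribute property such as ${attrA}_${attrB}. Missing values
    become "" (a choice made for this sketch)."""
    return separator.join(flowfile.get(name, "") for name in attribute_names)


def group_by(flowfiles, attribute_names):
    """Group flowfile names by their composite correlation key."""
    bins = defaultdict(list)
    for ff in flowfiles:
        bins[composite_key(ff, attribute_names)].append(ff["filename"])
    return dict(bins)


# Hypothetical attributes: source.system and business.date.
flowfiles = [
    {"filename": "f1", "source.system": "crm", "business.date": "20181130"},
    {"filename": "f2", "source.system": "crm", "business.date": "20181130"},
    {"filename": "f3", "source.system": "erp", "business.date": "20181130"},
    {"filename": "f4", "source.system": "crm", "business.date": "20181129"},
]

groups = group_by(flowfiles, ["source.system", "business.date"])
for key, members in sorted(groups.items()):
    print(key, members)
```

The separator matters: without it, values "ab" + "c" and "a" + "bc" would produce the same key. It should be a character that cannot appear in either attribute value.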
