How to merge files together by file attribute in NiFi?

Super Collaborator

Hi, I have 1000 files in a folder, and the file names contain a date (yyyymmdd), for example data_20160810.csv. I have 200 files for each day (so 5 days of files). I want to merge those files by date, so that if it works, my output folder will contain 5 merged files, one for each day.

I am trying to do this with the "Correlation Attribute Name" property, but it is still not merging in groups.

What am I doing wrong?

6527-mergebydate.png

1 ACCEPTED SOLUTION

@Saikrishna Tarapareddy

The attribute to correlate on needs to be present in the flowfile for the Merge processor to use it. If you are using FetchFile to get the file, you can add an attribute into that processor using the filename or the substring of the file name. Then it will be present in the flowfile for subsequent processors to use.

Master Guru

This answer is correct, just wanted to add additional clarification...

The "Correlation Attribute Name" is not the actual value to correlate on; it's the name of an attribute that holds the value to correlate on. So, as suggested, you could use an UpdateAttribute processor to create an attribute like:

correlation.id = ${filename:substring(5,13)}

Then in MergeContent put correlation.id as the value of Correlation Attribute Name.
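To make the grouping concrete, here is a minimal Python sketch of what those two steps amount to. It is a model of the behavior, not NiFi code: `correlation_id` and `group_by_attribute` are hypothetical helper names, flowfiles are modelled as plain dicts, and the file names are made up to follow the `data_yyyymmdd` pattern from the question.

```python
from collections import defaultdict


def correlation_id(filename: str) -> str:
    # Same slice as ${filename:substring(5,13)}: start index 5, end index 13 (exclusive).
    return filename[5:13]


def group_by_attribute(flowfiles, attribute_name):
    # Each flowfile is modelled as a plain dict of attributes.
    bins = defaultdict(list)
    for attrs in flowfiles:
        bins[attrs.get(attribute_name)].append(attrs["filename"])
    return dict(bins)


# Hypothetical file names following the data_yyyymmdd pattern.
names = [
    "data_20160810_a.csv",
    "data_20160810_b.csv",
    "data_20160811_a.csv",
    "data_20160811_b.csv",
]

# Step 1 (UpdateAttribute): attach correlation.id to every flowfile.
flowfiles = [{"filename": n, "correlation.id": correlation_id(n)} for n in names]

# Step 2 (MergeContent): bin by the attribute *named* correlation.id.
bins = group_by_attribute(flowfiles, "correlation.id")
for key, members in sorted(bins.items()):
    print(key, members)
```

With these inputs the sketch produces one bin per date, which is the behavior the question is after.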

Super Collaborator

Hi @Bryan Bende, @emaxwell

Something does not seem to be right. I am doing the same thing, but it is still merging everything into one file. Here are some screenshots for your reference.

I was testing this with 4 files in the source folder, 2 for each date, expecting MergeContent to output 2 files, but it is merging all of them into 1 file. Here is my UpdateAttribute processor after the fetch.

6584-corid.png

When I use data provenance, I can see the correct values, as shown below: 2 different correlation.id values across the 4 files, as expected (2 files with 20121021 and 2 with 20121020).

6586-corat.png

And here is how the MergeContent processor looks:

6587-mergecorat.png

Data provenance on the MergeContent JOIN event:

6585-corat2.png

New Contributor

Hello @saikrishna_tara @bbende @emaxwell,

Thanks for the solution; it worked well for me.

I am new to NiFi and have the same problem as @saikrishna_tara. I got as far as MergeContent, and I can see my files in the parent (merged) flowfiles. However, the parent flowfiles are named with flowfile UUIDs rather than the actual names of the files that were processed.

I need to send the actual file names behind each parent flowfile to the outside world via an email processor.

Please let me know if more details are required.

Regards

Nitin

Super Collaborator

It looks like just using correlation.id instead of ${correlation.id} in the MergeContent processor does the trick.

Master Guru

Yes, that's what I was trying to say about it being the name of an attribute, and not the attribute itself.

When you put ${correlation.id}, the framework evaluates that first. In your case it ends up being something like 20121021, and then MergeContent goes looking for an attribute called "20121021", which doesn't exist.
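The difference can be sketched in a few lines of Python. This is a simplified model and not NiFi's implementation: `evaluate` is a hypothetical stand-in that resolves only the bare `${name}` form, and `group_id` assumes that a flowfile with no matching attribute simply gets no group value, which is consistent with everything landing in one merged file as seen earlier in this thread.

```python
def evaluate(expression: str, attrs: dict) -> str:
    # Tiny stand-in for expression evaluation: resolves only "${name}".
    if expression.startswith("${") and expression.endswith("}"):
        return attrs.get(expression[2:-1], "")
    return expression  # plain text is used as-is


def group_id(property_value: str, attrs: dict):
    # The property is evaluated first; the result is then used as an attribute NAME.
    attribute_name = evaluate(property_value, attrs)
    return attrs.get(attribute_name)


# Hypothetical flowfiles, one per date from the test above.
ff_a = {"filename": "data_20121021.csv", "correlation.id": "20121021"}
ff_b = {"filename": "data_20121020.csv", "correlation.id": "20121020"}

# Property set to the attribute name: each flowfile gets its own group.
good = [group_id("correlation.id", ff) for ff in (ff_a, ff_b)]

# Property set to ${correlation.id}: the lookup is for an attribute named
# "20121021" or "20121020", neither exists, so both get the same empty group.
bad = [group_id("${correlation.id}", ff) for ff in (ff_a, ff_b)]

print(good)
print(bad)
```

In this model the first configuration separates the two dates, while the second gives both flowfiles the same missing group value.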

Super Collaborator

@Bryan Bende,

How do I send only files with the same header (hardcoded?) to the MergeContent processor? I am planning to send files that do not match the header to failure. Can this be done?

Master Guru

You could put a RouteOnAttribute processor right before MergeContent and add a property like foo = ${header:equals("foo")}. Then everything with a header attribute of "foo" will be routed to a relationship called "foo", and everything else will go to the "unmatched" relationship, which you can auto-terminate to drop those files or connect to your failure handling.
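As a rough illustration of that routing rule, here is a small Python model. It is only a sketch of the default "Route to Property name" behavior, not NiFi code: `route_on_attribute` is a hypothetical helper, and it assumes an earlier step has already put the file's header into an attribute called `header`.

```python
def route_on_attribute(attrs: dict, rules: dict) -> list:
    # Return the name of every rule whose predicate matches the flowfile's
    # attributes; fall back to "unmatched" when none does.
    matched = [name for name, predicate in rules.items() if predicate(attrs)]
    return matched or ["unmatched"]


# One dynamic property named "foo", equivalent to ${header:equals("foo")}.
rules = {"foo": lambda attrs: attrs.get("header") == "foo"}

# Hypothetical flowfiles: one with the expected header, one without.
expected = {"filename": "a.csv", "header": "foo"}
other = {"filename": "b.csv", "header": "bar"}

print(route_on_attribute(expected, rules))
print(route_on_attribute(other, rules))
```

Only the "foo" relationship would be connected to MergeContent; what happens to "unmatched" is a flow design choice.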

New Contributor

Hi,

As part of a requirement, I need to merge multiple flows using the MergeContent processor, grouping on two attributes. As suggested above, I used UpdateAttribute before MergeContent to create a new attribute and then used it as the Correlation Attribute Name in MergeContent. But I am getting multiple files as output, where I expect one file per group. What configuration is needed to handle this?