Merge files based on file headers?

Super Collaborator

Hi, I need to merge files based on their CSV headers. Say I have 10 files in a folder and 5 of them have the same header, Name,Age,Gender. I want to merge those 5 together and send the rest to failures. How can I do that?

1 ACCEPTED SOLUTION

Master Mentor
@Saikrishna Tarapareddy

The MergeContent processor is not designed to look at the content of the NiFi FlowFiles it is merging. What you will want to do first is use a RouteOnContent processor to route only those FlowFiles whose content contains the headers you want to merge. The 'unmatched' FlowFiles can then be routed elsewhere or auto-terminated.

Thanks,
Matt
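
For readers who want to see the route-then-merge idea outside NiFi, here is a minimal Python sketch of the same logic. It is not NiFi code and not the processors' implementation: the function name `route_and_merge`, the in-memory `files` dictionary, and the choice to keep a single header line in the merged output are all assumptions made for illustration.

```python
EXPECTED_HEADER = "Name,Age,Gender"  # header from the question


def route_and_merge(files, expected_header=EXPECTED_HEADER):
    """Split files by first line, then merge the ones that match.

    files: dict mapping a (hypothetical) file name to its CSV text.
    Returns (merged_text, failures), where failures lists the names
    of files whose first line is not the expected header.
    """
    matched_rows = []
    failures = []
    for name in sorted(files):
        lines = files[name].splitlines()
        # "Route" step: only the first line decides where the file goes.
        if lines and lines[0].strip() == expected_header:
            # "Merge" step: keep the data rows, drop the repeated header.
            matched_rows.extend(line for line in lines[1:] if line.strip())
        else:
            failures.append(name)
    merged = "\n".join([expected_header] + matched_rows) + "\n"
    return merged, failures


files = {
    "a.csv": "Name,Age,Gender\nAnn,30,F\n",
    "b.csv": "Name,Age,Gender\nBob,41,M\n",
    "c.csv": "Id,City\n1,Paris\n",
}
merged, failures = route_and_merge(files)
print(merged)    # header once, then the rows from a.csv and b.csv
print(failures)  # ['c.csv']
```

In a NiFi flow the two steps stay separate: the routing decision is made by RouteOnContent, and only the FlowFiles on the matching relationship reach MergeContent.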

Super Collaborator

@mclark,

OK, but RouteOnContent checks for the string in the whole file, whereas I want to compare only the first line.

If I have my RouteOnContent configured as below, it would route files to "Header" even when it is the data, rather than the header line, that satisfies the regex.

[Screenshot of the RouteOnContent configuration: 7063-roc.png]

Master Mentor
@Saikrishna Tarapareddy

Your regex above says the CSV file content must start with Tagname,Timestamp,Value,Quality,QualityDetail,PercentGood

So it should not route to "Header" unless the CSV starts with that string. What is found later in the CSV file should not matter. I tried this and it works as expected. When I removed the '^', all files matched.
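
The effect of the '^' anchor can be reproduced with a small Python sketch. Python's `re` module is used here only as a stand-in for the regex engine behind RouteOnContent, and the sample file contents are made up for illustration. Without a multiline flag, '^' matches only at the very start of the input, so a header string that appears further down does not count.

```python
import re

HEADER = "Tagname,Timestamp,Value,Quality,QualityDetail,PercentGood"

# Anchored: the content must begin with the header string.
anchored = re.compile("^" + re.escape(HEADER))
# Unanchored: the header string may appear anywhere in the content.
unanchored = re.compile(re.escape(HEADER))

# Made-up sample contents (assumptions, not data from the thread).
header_first = HEADER + "\nT1,2016-08-01 00:00:00,1.0,Good,OK,100\n"
header_later = "Other,Columns\n" + HEADER + "\n"

print(anchored.search(header_first) is not None)    # True
print(anchored.search(header_later) is not None)    # False
print(unanchored.search(header_later) is not None)  # True
```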

Your processor is also loading 1 MB of the CSV content for evaluation, although the string you are searching for is far fewer bytes. If you only want to match against the first line, reduce the buffer size from '1 MB' to something like '60 b'. When I changed the buffer to '60 b' and removed the '^' from the regex above, only the files with the matching header were routed to "Header".

Thanks,
Matt
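
The buffer-size effect can be sketched the same way. This is an illustration in plain Python, not NiFi's implementation: `routes_to_header` is a hypothetical helper that evaluates the regex against only the first `buffer_size` bytes, and the sample contents are made up. The header string is 57 bytes long, so a 60-byte window has room for it only if it starts within the first few bytes.

```python
import re

HEADER = b"Tagname,Timestamp,Value,Quality,QualityDetail,PercentGood"
UNANCHORED = re.compile(re.escape(HEADER))  # note: no '^' anchor


def routes_to_header(content: bytes, buffer_size: int) -> bool:
    """Hypothetical helper: search only the first buffer_size bytes."""
    window = content[:buffer_size]
    return UNANCHORED.search(window) is not None


# Made-up sample contents (assumptions, not data from the thread).
header_first = HEADER + b"\nT1,2016-08-01 00:00:00,1.0,Good,OK,100\n"
header_later = b"Other,Columns\n" + HEADER + b"\n"

print(routes_to_header(header_first, 60))           # True
print(routes_to_header(header_later, 1024 * 1024))  # True: 1 MB window sees it
print(routes_to_header(header_later, 60))           # False: header is cut off
```

A 60-byte window still leaves 3 bytes of slack before a 57-byte header, so keeping the '^' anchor together with the small buffer is the stricter combination.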