- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Merge files Based on file headers.?
- Labels:
-
Apache NiFi
Created ‎08-29-2016 06:36 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi, I need to merge contents based on .CSV file headers. Lets say if i have 10 files in a folder and 5 of them with same header Name,Age,Gender.I want to merge all those 5 together and send rest to failures. How can i do that.?
Created ‎08-29-2016 06:47 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The mergeContent processor is not designed to look at the content of the NiFi FlowFiles it is merging. What you will want to do first is use a RouteOnContent processor to route only those Flowfiles where Content contains the headers you want to merge. The 'unmatched' FlowFiles could then be routed elsewhere or auto-terminated. Thanks,
Matt
Created ‎08-29-2016 06:47 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The mergeContent processor is not designed to look at the content of the NiFi FlowFiles it is merging. What you will want to do first is use a RouteOnContent processor to route only those Flowfiles where Content contains the headers you want to merge. The 'unmatched' FlowFiles could then be routed elsewhere or auto-terminated. Thanks,
Matt
Created on ‎08-29-2016 08:22 PM - edited ‎08-19-2019 03:11 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@mclark,
Ok , but RouteOnContent checks for the string in the whole file. where as i want to compare only the firstline .
if i have my RouteOnContent like below..it would route files to "Header" even if the data satisfies the RegEx.
Created ‎08-29-2016 09:01 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Your Regex above says the CSV file content must start with Tagname,Timestamp,Value,Quality,QualityDetail,PercentGood
So, it should not route to "Header" unless the CSV starts with that. What is found later in the CSV file should not matter. I tried this and it seems to work as expected. If i removed the '^', then all files matched.
Your processor is also loading 1 MB worth of the CSV content for evaluation; however, the string you are searching for is far fewer bytes. If you only want to match against the first line, reduce the size of the buffer from '1 MB' to maybe '60 b'. If I changed the buffer to '60 b' and removed the '^' from the regex above, only the files with the matching header were routed to "header". Thanks,
Matt
