Created 08-01-2016 09:20 PM
I'm building a NiFi flow with the NiFi GUI. As part of the flow I'm ingesting a series of flat files that contain lines I don't want in my data flow. These lines all start with the hash/pound symbol #. Any ideas how to filter these lines out? I was thinking of a RouteOnContent processor, but I'm not sure how to make it filter out lines.
Created 08-01-2016 09:34 PM
Hi @Ed Prout,
If you don't need one FlowFile per line of your input file, I'd suggest using the RouteText processor with the 'Starts With' matching strategy and adding a custom property, for example 'prefix', with the value '#'. This creates a 'prefix' relationship that receives all lines starting with #. The lines that do not start with # go to the 'unmatched' relationship, so that is the one to route onward if you want to keep them.
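To make the routing concrete, here is a rough Python sketch of what that configuration does to a file's content. It is not NiFi code, only an illustration: the function name `route_text_starts_with` and the sample data are made up, and the relationship names 'prefix' and 'unmatched' simply follow the property name suggested above.

```python
def route_text_starts_with(content: str, prefix: str = "#") -> dict[str, str]:
    """Illustrative stand-in for line-by-line 'starts with' routing.

    Lines beginning with `prefix` are grouped under 'prefix'; all other
    lines are grouped under 'unmatched'. Line endings are preserved.
    """
    routed: dict[str, list[str]] = {"prefix": [], "unmatched": []}
    for line in content.splitlines(keepends=True):
        key = "prefix" if line.startswith(prefix) else "unmatched"
        routed[key].append(line)
    # Join each group back into one block of text, one per relationship.
    return {name: "".join(lines) for name, lines in routed.items()}


# Hypothetical flat-file content with two comment lines.
sample = "# header comment\nid,name\n# another comment\n1,alice\n"
result = route_text_starts_with(sample)
print(result["unmatched"])
```

In this sketch the content under 'unmatched' is what you would pass downstream, and the content under 'prefix' is what you would drop.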
Hope this helps.
Created 08-01-2016 09:25 PM
@Ed Prout Take a look at the ExtractText processor here
An example of how to use the ExtractText processor is here
Additionally, here is another example of extracting text using NiFi
Created 08-01-2016 09:26 PM
Split that text file by line using the SplitText processor first: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitText/inde...
Then you can exclude the lines you don't want using a basic regex.
Good example here: https://github.com/xmlking/nifi-examples/tree/master/split-route
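As a quick way to check the regex before wiring it into the flow, the split-then-filter idea can be tried outside NiFi. This is only a sketch in plain Python: the pattern `^#` and the helper name `split_and_filter` are assumptions for illustration, not taken from the linked example.

```python
import re

# Assumed pattern: a line is unwanted if it begins with '#'.
COMMENT_LINE = re.compile(r"^#")


def split_and_filter(content: str) -> list[str]:
    """Split text into lines, then drop every line matching COMMENT_LINE."""
    lines = content.splitlines()  # one entry per line, like one FlowFile per line
    return [line for line in lines if not COMMENT_LINE.match(line)]


kept = split_and_filter("# comment\nid,name\n#skip me\n1,alice")
print(kept)
```

Note that a line with leading whitespace before the # would be kept by this pattern; something like `^\s*#` would be needed if those should be excluded too.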
Created 08-02-2016 08:54 PM
I went with your answer first Pierre, and it worked. Thanks!