Created 03-30-2017 11:00 PM
Hello,
I currently have a flow in NiFi that receives flowfiles and routes them based on topic. However, every flowfile received in the flow is a batch that contains multiple messages, and the number of lines in each message can vary, so I cannot split by number of lines. Is there a way in NiFi to split based on a specific text sequence? The main point is that I want to know how many messages come inside each batch. If there were a way to count how many times a specific word occurs in the content of the flowfile, or to split the flowfile based on text content, it would be really helpful, because from the number of splits I would know how many messages are in each batch. I am using NiFi 1.1.0. Any suggestions would truly be appreciated!
Created 03-31-2017 06:53 AM
The SplitContent processor may be what you are looking for: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitContent/i.... It lets you define a byte sequence to split by.
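To make the idea of splitting on a byte sequence concrete, here is a minimal Python sketch of the logic only. It is not NiFi code and not how SplitContent is implemented; the marker `MSG-START\n` and the helper name `split_on_sequence` are made up for illustration, so substitute whatever sequence actually separates your messages.

```python
from typing import List


def split_on_sequence(content: bytes, marker: bytes) -> List[bytes]:
    """Split content at every occurrence of marker and drop empty pieces."""
    return [part for part in content.split(marker) if part]


# Hypothetical batch: two messages with a different number of lines each.
batch = b"MSG-START\nline a\nline b\nMSG-START\nline c\n"
messages = split_on_sequence(batch, b"MSG-START\n")

# The number of splits is the number of messages in the batch.
print(len(messages))
```

Because the split is driven by the marker rather than by line count, messages of different lengths are handled the same way.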
Created 04-03-2017 06:27 PM
As @Hellmar Becker noted, SplitContent allows you to split on arbitrary byte sequences, but if you are looking for a specific word, SplitText will also achieve what you want. You may also want to look at RouteText, which allows you to apply a literal or regular expression to every line in the flowfile content and route each individually based on their matching results.
Finally, if you only care about the occurrence count of a specific word or sequence in the flowfile, you could use a small script in ExecuteScript or even ExecuteStreamCommand and use a terminal command like
$ tr ' ' '\n' < FILE | grep WORD | wc -l
(from here).
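For anyone who would rather do the count in a script than in a shell pipeline, the following is a rough Python equivalent of the command above. It is only a sketch: the function name `count_tokens_containing` is hypothetical, and the NiFi side (reading the flowfile content and writing the result to an attribute) is deliberately left out. Like the pipeline, it counts space- or newline-separated tokens that contain WORD as a substring, so ERRORS would also count as a match for ERROR.

```python
import re


def count_tokens_containing(text: str, word: str) -> int:
    """Count tokens (separated by spaces or newlines) that contain word."""
    # Same idea as: tr ' ' '\n' | grep WORD | wc -l
    tokens = re.split(r"[ \n]", text)
    return sum(1 for token in tokens if word in token)


# Hypothetical flowfile content used only for this example.
sample = "ERROR one\nok ERROR: two ERRORS"
print(count_tokens_containing(sample, "ERROR"))
```

If you need whole-word matches only, compare with `token == word` instead of `word in token`.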