Support Questions

Adda_Fuentes2 · 03-30-2017

Hello,

I currently have a flow in NiFi that receives flowfiles and routes them based on topic. However, every flowfile received in the flow is a batch that contains multiple messages, and the number of lines in each message varies, so I cannot split by line count. Is there a way in NiFi to split based on a specific text sequence? The main point of doing this is that I want to know how many messages come in each batch. A way to count how many times a specific word occurs in the flowfile content, or to split the flowfile based on its text content, would be really helpful, because the number of splits would tell me how many messages are in each batch. Is there a way to do something like this in NiFi? I am using NiFi 1.1.0. Any suggestions would be truly appreciated!

Hellmar Becker · 03-31-2017

The SplitContent processor may be what you are looking for: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitContent/i.... It lets you define a byte sequence to split by.
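To see what splitting on a byte sequence does to a batch, here is a minimal plain-Python sketch of the idea; it is not NiFi's implementation. It assumes, hypothetically, that each message ends with the marker `--END--\n`, and it drops empty fragments. SplitContent's own handling of leading or trailing delimiters (and its option to keep the byte sequence) may differ, so check the processor documentation for your version.

```python
from typing import List


def split_on_sequence(content: bytes, delimiter: bytes) -> List[bytes]:
    """Split content on every occurrence of delimiter, dropping empty pieces."""
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    return [piece for piece in content.split(delimiter) if piece]


# Hypothetical batch of three messages of different lengths,
# each terminated by the assumed marker "--END--\n".
batch = (
    b"id=1\nline a\n--END--\n"
    b"id=2\nline a\nline b\nline c\n--END--\n"
    b"id=3\n--END--\n"
)

messages = split_on_sequence(batch, b"--END--\n")
print(len(messages))  # 3
```

The number of fragments equals the number of messages only if the marker appears exactly once per message and never inside a message body.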

alopresto · 04-03-2017

As @Hellmar Becker noted, SplitContent allows you to split on arbitrary byte sequences, but if you are looking for a specific word, SplitText will also achieve what you want. You may also want to look at RouteText, which applies a literal or regular expression to every line in the flowfile content and routes each line individually based on whether it matches.
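RouteText's per-line matching can be pictured with the following plain-Python sketch, which is not NiFi code. It assumes, hypothetically, that every message in a batch begins with a header line starting with `MSG|`; counting the lines routed to "matched" then gives the number of messages. `re.search` with a `^` anchor stands in for a regular-expression matching strategy.

```python
import re
from typing import List, Tuple


def route_lines(text: str, pattern: str) -> Tuple[List[str], List[str]]:
    """Split text into lines and route each to 'matched' or 'unmatched' by regex."""
    regex = re.compile(pattern)
    matched: List[str] = []
    unmatched: List[str] = []
    for line in text.splitlines():
        # Each line is tested on its own, independently of its neighbours.
        (matched if regex.search(line) else unmatched).append(line)
    return matched, unmatched


# Hypothetical batch: three messages, each starting with an "MSG|" header line.
batch = "MSG|1\npayload\nMSG|2\npayload\nmore payload\nMSG|3\n"
matched, unmatched = route_lines(batch, r"^MSG\|")
print(len(matched))    # 3 message headers
print(len(unmatched))  # 3 other lines
```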

Finally, if you only care about how many times a specific word or sequence occurs in the flowfile, you could use a small script in ExecuteScript, or even ExecuteStreamCommand with a terminal command such as $ tr ' ' '\n' < FILE | grep WORD | wc -l (from here).
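The logic of that pipeline, as it might appear inside a script, is sketched below in plain Python. It assumes WORD is a literal string without regex metacharacters (grep would otherwise interpret it as a pattern), and it leaves out the NiFi session calls an ExecuteScript body would need to read the flowfile content and store the result.

```python
def count_like_pipeline(text: str, word: str) -> int:
    """Mimic `tr ' ' '\\n' < FILE | grep WORD | wc -l` for a literal WORD.

    tr turns spaces into newlines, grep keeps the lines that contain WORD
    as a substring, and wc -l counts those lines.
    """
    lines = text.replace(" ", "\n").split("\n")
    return sum(1 for line in lines if word in line)


# Hypothetical content: "STARTED" also contains "START", just as grep would match it.
content = "START a b\nSTART c\nno match here\nSTARTED\n"
print(count_like_pipeline(content, "START"))  # 3
```

Because the match is by substring, a marker word that also appears inside longer words will be over-counted; a distinctive marker avoids this.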

Cloudera Community


Need help on splitting a text on NiFi based on a specific content sequence or word