Need help on splitting a text on NiFi based on a specific content sequence or word

Hello,

I currently have a flow in NiFi that receives flowfiles and routes them based on topic. However, every flowfile received in the flow is a batch that contains multiple messages, and the number of lines in each message can vary, so I cannot split by number of lines. Is there a way in NiFi to split based on a specific text sequence? The main point is that I want to know how many messages come inside each batch. If there were a way to count how many times a specific word occurs in the content of the flowfile, or to split the flowfile based on its text content, that would be really helpful, because the number of splits would tell me how many messages are in each batch. Is there a way to do something like this in NiFi? I am using NiFi 1.1.0. Any suggestions would truly be appreciated!

1 ACCEPTED SOLUTION

The SplitContent processor may be what you are looking for: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitContent/i.... It lets you define a byte sequence to split by.
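
To make the idea concrete outside NiFi, here is a rough bash sketch of splitting text on a multi-character sequence and counting the pieces. The function name `count_fragments` and the delimiter `---MSG---` are hypothetical, chosen only for illustration. Skipping empty pieces is a choice made in this sketch, not a statement about how SplitContent behaves.

```shell
#!/usr/bin/env bash
# count_fragments DELIM
# Reads text on stdin, splits it on the literal string DELIM, and prints
# the number of non-empty pieces. Illustration only, not NiFi code.
count_fragments() {
  local delim=$1 content piece count=0
  content=$(cat)   # note: $(...) strips trailing newlines
  while [[ -n $content ]]; do
    if [[ $content == *"$delim"* ]]; then
      piece=${content%%"$delim"*}    # text before the first delimiter
      content=${content#*"$delim"}   # text after the first delimiter
    else
      piece=$content                 # last piece, no delimiter left
      content=
    fi
    if [[ -n $piece ]]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}
```

For example, `printf 'a---MSG---b---MSG---c' | count_fragments '---MSG---'` reports three pieces, which is the kind of per-batch message count the question asks for.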

2 REPLIES

As @Hellmar Becker noted, SplitContent allows you to split on arbitrary byte sequences, but if you are looking for a specific word, SplitText will also achieve what you want. You may also want to look at RouteText, which applies a literal or regular expression to every line in the flowfile content and routes each line individually based on whether it matches.

Finally, if you only care about the occurrence count of a specific word or sequence in the flowfile, you could use a small script in ExecuteScript, or even ExecuteStreamCommand with a terminal command like $ tr ' ' '\n' < FILE | grep WORD | wc -l, which counts the space- or newline-separated tokens that contain WORD.
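
A variation on that command, as a minimal sketch: wrapping the count in a small function that reads from stdin, which is how a command run by ExecuteStreamCommand would typically receive the flowfile content (assuming the processor is configured to pipe the content to the command). The function name `count_occurrences` is hypothetical. This version uses `grep -o`, which is available in GNU and BSD grep but is not required by POSIX, and it counts every occurrence of the literal string, including several within one token, so the result can differ from the `tr | grep | wc` pipeline above.

```shell
#!/usr/bin/env bash
# count_occurrences WORD
# Reads text on stdin and prints how many times the literal string WORD
# appears in it.
count_occurrences() {
  local word=$1
  # -o prints each match on its own line, -F treats WORD as a fixed string,
  # wc -l counts those lines, and tr removes the padding some wc builds add.
  grep -o -F -- "$word" | wc -l | tr -d ' '
}
```

For example, `printf 'MSG a\nMSG b MSG c\n' | count_occurrences MSG` prints the number of times `MSG` appears in the input.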