Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Need help on splitting a text on NiFi based on a specific content sequence or word

Hello,

I currently have a flow in NiFi that receives flowfiles and routes them based on topic. However, every flowfile received in the flow is a batch that contains multiple messages, and the number of lines in each message can vary, so I cannot split by number of lines. Is there a way in NiFi to split based on a specific text sequence? The main point is that I want to know how many messages come inside each batch. A way to count how many times a specific word occurs in the content of the flowfile, or to split the flowfile based on text content, would be really helpful, because the number of splits would tell me how many messages are in each batch. I am using NiFi 1.1.0. Any suggestions would truly be appreciated!

1 ACCEPTED SOLUTION

The SplitContent processor may be what you are looking for: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitContent/i.... It lets you define a byte sequence to split by.
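
To illustrate the idea, here is a minimal Python sketch of what splitting on a byte sequence amounts to. It is not NiFi code and does not reproduce SplitContent's implementation; the marker `MSG_START`, the sample batch, and the helper name `split_on_sequence` are all hypothetical.

```python
def split_on_sequence(content: bytes, sequence: bytes) -> list[bytes]:
    """Split content on every occurrence of sequence, dropping empty fragments."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return [part for part in content.split(sequence) if part]


# Hypothetical batch: three messages, each introduced by the marker MSG_START.
batch = b"MSG_START\nline a\nMSG_START\nline b\nline c\nMSG_START\nline d\n"
fragments = split_on_sequence(batch, b"MSG_START")

# The number of fragments is the number of messages in the batch.
print(len(fragments))
```

In a NiFi flow, the number of flowfiles produced from one incoming flowfile would play the same role as `len(fragments)` here.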

As @Hellmar Becker noted, SplitContent allows you to split on arbitrary byte sequences. If you are looking for a specific word, SplitText may also be an option, though note that it splits on line boundaries by line count rather than on a matched word. You may also want to look at RouteText, which lets you apply a literal or regular expression to every line in the flowfile content and route each line individually based on whether it matches.
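
As a rough picture of per-line routing, the Python sketch below groups each line under the first pattern it matches. It is a simplification for illustration only and does not mirror RouteText's actual routing strategies; the route names, patterns, and sample text are assumptions.

```python
import re
from collections import defaultdict


def route_lines(text: str, routes: dict[str, str]) -> dict[str, list[str]]:
    """Group lines under the name of the first pattern that matches each one."""
    routed = defaultdict(list)
    for line in text.splitlines():
        for name, pattern in routes.items():
            if re.search(pattern, line):
                routed[name].append(line)
                break
        else:
            # No pattern matched this line.
            routed["unmatched"].append(line)
    return dict(routed)


# Hypothetical content and routes.
sample = "topic=orders id=1\ntopic=billing id=2\nnoise\ntopic=orders id=3"
routes = {"orders": r"topic=orders", "billing": r"topic=billing"}
result = route_lines(sample, routes)

# Print how many lines landed in each route.
print({name: len(lines) for name, lines in result.items()})
```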

Finally, if you only care about the occurrence count of a specific word or sequence in the flowfile, you could use a small script in ExecuteScript, or even ExecuteStreamCommand with a terminal command such as tr ' ' '\n' < FILE | grep WORD | wc -l, which counts the space-separated tokens that contain WORD.
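
For the script route, a minimal Python sketch of the counting logic might look like the following. It is an assumption about how such a script could be written, not something from the thread: the function names and sample text are hypothetical, and unlike grep it treats the word as a literal string rather than a regular expression.

```python
import re


def count_tokens_containing(text: str, word: str) -> int:
    """Count space- or newline-separated tokens that contain word.

    Roughly mirrors: tr ' ' '\\n' < FILE | grep WORD | wc -l
    """
    tokens = re.split(r"[ \n]", text)
    return sum(1 for token in tokens if word in token)


def count_occurrences(text: str, sequence: str) -> int:
    """Count non-overlapping occurrences of sequence anywhere in text."""
    return text.count(sequence)


# Hypothetical flowfile content.
content = "ERROR disk full\nINFO ok ERROR again\nERRORERROR"

token_hits = count_tokens_containing(content, "ERROR")
total_hits = count_occurrences(content, "ERROR")
print(token_hits, total_hits)
```

The two counts differ when a token contains the word more than once, so pick whichever matches how your messages are delimited. A script run through ExecuteStreamCommand would typically read the flowfile content from standard input instead of a hard-coded string.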