Hello, does anyone know if there are any known issues with the NiFi ExtractText processor?
When I set the back pressure threshold on the connection feeding it to 100, that seems to work, but when I set it to 0 the queue just fills up nonstop. I have screenshots of the flow and the configs below.
Also, I know the regex works... data gets through, just very, very slowly.
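A threshold of 0 on a connection disables back pressure, so nothing ever tells the upstream processor to stop. If ExtractText drains the queue more slowly than the source fills it, the queue grows without bound. The toy model below (plain Python, not NiFi code; the per-tick rates are made-up illustrative numbers) shows the difference between a threshold of 100 and a threshold of 0.

```python
def upstream_may_run(queued: int, threshold: int) -> bool:
    """Toy back pressure check: a threshold of 0 disables back pressure entirely."""
    if threshold == 0:
        return True
    return queued < threshold


def simulate(threshold: int, ticks: int,
             produced_per_tick: int = 10, consumed_per_tick: int = 3) -> int:
    """Return the queue depth after `ticks` when the producer outpaces the consumer.

    The rates are hypothetical; only the fact that production > consumption matters.
    """
    queued = 0
    for _ in range(ticks):
        if upstream_may_run(queued, threshold):
            queued += produced_per_tick  # upstream processor is still scheduled
        queued -= min(queued, consumed_per_tick)  # downstream (ExtractText) drains a little
    return queued


if __name__ == "__main__":
    for threshold in (100, 0):
        print(f"threshold={threshold}: queue depth after 1000 ticks = {simulate(threshold, 1000)}")
```

With a threshold the queue hovers just around the limit (it can overshoot slightly, since a producer that is already running finishes its work); with 0 it keeps growing. Back pressure only hides the real problem, though: the consumer is too slow.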
Can you explain a bit more about your use case, data flow, and configuration options? I see you're using features like DOTALL Mode, Multiline Mode, etc., but the only attribute you're parsing in ExtractText is syslog_datetime. Perhaps you could use ListenSyslog, which writes syslog.timestamp as an attribute for you? I apologize in advance if I don't understand your use case; I'm happy to help if you can share more details.
Currently, our use case is simple: tail a file that is continuously being written to on Server 1 and move it to Server 2, then extract the timestamp from the text itself. We can't use the time the data arrives on Server 2 because it is inaccurate, and we need the real timestamps to build filenames that divide the data into folders so it is quicker to search in HUNK. The timestamp is broken into variables, which are then used to merge the data into a file, name it so it goes into the correct folder, and deliver it into HDFS.
I have removed DOTALL Mode, Multiline Mode, etc., as seen in the screenshot below, and the bottleneck still happens. I have also simplified the regex to check whether the bottleneck is in the regex parsing, but the bottleneck still occurs.
Based on your screenshot above, the overall CPU usage of your ExtractText processor is very low, so this looks more like a thread-starvation issue.
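The extract-and-split step described above can be sketched in plain Python. This is only an illustration of the logic, not what ExtractText does internally. The record layout (an ISO-8601-style timestamp at the very start of each line) and the `folder_for` helper are hypothetical, since the actual data format isn't shown. Anchoring the pattern at the start of the record keeps each match attempt short, because the engine does not have to retry at every position in the content.

```python
import re
from typing import Optional

# Hypothetical record layout: "YYYY-MM-DD HH:MM:SS ..." (or with a 'T') at the start of each line.
TIMESTAMP_RE = re.compile(
    r"^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})[T ](?P<hour>\d{2}):\d{2}:\d{2}"
)


def folder_for(record: str) -> Optional[str]:
    """Build a year/month/day/hour folder path from the record's own timestamp.

    Returns None when the timestamp is missing (comparable to an 'unmatched' route).
    """
    m = TIMESTAMP_RE.match(record)
    if m is None:
        return None
    return "/".join(m.group(name) for name in ("year", "month", "day", "hour"))


if __name__ == "__main__":
    print(folder_for("2017-03-14 09:26:53 host app: started"))
```

The resulting path pieces correspond to the "variables" used when merging and naming the output file for HDFS.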
What is your Scheduling Strategy configured to use on your ExtractText processor?
How many Concurrent Tasks do you have assigned to your ExtractText processor?
Try increasing your Concurrent Tasks to 3-5. Also change your Run Duration to 2s. Run Duration reduces the per-session overhead by working on multiple FlowFiles in a single session and then committing them all to the success relationship at the same time. This adds up to 2 seconds of latency to the oldest FlowFile processed in that 2-second run, but overall throughput will be better. I also don't know what your Concurrent Task allocations look like across the other processors in your flow, so make sure you haven't over-allocated anywhere in a way that could be affecting this processor. Generally speaking, 3 concurrent tasks is enough in most cases; adding too many can hurt overall performance rather than help.
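Why batching helps can be shown with a back-of-the-envelope model: if every session commit costs a roughly fixed amount of time, spreading that cost over many FlowFiles raises per-thread throughput. The model and the per-file and per-commit costs below are hypothetical, chosen only to illustrate the arithmetic; they are not measured NiFi numbers.

```python
def files_per_second(per_file_ms: float, per_commit_ms: float, files_per_session: int) -> float:
    """Toy throughput model for one thread: one commit cost is paid per session."""
    session_ms = files_per_session * per_file_ms + per_commit_ms
    return files_per_session * 1000.0 / session_ms


if __name__ == "__main__":
    # Hypothetical costs: 0.5 ms of work per FlowFile, 5 ms of overhead per commit.
    for batch in (1, 10, 1000):
        print(f"{batch:>5} FlowFiles/session -> {files_per_second(0.5, 5.0, batch):.1f} FlowFiles/s")
```

The trade-off is the one mentioned above: a FlowFile picked up at the start of a batch is not committed until the whole batch is, so larger batches buy throughput with latency.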
Also make sure you have allocated enough overall threads to NiFi to service the needs of all your workflows.
Under "Controller Settings", found in the menu in the upper right corner of the NiFi UI:
As you can see, the defaults above are low. The Maximum Timer Driven Thread Count (used by all processors by default) should be set to 2-3 times the number of cores in your system.
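The 2-3x rule of thumb is simple arithmetic; the helper below (a hypothetical name, not part of NiFi) just computes the range for the current machine. Note that `os.cpu_count()` reports logical CPUs, which includes hyper-threads, so you may prefer to pass the physical core count explicitly.

```python
import os
from typing import Optional, Tuple


def suggested_timer_driven_threads(cores: Optional[int] = None) -> Tuple[int, int]:
    """Return the (low, high) range from the '2-3 times the number of cores' rule of thumb."""
    if cores is None:
        cores = os.cpu_count() or 1  # logical CPUs; cpu_count() can return None
    return 2 * cores, 3 * cores


if __name__ == "__main__":
    low, high = suggested_timer_driven_threads()
    print(f"Maximum Timer Driven Thread Count: somewhere between {low} and {high}")
```

Concurrent Tasks on individual processors draw from this shared pool, so raising them without raising the pool size can leave processors waiting for threads.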