I'm trying to use the ExecuteScript processor to take in a flow file, convert that flow file to a string, regex the string, convert the regexed string back into a flow file, then pass that flow file along the processor flow. I'm wanting to do this with Python.
As an example, an incoming string might have text like "The 8 dogs jumped over the 2 cats !". I want the result to look like "The_8_dogs_jumped_over_the_2_cats_!".
I've already got a regex that works in this case, my trouble lies within converting the incoming flow file into a string that I can then use regex on. The nature of the regex requires the string to be processed multiple times to eliminate all of the whitespaces, so I would like the ExecuteScript processor to be able to pass back into itself multiple times per flowfile.
Another note: I've tried using a ReplaceText processor coupled with a RouteOnContent processor, however this solution is much too slow to handle the quantity of data I'm working with (even when adjusting the concurrent tasks property).
In short I want to convert a flowfile to a string, do some regex substitutions, then convert back to a flowfile.