ExecuteScript performance

Rising Star

I'm seeing very poor performance in my Python ExecuteScript processor. For each flow file, I query the Distributed Map Cache (DMC) and join the result with the data in the flow file. Two possible bottlenecks come to mind:

* Is ExecuteScript creating a new Jython environment for every flow file, or does it spin one up per concurrent task and reuse it?

* If only one Jython environment is created per concurrent task, can I connect to the DMC just once and then only query keys on all later executions? Is there a setup method hook or something similar?

Any help would be appreciated. Thanks!

1 ACCEPTED SOLUTION

Master Guru

ExecuteScript creates a new ScriptEngine for each one of the tasks specified in the Max Concurrent Tasks property, and reuses those engines for each flow file.

ExecuteScript basically lets you implement an onTrigger() method in a scripting language, so it doesn't provide other lifecycle hooks such as setup and shutdown. For those you can use InvokeScriptedProcessor. There's a little more boilerplate, because your script must provide an implementation of the Processor interface, but in return you can use the initialize() method to connect to the DMC once, and you can expose any number of extra properties and relationships on the "parent" InvokeScriptedProcessor for configuration. I have some examples on my blog including this one.
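The idea of connecting once during setup and reusing that connection for every flow file can be sketched in plain Python. This is only an illustration of the lifecycle split, not NiFi code: `FakeCacheClient`, `CachingProcessor`, `on_trigger` and the dict-based flow files are hypothetical stand-ins, and a real InvokeScriptedProcessor script would implement NiFi's Processor interface instead.

```python
# Plain-Python illustration only. FakeCacheClient and CachingProcessor are
# hypothetical stand-ins, not NiFi classes.


class FakeCacheClient(object):
    """Stand-in for a Distributed Map Cache client; counts connections."""

    connections = 0

    def __init__(self, entries):
        # Pretend this constructor is the expensive "connect" step.
        FakeCacheClient.connections += 1
        self._entries = dict(entries)

    def get(self, key):
        return self._entries.get(key)


class CachingProcessor(object):
    """Set up once, then handle many flow files with the same client."""

    def __init__(self):
        self.cache = None

    def initialize(self, entries):
        # Runs once, before any flow file is handled.
        self.cache = FakeCacheClient(entries)

    def on_trigger(self, flow_file):
        # Runs per flow file and reuses the client created in initialize().
        enriched = dict(flow_file)
        enriched["lookup"] = self.cache.get(flow_file["key"])
        return enriched


processor = CachingProcessor()
processor.initialize({"a": "alpha", "b": "beta"})
results = [processor.on_trigger({"key": k}) for k in ("a", "b", "a")]
```

Three flow files are handled here, but the stand-in client is constructed only once. An ExecuteScript body, by contrast, runs in full for every flow file, so any connection logic placed there is repeated each time.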

In general, I should mention that the Jython engine is relatively slow anyway, so you won't see great performance from it. You can get better performance by porting to Groovy or JavaScript if possible.

REPLIES

Rising Star

Thanks @Matt Burgess! This is working much better for me. For future reference, there are Jython examples of how to do this in the unit tests of the NiFi project: https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-scripting-bundle/nifi-scripting-pro...

Contributor

Is there a performance table comparing the scripting languages that we can refer to?