
ExecuteScript performance

Contributor

I'm seeing very poor performance in my Python (Jython) ExecuteScript processor. For each flow file, I query the Distributed Map Cache (DMC) and join the result with the data in the flow file. Two possible performance bottlenecks come to mind:

* Is ExecuteScript creating a new Jython environment for every flow file, or is it spinning one up once per concurrent task and reusing it?

* If only one Jython environment is created per concurrent task, is it possible to connect to the DMC just once and then simply query keys on all subsequent executions? Is there a setup hook or something similar?

Any help would be appreciated. Thanks!

Accepted Solution

Re: ExecuteScript performance

ExecuteScript creates one ScriptEngine for each of the tasks specified by the Max Concurrent Tasks property and reuses those engines across flow files, so it does not create a new Jython environment per flow file.

ExecuteScript essentially lets you implement an onTrigger() method in a scripting language, so it doesn't provide other lifecycle hooks such as setup and shutdown. For those, you can use InvokeScriptedProcessor. There is a little more boilerplate, since you must implement a subclass of Processor, but in return you can override the initialize() method to connect to the DMC once. You can also expose any number of extra properties and relationships on the "parent" InvokeScriptedProcessor for configuration. I have some examples on my blog, including this one.
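
The initialize-once idea can be sketched the same way. This is plain Python 3, not a working InvokeScriptedProcessor script: `FakeCacheClient` is a hypothetical stand-in for a DMC client connection, and `EnrichingProcessor.initialize()` takes no arguments here, whereas a real scripted Processor's initialize() receives an initialization context from NiFi. Only the lifecycle is the point: connect once in initialize(), then do cheap key lookups and the join in each on_trigger() call.

```python
class FakeCacheClient:
    """Hypothetical stand-in for a Distributed Map Cache client connection."""
    connections_opened = 0

    def __init__(self, entries):
        # Count the "expensive" connects so we can check they happen once.
        FakeCacheClient.connections_opened += 1
        self._entries = dict(entries)

    def get(self, key):
        return self._entries.get(key)


class EnrichingProcessor:
    """Sketch of the set-up-once, reuse-per-flow-file lifecycle (hypothetical)."""

    def __init__(self, cache_entries):
        self._cache_entries = cache_entries
        self._cache = None

    def initialize(self):
        # Runs once for the processor, so the connection is opened a single time.
        self._cache = FakeCacheClient(self._cache_entries)

    def on_trigger(self, record):
        # Runs per flow file: only a key lookup and a join, no reconnect.
        extra = self._cache.get(record["id"]) or {}
        joined = dict(record)
        joined.update(extra)
        return joined


if __name__ == "__main__":
    processor = EnrichingProcessor({"42": {"name": "widget"}})
    processor.initialize()
    out = [processor.on_trigger({"id": "42", "qty": n}) for n in range(3)]
    print(FakeCacheClient.connections_opened)  # 1
```

Records whose key is missing from the cache pass through unchanged, because `get()` returns None and the join falls back to an empty dict.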

I should also mention that the Jython engine is relatively slow in general, so you won't see great performance from it anyway. If possible, you can get better performance by porting the script to Groovy or JavaScript.

Re: ExecuteScript performance

Contributor

Thanks @Matt Burgess! This is working much better for me. For future reference, there are Jython examples of how to do this in the unit tests of the NiFi project: https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-scripting-bundle/nifi-scripting-pro...

Re: ExecuteScript performance

Contributor

Is there a performance comparison table for the different scripting languages that we can refer to?
