Why is a Python script in NiFi slower than a Groovy script?

Hi

I am using a Python script and a Groovy script to do the same task, and I wanted to compare their performance. I noticed that the Python script runs slowly while the Groovy script runs really fast.

In the scripts below I am just replacing spaces with "|" in both cases.

Python Script:

import traceback
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        # Read the whole flow file content into memory as one string
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        text = text.replace(",", "|", 1).replace(" ", "|", 3).replace("|", " ", 1).replace("] ", "]|")
        outputStream.write(bytearray(text.encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    try:
        flowFile = session.write(flowFile, PyStreamCallback())
        session.transfer(flowFile, REL_SUCCESS)
    except:
        # A bare except also catches Java exceptions; log the traceback text
        log.error('Something went wrong: ' + traceback.format_exc())
        session.transfer(flowFile, REL_FAILURE)

Groovy Script:

import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets

def flowFile = session.get()
if (!flowFile) return

flowFile = session.write(flowFile, { inputStream, outputStream ->
    inputStream.eachLine { line ->
        // Split once at the first "[" so only the prefix is rewritten
        String[] names = line.split("\\[", 2)
        names[0] = names[0].replace(",", " ").replace(" ", "|")
        // Lines without "[" have no second part to re-attach
        def a = names.length > 1 ? names[0] + "[" + names[1] : names[0]
        outputStream.write("${a}\n".toString().getBytes(StandardCharsets.UTF_8))
    }
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)

1 ACCEPTED SOLUTION

Master Guru

I'm not familiar with the innards of either Groovy or Jython, but I am guessing that Jython is slower for the following reasons:

1) Groovy was built "for the JVM" and leverages/integrates with Java more cleanly

2) Jython is an implementation of Python for the JVM. Looking briefly at the code, it appears to go back and forth between the Java and Python idioms, so it is more "emulated" than Groovy.

3) Apache Groovy has a large, very active community that consistently works to improve the performance of the code, both compiled and interpreted.

In my own experience, Groovy and JavaScript (Nashorn) perform much better in the scripted processors than Jython or JRuby. If you choose Jython, there are still a few things you can do to improve performance:

- Use InvokeScriptedProcessor (ISP) instead of ExecuteScript. ISP is faster because it loads the script only once and then invokes methods on it, whereas ExecuteScript evaluates the script each time it runs. I have an ISP template in Jython which should make porting your ExecuteScript code easier.

- Use ExecuteStreamCommand with command-line Python instead. You won't have the flexibility of accessing attributes, processor state, etc. but if you're just transforming content you should find ExecuteStreamCommand with Python faster.

- No matter which language you choose, you can often improve performance if you use session.get(int), which returns a list of flow files, instead of session.get(). That way, if there are a lot of flow files in the queue, you could call session.get(1000), for example, and process up to 1000 flow files per execution. If your script has a lot of overhead, you may find that handling multiple flow files per execution significantly improves performance.
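
For the ExecuteStreamCommand option, the transformation from the question could be moved into a standalone script along these lines. This is only a sketch under a few assumptions: Python 3 is installed on the NiFi host, ExecuteStreamCommand pipes the flow file content to the script's stdin and uses its stdout as the new content, and the names transform and main (and whatever file name you save it under) are made up for illustration.

```python
import sys


def transform(text):
    # Same replace chain as the Jython script in the question.
    return (text.replace(",", "|", 1)
                .replace(" ", "|", 3)
                .replace("|", " ", 1)
                .replace("] ", "]|"))


def main():
    # Read and write raw bytes so the encoding is always UTF-8,
    # regardless of the locale configured on the NiFi host.
    data = sys.stdin.buffer.read().decode("utf-8")
    sys.stdout.buffer.write(transform(data).encode("utf-8"))


# Run only when content is piped in, as ExecuteStreamCommand would do.
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

In the processor you would presumably set Command Path to the Python interpreter and pass the script path in Command Arguments. Piping the line "2017-01-01 10:00:00,123 INFO [main] message here" through the script should give "2017-01-01 10:00:00|123|INFO|[main]|message here". Like the original Jython code, it applies the replace chain to the whole content at once rather than line by line.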
