
SQL queries via custom groovy script causing unexpected OutOfMemory errors


We have a NiFi use case that runs SQL queries against a remote PostgreSQL database over a JDBC connection.

The goal is to fetch rows from the database and write them to the flow file's output stream as CSV, with the delimiter specified as a flow file attribute. We have benchmarked two solutions:

-> ExecuteSQL + ExtractAvro + ConvertRecord processors

-> A custom Groovy script found online, adapted to our use case:

import org.apache.nifi.controller.ControllerService
import org.apache.nifi.processor.io.StreamCallback
import groovy.sql.Sql
import java.nio.charset.StandardCharsets

def lookup = context.controllerServiceLookup
def dbServiceName = 'OUR_DB_SERVICE_NAME'
// Find the DBCP controller service by name
def dbcpServiceId = lookup.getControllerServiceIdentifiers(ControllerService).find { cs ->
    lookup.getControllerServiceName(cs) == dbServiceName
}
def conn = lookup.getControllerService(dbcpServiceId)?.getConnection()
def flowFile = session.get()
if (!flowFile) return
try {
    flowFile = session.write(flowFile, { inputStream, outputStream ->
        def sql = new Sql(conn)
        // Run the query from the REQUETE attribute and write each row as one delimited line
        sql.rows(flowFile.getAttribute('REQUETE')).eachWithIndex { row, idx ->
            outputStream.write((row.values().join(flowFile.getAttribute('DELIMITER')) + '\n').getBytes(StandardCharsets.UTF_8))
        }
    } as StreamCallback)
    session.transfer(flowFile, REL_SUCCESS)
} catch (e) {
    log.error('Scripting error', e)
    session.transfer(flowFile, REL_FAILURE)
} finally {
    conn?.close()
}

The Groovy script is faster than the native ExecuteSQL + ExtractAvro + ConvertRecord processors, but it causes OutOfMemory errors while the first solution does not, and we have not yet been able to determine why.

Does anyone have an idea what in the script might be responsible for this behaviour (OOM), and how to avoid it?
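One hypothesis we are exploring: `Sql.rows()` returns a `List<GroovyRowResult>`, i.e. it materializes the entire result set in heap before anything is written, whereas ExecuteSQL streams rows through. Below is an untested sketch of a streaming variant using `Sql.eachRow()`, assuming the same NiFi bindings (`session`, `flowFile`, `conn`) as the script above; note that the PostgreSQL JDBC driver only honours `fetchSize` (cursor-based fetching) when autocommit is off:

```groovy
// Sketch: stream rows instead of materializing the whole result set.
// Assumes conn, flowFile, session are already set up as in the script above.
def sql = new Sql(conn)
conn.setAutoCommit(false)                            // required for PostgreSQL cursor fetching
sql.withStatement { stmt -> stmt.fetchSize = 1000 }  // pull 1000 rows per round trip

flowFile = session.write(flowFile, { inputStream, outputStream ->
    def delimiter = flowFile.getAttribute('DELIMITER')
    sql.eachRow(flowFile.getAttribute('REQUETE')) { row ->
        // row is a GroovyResultSet backed by the open cursor; only the current
        // fetch batch is held in memory, so heap usage stays bounded
        def values = (1..row.getMetaData().getColumnCount()).collect { row.getObject(it) }
        outputStream.write((values.join(delimiter) + '\n').getBytes(StandardCharsets.UTF_8))
    }
} as StreamCallback)
```

If this diagnosis is right, the two benchmarked solutions would differ exactly here: ExecuteSQL writes rows out as it reads them, while `rows()` buffers everything first.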