
SQL queries via custom groovy script causing unexpected OutOfMemory errors

We have a NiFi use case that involves running SQL queries against a remote PostgreSQL database over a JDBC connection.

The goal is to fetch rows from the database and write them to the outputStream as CSV, with the delimiter specified as an argument. We have benchmarked two solutions:

-> ExecuteSQL + ExtractAvro + ConvertRecord processors

-> A custom Groovy script (found online, adapted to our use case):

import org.apache.nifi.controller.ControllerService
import org.apache.nifi.processor.io.StreamCallback
import groovy.sql.Sql
import java.nio.charset.StandardCharsets

def lookup = context.controllerServiceLookup
def dbServiceName = 'OUR_DB_SERVICE_NAME'
// Locate the DBCP controller service by its configured name
def dbcpServiceId = lookup.getControllerServiceIdentifiers(ControllerService).find { cs ->
    lookup.getControllerServiceName(cs) == dbServiceName
}
def conn = lookup.getControllerService(dbcpServiceId)?.getConnection()

def flowFile = session.get()
if (!flowFile) return

try {
    flowFile = session.write(flowFile, { inputStream, outputStream ->
        def sql = new Sql(conn)
        // The SQL statement and CSV delimiter come from FlowFile attributes
        sql.rows(flowFile.getAttribute('REQUETE')).eachWithIndex { row, idx ->
            outputStream.write((row.values().join(flowFile.getAttribute('DELIMITER')) + "\n")
                .getBytes(StandardCharsets.UTF_8))
        }
    } as StreamCallback)
    session.transfer(flowFile, REL_SUCCESS)
} catch (e) {
    log.error('Scripting error', e)
    session.transfer(flowFile, REL_FAILURE)
} finally {
    conn?.close()
}

The Groovy script is faster than the native ExecuteSQL + ExtractAvro + ConvertRecord NiFi processors, but it causes OutOfMemory errors while the first solution does not, and we have not yet been able to find out why.

Does someone have an idea about what in the script may be responsible for this behaviour (OOM), and how to avoid it?
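For what it's worth, one likely suspect is that `Sql.rows(...)` materializes the entire result set as a `List` in heap memory before the loop even starts. A minimal sketch of a streaming alternative would use `Sql.eachRow(...)` with a fetch-size hint instead; note this is only a sketch, assuming the same NiFi scripting context as above (`conn`, `flowFile`, and `outputStream` are the objects from the script, not new definitions), and the fetch size of 1000 is an arbitrary illustration:

```groovy
import groovy.sql.Sql
import java.nio.charset.StandardCharsets

// Sketch only: conn, flowFile, outputStream come from the NiFi script context.
def sql = new Sql(conn)

// The PostgreSQL JDBC driver only streams results cursor-style when autocommit
// is off and a fetch size is set; otherwise it buffers the whole result set.
conn.setAutoCommit(false)
sql.withStatement { stmt -> stmt.fetchSize = 1000 }

def delimiter = flowFile.getAttribute('DELIMITER')
// eachRow processes one row at a time instead of building a List of all rows
sql.eachRow(flowFile.getAttribute('REQUETE')) { row ->
    def values = (1..row.getMetaData().getColumnCount()).collect { row.getObject(it) }
    outputStream.write((values.join(delimiter) + '\n').getBytes(StandardCharsets.UTF_8))
}
```

This keeps memory usage roughly constant regardless of result-set size, which is presumably what the record-based processor chain is already doing internally.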

