Support Questions

Find answers, ask questions, and share your expertise

Nifi executeSQL processor streaming capability

avatar
Expert Contributor

Is the executeSQL processor able to leverage streaming capability of a JDBC driver level 4 or 5. This is quite important for us as if not it would mean that Nifi is not able to get data in small batches that are processed progressively and that it needs to receive the whole dataset before going to processing. e.g: if we are extracting 1 Million records we need to wait for the whole dataset to arrive to do any processing instead of receiving small batch of 10 records or 100 records that we process and stream to the final location.

1 ACCEPTED SOLUTION

avatar
Expert Contributor

@yjiang right now there is no ability to send out partial result sets. All of the results from a single query are written to the same FlowFile. There is, however, a ticket to enable what you are discussing here. https://issues.apache.org/jira/browse/NIFI-1251 Please check out the ticket and confirm that this is the same idea that you are suggesting here, or let me know if that's not the same idea.

View solution in original post

3 REPLIES 3

avatar

Looking at the ExecuteSQL code here. The capability description reads -

@CapabilityDescription("Execute provided SQL select query. Query result will be converted to Avro format." + " Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on " + "a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. " + "If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the " + "select query. " + "FlowFile attribute 'executesql.row.count' indicates how many rows were selected." )

Even though above para says - "Streaming is used so arbitrarily large result sets are supported." , it appears that its not referring to the JDBC streaming but the fact that ResultSet is broken down into smaller tuples and sent to next processor as stream.

Here is the snippet of Code to back that assessment -

Query Execution in ExecuteSQL and call to JDBCCommon.

convertToAvroStream ->

convertToAvroStream method reading data using getObject method

The getObject method does not seem to support streaming alternative like getAscii etc as described here - https://docs.oracle.com/cd/B28359_01/java.111/b312...

avatar
Expert Contributor

@yjiang right now there is no ability to send out partial result sets. All of the results from a single query are written to the same FlowFile. There is, however, a ticket to enable what you are discussing here. https://issues.apache.org/jira/browse/NIFI-1251 Please check out the ticket and confirm that this is the same idea that you are suggesting here, or let me know if that's not the same idea.

avatar
Expert Contributor

This is exactly what we want.