Created 12-04-2015 05:39 AM
Is the executeSQL processor able to leverage streaming capability of a JDBC driver level 4 or 5. This is quite important for us as if not it would mean that Nifi is not able to get data in small batches that are processed progressively and that it needs to receive the whole dataset before going to processing. e.g: if we are extracting 1 Million records we need to wait for the whole dataset to arrive to do any processing instead of receiving small batch of 10 records or 100 records that we process and stream to the final location.
Created 12-04-2015 06:27 PM
@yjiang right now there is no ability to send out partial result sets. All of the results from a single query are written to the same FlowFile. There is, however, a ticket to enable what you are discussing here. https://issues.apache.org/jira/browse/NIFI-1251 Please check out the ticket and confirm that this is the same idea that you are suggesting here, or let me know if that's not the same idea.
Created 12-04-2015 06:23 PM
Looking at the ExecuteSQL code here. The capability description reads -
@CapabilityDescription("Execute provided SQL select query. Query result will be converted to Avro format." + " Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on " + "a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. " + "If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the " + "select query. " + "FlowFile attribute 'executesql.row.count' indicates how many rows were selected." )
Even though above para says - "Streaming is used so arbitrarily large result sets are supported." , it appears that its not referring to the JDBC streaming but the fact that ResultSet is broken down into smaller tuples and sent to next processor as stream.
Here is the snippet of Code to back that assessment -
Query Execution in ExecuteSQL and call to JDBCCommon.convertToAvroStream method reading data using getObject methodThe getObject method does not seem to support streaming alternative like getAscii etc as described here - https://docs.oracle.com/cd/B28359_01/java.111/b312...
Created 12-04-2015 06:27 PM
@yjiang right now there is no ability to send out partial result sets. All of the results from a single query are written to the same FlowFile. There is, however, a ticket to enable what you are discussing here. https://issues.apache.org/jira/browse/NIFI-1251 Please check out the ticket and confirm that this is the same idea that you are suggesting here, or let me know if that's not the same idea.
Created 12-09-2015 09:51 AM
This is exactly what we want.