In NiFI, the PutSQL Processor is unable to perform the batch inserts although a batch size had been configured in the processor. There is a commit for each insert because the content of each FlowFile is a unique SQL statement.
For each line, PutSQL processor (batchsize-10) is invoked. Because each flow file is unique in its content, 50 commits are performed.
Line1 -> Flowfile#1
Insert into Employee (name, job title) VALUES ('Bryan B','Director')
Line2 -> Flowfile#2
Insert into Employee (name, job title) VALUES ('Joe W','CTO')
The PutSQL processor batches FlowFiles by matching SQL statements. To match SQL statements with unique inserts, you must configure the dataflow so that the SQL statements make use of the "?" to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Types.
If we can invoke PutSQL processor with same Insert statement like below, PutSQL processor will use batch size parameter correctly finally resulting in 5 commits (instead of 50).
Use Extract Text Processor to create sql.args.N.value Flow file attributes
sql.args.1.value = Bryan B
sql.args.2.value = Director
Use Update Attribute Processor to set sql.args.N.type Flow file attributes
Finally, use ReplaceText Processor to build the identical insert statements
Insert into Employee ("name", "job title") VALUES (?,?)
By following the above, every single FlowFile will have the exact same insert statement and will be properly batched based on the batch size property in the processor. A single commit will be issued followed by the insert of all batched FlowFiles.