Community Articles

Find and share helpful community-sourced technical articles.
Labels (1)
avatar
Contributor

Background:

In NiFI, the PutSQL Processor is unable to perform the batch inserts although a batch size had been configured in the processor. There is a commit for each insert because the content of each FlowFile is a unique SQL statement.

Lets take an example workflow,

CSV file -> 50 lines (name, jobtitle) -> 50 flow files

For each line, PutSQL processor (batchsize-10) is invoked. Because each flow file is unique in its content, 50 commits are performed.

Line1 -> Flowfile#1

Insert into Employee (name, job title) VALUES ('Bryan B','Director')

Line2 -> Flowfile#2

Insert into Employee (name, job title) VALUES ('Joe W','CTO')

The PutSQL processor batches FlowFiles by matching SQL statements. To match SQL statements with unique inserts, you must configure the dataflow so that the SQL statements make use of the "?" to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Types.

Objective:

If we can invoke PutSQL processor with same Insert statement like below, PutSQL processor will use batch size parameter correctly finally resulting in 5 commits (instead of 50).

3 Steps:

Use Extract Text Processor to create sql.args.N.value Flow file attributes

sql.args.1.value = Bryan B
sql.args.2.value = Director

Use Update Attribute Processor to set sql.args.N.type Flow file attributes

sql.args.1.type = 12 (VARCHAR)
sql.args.2.type = 12 

Finally, use ReplaceText Processor to build the identical insert statements

Insert into Employee ("name", "job title") VALUES (?,?)

Final Outcome:

By following the above, every single FlowFile will have the exact same insert statement and will be properly batched based on the batch size property in the processor. A single commit will be issued followed by the insert of all batched FlowFiles.

17,467 Views
Comments

I found the instructions for setting sql.args.N.type setting confusing. You can find the types and corresponding code values in the Java API docs. Here is a link to the corresponding Java 8 documentation.

Why "PutSQL Processor is unable to perform the batch inserts although a batch size had been configured in the processor."??

Why isn't it just changed to allow SQL statements that are not identical be batched, instead of having to do it this way?

Of course SQL statements will almost certainly be different from one another!

Sounds like a bug in the processor...

I think this post is based on an erroneous assumption that SQL statements need to be identical to be batched by the PutSQL processor.

Perhaps the misunderstanding was because Support Fragmented Transactions was set to true (the default) which will fragment the batch unless you are using fragment.identifier to correctly identify your batches.

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache...

How did you do it in nifi? Do you have screenshot on how did you do it?