Background:

In NiFi, the PutSQL processor is unable to perform batch inserts even though a batch size has been configured on the processor. A commit is issued for each insert because the content of each FlowFile is a unique SQL statement.

Let's take an example workflow:

CSV file -> 50 lines (name, jobtitle) -> 50 flow files

For each line, the PutSQL processor (batch size 10) is invoked. Because each FlowFile's content is unique, 50 commits are performed.

Line 1 -> FlowFile #1

Insert into Employee (name, job title) VALUES ('Bryan B','Director')

Line 2 -> FlowFile #2

Insert into Employee (name, job title) VALUES ('Joe W','CTO')

The PutSQL processor batches FlowFiles by matching SQL statements. To batch unique inserts, you must configure the dataflow so that the SQL statements use the "?" placeholder to escape parameters. The parameter values must then exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. sql.args.N.type is expected to be a number indicating the JDBC type (the integer constants defined in java.sql.Types).
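As an illustration of this convention (using the example row above; the dict is just a stand-in for a FlowFile's attribute map, not a NiFi API), the FlowFile for line 1 would carry these attributes alongside the parameterized statement:

```python
# Sketch of the attributes PutSQL expects on a FlowFile for the example row.
# sql.args.N.type uses the integer codes from java.sql.Types (VARCHAR = 12).
flowfile_attributes = {
    "sql.args.1.type": "12",        # java.sql.Types.VARCHAR
    "sql.args.1.value": "Bryan B",
    "sql.args.2.type": "12",        # java.sql.Types.VARCHAR
    "sql.args.2.value": "Director",
}

# The FlowFile *content* is the same parameterized statement for every row:
flowfile_content = 'Insert into Employee ("name", "job title") VALUES (?,?)'
```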

Objective:

If we invoke the PutSQL processor with the same parameterized insert statement for every FlowFile, the processor can apply the batch size correctly: with a batch size of 10, the 50 FlowFiles result in 5 commits instead of 50.

3 Steps:

Use the ExtractText processor to create the sql.args.N.value FlowFile attributes:

sql.args.1.value = Bryan B
sql.args.2.value = Director
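A rough Python analogue of what ExtractText does in this step (the regex is an assumption; in NiFi you would add dynamic properties whose values are regular expressions with capture groups for the columns):

```python
import re

# Hypothetical regex for a "name,jobtitle" CSV line: one capture group per column.
line = "Bryan B,Director"
match = re.match(r"^([^,]+),([^,]+)$", line)

# Map each capture group to a sql.args.N.value attribute, as PutSQL expects.
attributes = {
    f"sql.args.{i}.value": group
    for i, group in enumerate(match.groups(), start=1)
}
```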

Use the UpdateAttribute processor to set the sql.args.N.type FlowFile attributes:

sql.args.1.type = 12 (VARCHAR)
sql.args.2.type = 12 (VARCHAR)
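The numeric type codes come from the java.sql.Types constants; a few common ones, sketched as a lookup (both example columns are strings, so both get VARCHAR):

```python
# A few JDBC type codes from java.sql.Types, as used by sql.args.N.type.
JDBC_TYPES = {
    "VARCHAR": 12,
    "INTEGER": 4,
    "BIGINT": -5,
    "DOUBLE": 8,
    "DATE": 91,
    "TIMESTAMP": 93,
}

# Both example columns are strings, so both attributes get VARCHAR (12).
attributes = {
    "sql.args.1.type": str(JDBC_TYPES["VARCHAR"]),
    "sql.args.2.type": str(JDBC_TYPES["VARCHAR"]),
}
```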

Finally, use the ReplaceText processor to build identical insert statements:

Insert into Employee ("name", "job title") VALUES (?,?)

Final Outcome:

By following the above steps, every FlowFile carries the exact same insert statement and is batched according to the Batch Size property of the processor. All FlowFiles in a batch are inserted, followed by a single commit per batch.
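The effect is analogous to prepared-statement batching in plain database code. A minimal sketch with Python's sqlite3 (an in-memory stand-in for the target database): one parameterized statement plus many parameter sets lets the driver reuse the prepared statement, with a single commit for the whole batch instead of one per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Employee ("name" TEXT, "job title" TEXT)')

# One identical parameterized statement for every row -- the key to batching.
sql = 'INSERT INTO Employee ("name", "job title") VALUES (?,?)'
rows = [("Bryan B", "Director"), ("Joe W", "CTO")]  # the example has 48 more rows

# executemany runs all parameter sets against the single prepared statement;
# commit() then issues one commit for the whole batch instead of one per row.
conn.executemany(sql, rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
```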

Comments
New Contributor

I found the instructions for setting sql.args.N.type confusing. You can find the types and their corresponding integer codes in the Java API docs for java.sql.Types. Here is a link to the corresponding Java 8 documentation.

New Contributor

Why "PutSQL Processor is unable to perform the batch inserts although a batch size had been configured in the processor."??

Why isn't it just changed to allow SQL statements that are not identical to be batched, instead of having to do it this way?

Of course SQL statements will almost certainly be different from one another!

Sounds like a bug in the processor...

New Contributor

I think this post is based on an erroneous assumption that SQL statements need to be identical to be batched by the PutSQL processor.

Perhaps the misunderstanding was because Support Fragmented Transactions was set to true (the default), which will fragment the batch unless you are using fragment.identifier to correctly identify your batches.

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache...

Explorer

How did you do it in NiFi? Do you have a screenshot of how you did it?

Last update: 03-29-2017 07:12 PM