I extract (as a one-time load) all data from a very large table (40 million records) in partitions of 100k records. These are then converted to JSON (ConvertRecord) and finally turned into single-row SQL statements with ConvertJSONtoSQL. Strangely enough, the PutSQL step runs very slowly. I would like to insert the data into my Redshift cluster. What could be the reason, and how can I speed it up?
I would consider bundling the rows into larger flowfiles using something like MergeRecord, and then using the PutDatabaseRecord processor. It uses prepared, parameterized SQL and is considerably faster than the RBAR (Row By Agonizing Row) inserts generated by PutSQL.
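To see why batching matters, here is a minimal sketch outside of NiFi. It contrasts executing one freshly generated SQL string per record (roughly what single-row flowfiles through PutSQL amount to) with one prepared statement executed against many parameter sets. It uses SQLite purely as a stand-in database; the timing numbers are illustrative, not Redshift measurements.

```python
import sqlite3
import time

# 10k sample records to insert.
rows = [(i, f"name-{i}") for i in range(10_000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")

# Row-by-row: parse, plan, and execute a separate statement per record.
t0 = time.perf_counter()
for rid, name in rows:
    conn.execute(f"INSERT INTO t VALUES ({rid}, '{name}')")
row_by_row = time.perf_counter() - t0

conn.execute("DELETE FROM t")

# Batched prepared statement: one statement, many parameter sets.
t0 = time.perf_counter()
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
batched = time.perf_counter() - t0

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(f"inserted {count} rows; row-by-row: {row_by_row:.3f}s, batched: {batched:.3f}s")
```

On a network-attached warehouse like Redshift the gap is far larger than in-process SQLite, because every single-row statement also pays a round trip. (For a true one-time bulk load, Redshift's COPY from S3 is faster still.)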
There are faster alternatives, but this may be the simplest one that will improve performance noticeably.
Hi wcdata, I took your advice and redesigned my workflow, but even with only 10 records PutDatabaseRecord keeps loading and never finishes. I suspect the Translate Field Names setting is to blame, because my source columns are uppercase and my target columns are lowercase. I don't want to define a schema here.
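For reference, the idea behind that kind of name translation is to match incoming field names to table columns case-insensitively (and ignoring underscores), so uppercase source fields can map to lowercase target columns without a schema. The sketch below illustrates the concept only; `normalize` and `map_fields` are hypothetical helpers, not NiFi APIs.

```python
def normalize(name: str) -> str:
    # Compare names ignoring case and underscores, in the spirit of
    # PutDatabaseRecord's Translate Field Names option.
    return name.replace("_", "").lower()

def map_fields(record: dict, table_columns: list[str]) -> dict:
    # Build a lookup from normalized name -> actual column name,
    # then rename each record field to its matching column.
    lookup = {normalize(col): col for col in table_columns}
    return {
        lookup[normalize(key)]: value
        for key, value in record.items()
        if normalize(key) in lookup
    }

row = map_fields({"CUSTOMER_ID": 7, "NAME": "Ada"}, ["customer_id", "name"])
print(row)  # {'customer_id': 7, 'name': 'Ada'}
```

With matching like this enabled, uppercase JSON fields land in lowercase columns; fields with no matching column are simply dropped rather than causing an error.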