I have the following scenario:
As a one-time load, I extract all data from a very large table (40 million records) in partitions of 100k records each. These are then converted to JSON (ConvertRecord) and finally turned into single-row SQL statements with ConvertJSONtoSQL. Strangely enough, the penultimate step (PutSQL) runs very slowly. I want to insert the data into my Redshift cluster. What could be the reason, and how can I speed it up?
Details of the PutSQL processor:
I would consider bundling the rows into larger flowfiles with something like MergeRecord, and then using the PutDatabaseRecord processor, which issues prepared, parameterized SQL and is considerably faster than the RBAR ("Row By Agonizing Row") inserts that PutSQL generates here.
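To see why this matters, here is a minimal, hypothetical sketch (using SQLite purely as a stand-in database, not Redshift) of the difference between executing one INSERT statement per row and executing a single prepared statement over a batch of parameter tuples, which is roughly what PutDatabaseRecord does:

```python
import sqlite3

# Stand-in database; in the real flow this would be the Redshift connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")

rows = [(i, f"name-{i}") for i in range(100_000)]

# Row-by-row ("RBAR") approach, one statement execution per record:
# for r in rows:
#     conn.execute("INSERT INTO t (id, name) VALUES (?, ?)", r)

# Batched prepared statement: the SQL is parsed once and executed
# for every parameter tuple in the batch.
conn.executemany("INSERT INTO t (id, name) VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 100000
```

The per-statement overhead (parsing, planning, round trips, per-statement commit) dominates single-row inserts; batching amortizes it across the whole flowfile.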
There are faster alternatives for Redshift specifically (notably staging the data in S3 and loading it with the COPY command, which Amazon recommends over INSERT-based loads), but MergeRecord plus PutDatabaseRecord is probably the simplest change that will improve performance noticeably.