Looking for something like GenerateTableFetch

Garyy — Wed, 13 Jan 2021 23:30:27 GMT

I use ExecuteSQLRecord to run a query and write to CSV format. The table has 10M rows. Although I can split the output into multiple flow files, the query is executed by only a single thread and is very slow.

Is there a way to partition the query into multiple queries so that the next processor can run multiple concurrent tasks, each processing one partition? It would look like this:

GenerateTableFetch -> ExecuteSQLRecord (with concurrent tasks)

The problem is that GenerateTableFetch only accepts a table name as input. It does not accept custom queries.

Please advise if you have a solution. I am new to NiFi, so I would appreciate detailed answers. Thank you in advance.

Re: Looking for something like GenerateTableFetch

Garyy — Thu, 14 Jan 2021 20:53:25 GMT

Figured out an alternative way. I developed an Oracle PL/SQL function which takes a table name as an argument and produces a series of queries like "SELECT * FROM T1 OFFSET x ROWS FETCH NEXT 10000 ROWS ONLY". The number of queries is based on the number of rows in the table, which is taken from the statistics in the catalog table. If the table has 1M rows and I want 100k rows in each batch, it will produce 10 queries. I use ExecuteSQLRecord to call this function, which effectively does the job of the NiFi processor GenerateTableFetch. My next processor (e.g. ExecuteSQLRecord again) can now have 10 concurrent tasks working in parallel.
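
For anyone who wants to see the batching arithmetic spelled out, here is a rough sketch in Python rather than PL/SQL. It is not the actual function described above; the name generate_page_queries, its parameters, and the ID ordering column are all made up for illustration. It only shows how a row count and a batch size turn into a list of OFFSET/FETCH queries. The optional ORDER BY is an addition of mine: without a stable ordering the database does not guarantee that pages are disjoint, so rows could be repeated or skipped across batches.

```python
def generate_page_queries(table, row_count, batch_size, order_by=None):
    """Build one OFFSET/FETCH query per batch of rows.

    All names here are hypothetical; row_count would come from the
    catalog statistics (or a COUNT(*)) in a real setup.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    # Optional stable ordering so that pages do not overlap.
    order_clause = f" ORDER BY {order_by}" if order_by else ""

    # Ceiling division: a partial last batch still needs its own query.
    num_batches = (row_count + batch_size - 1) // batch_size

    return [
        f"SELECT * FROM {table}{order_clause} "
        f"OFFSET {i * batch_size} ROWS FETCH NEXT {batch_size} ROWS ONLY"
        for i in range(num_batches)
    ]


# The example from the post: 1M rows, 100k rows per batch.
queries = generate_page_queries("T1", 1_000_000, 100_000, order_by="ID")
for q in queries:
    print(q)
```

Each generated query would then become its own flow file, so the downstream ExecuteSQLRecord can work on several of them concurrently. One thing to keep in mind: if the row count comes from catalog statistics, it reflects the last time statistics were gathered, so rows added since then may fall beyond the last generated offset.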