Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Handling huge tables and complex joins in NiFI

avatar
Expert Contributor

Due to a limitation to the size (and type of data?) that can be cast within the ExecuteSQL processor (when casting a blob to a varchar I have issues if the field size exceeds 32K) I'm looking to utilize the GenerateTableFetch or QueryDatabaseTable processors provided with NiFi to ingest entire tables. QueryDatabaseTable seems promising, but I have run into two issues that has hindered using this approach. First of all the existing table size is way to large to ingest (probably terabytes or more) and I'm only interested in getting the records that are most recent, anyhow. Using a Max Initial Value on the date field seems to be a suitable key to keep track of the state. Is there anyway to inject a starting value other than the implicit key state achieve by ingesting the entire table? I was hoping to possible prime the query by initially hard coding a value for max_value_column in the `initial.maxvalue.{max_value_column}` parameter, but to no avail. Secondly, from what I've read in the forum it seems GenerateTableFetch would be the best option of there are joins required from more than one table. If this is so, could you provide an example of how one might go about implementing this? Thanks, ~Sean

3 REPLIES 3

avatar
Expert Contributor

If loading the entire table it the only option then would not providing a downstream processor allow the table to drain without clogging memory or other resources? Which processor would give me the most control on the number of rows/records returned with each iteration? With GenerateTableFetch it appears to be handled with the Partion Size parameter. With the QueryDatabaseTable processor I've experimented with using the Columns to Return and Max Rows per File parameters without much success. The result of executing the QueryDatabaseTable processor always results in bring the NiFi cluster to a halt requiring one or more restarts to recover, so I'm reluctant to just arbitrarily try things. Any suggestions would greatly be appreciated.

avatar
Expert Contributor

I suppose I should look outside of NiFi for solutions as there doesn't seem to be similar experiences or use cases as I have attempted.

avatar
Contributor

I used GenerateTableFetch for query generation split by id.

All queries go to ExecuteScript processor, where I change sql( in you case add join)

After that use ExecuteSQL