
Dynamic Creation of Processors in NiFi

Expert Contributor

We have a system composed of many databases and tables, and we want to use NiFi to query these tables based on our requirements. Since NiFi's QueryDatabaseTable processor is statically tied to a single table, we intend to dynamically generate many processors of this kind, one per table across our different systems. Is this possible using the ExecuteScript processor (or anything similar)?

ACCEPTED SOLUTION

Master Guru

You may also want to check out ListDatabaseTables, which periodically performs a listing of all the database tables to query:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ListDatabaseTa...

Each flow file will get a "db.table.name" attribute; you would then have to construct the appropriate SQL for each table and pass it to ExecuteSQL, referencing ${db.table.name} in the query.
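For example, ExecuteSQL's "SQL select query" property could be set to something like the following (a plain full-table select for illustration; in practice you would likely add per-table filtering or incremental logic):

SELECT * FROM ${db.table.name}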


REPLIES


Hi @J. D. Bacolod, please take a look at this HCC article on using the API to configure processors on the fly:

https://community.hortonworks.com/articles/3160/update-nifi-flow-on-the-fly-via-api.html
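As a rough sketch (not taken from the article), creating a QueryDatabaseTable processor through the NiFi REST API could look like the Python snippet below. The base URL, process group ID, table name, and DBCP controller service ID are placeholders for your environment, and the property keys shown are my assumption of how they appear in the processor's configuration:

import requests

# Base URL of an unsecured NiFi instance; adjust host/port (and add auth) as needed.
base_url = "http://localhost:8080/nifi-api"

# "root" is accepted as an alias for the root process group ID.
process_group_id = "root"

table_name = "my_table"  # hypothetical table name

# New components are created at revision version 0.
payload = {
    "revision": {"version": 0},
    "component": {
        "type": "org.apache.nifi.processors.standard.QueryDatabaseTable",
        "name": "QueryDatabaseTable_" + table_name,
        "config": {
            "properties": {
                # The DBCP controller service ID below is a placeholder.
                "Database Connection Pooling Service": "<dbcp-service-id>",
                "Table Name": table_name,
            }
        },
    },
}

resp = requests.post(
    base_url + "/process-groups/" + process_group_id + "/processors",
    json=payload,
)
resp.raise_for_status()
print("Created processor", resp.json()["id"])

Looping that call over your list of tables (and then starting each processor via the run-status endpoint) would give you one QueryDatabaseTable per table.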

Hope that helps!


Master Guru

Also, as of NiFi 1.3.0 / HDF 3.0.0, GenerateTableFetch accepts incoming connections/flow files, so you can use ListDatabaseTables -> GenerateTableFetch -> Remote Process Group -> Input Port -> ExecuteSQL to fully distribute the fetching of batches of rows across your NiFi cluster. The Remote Process Group -> Input Port part is optional and is only needed on a cluster when you want to fetch rows in parallel.
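To make the batching concrete: with, say, a Partition Size of 10000 and "id" configured as the maximum-value column, GenerateTableFetch emits flow files whose content is a paged query roughly like the following (the exact syntax depends on the configured database adapter):

SELECT * FROM my_table WHERE id <= 123456 ORDER BY id LIMIT 10000 OFFSET 20000

Each such flow file can then be executed independently by ExecuteSQL on whichever node receives it, which is what distributes the row fetching across the cluster.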