Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Nifi QueryDatabaseTable

avatar
New Contributor

Hello, 

I´m using QueryDatabaseTable with time_stamp max value column to get in real time +2k records per day. Despite I read all docs that I found I can not understand how it works. 

It supose to do a select every time a new record is inserted in the table but if I take a look in the postgres log I can see that the query is executed every second in search for new records.

I though that this processors works like a trigger but it seems like it doesn´t and I´m afraid that this can affect database performance.

Can somebody explain how this processor internally works?  

Thank you in advance

 

 

2 ACCEPTED SOLUTIONS

avatar
Super Guru

Hi @MihaiMaranduca ,

 

You can use the "Run Schedule" Property to control how often a given processor runs. You can access this property by opening process Configuration, then select the  Scheduling tab:

SAMSAL_0-1690811943934.png

 

The Run Schedule can be configured using the Scheduling Strategy:

 

1- Timer Driven (Default):  Run processor based on interval. the Zero value indicates that its running continuously. 5 secs mean every 5 seconds, 5 mins mean every 5 minutes and so on.

2- CRON Driven: Processor runs based on a schedule specified by CRON string.

More info on Scheduling: https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab

 

If you find this is helpful please accept solution.

Thanks

 

 

 

View solution in original post

avatar

@MihaiMaranduca,

In addition to what @SAMSAL already said, I would add the followings:
- the NiFi processors WORK and DO NOT work as triggers. Let's remember that the triggers (in any sql database) are basically some database objects, whereas NiFi is a standalone application, connected to that Database. Using the Maximum-value Columns you will somehow mimic the trigger like functionality but in the end, NiFi will only execute an SQL Statement on your database, based on the column you have defined in the mentioned property.

When it comes to performance optimization, I would personally recommend you the following:
- Within QueryDatabaseTable, define the scheduling as per your requirements. If you know that your data gets loaded once per day, switch to an once per day execution .... if you need hourly, get an hourly execution and so on.

- Within your DBCPConnectionPool, pay attention to the following properties, as they are pretty important:
a) Time Between Eviction Runs = if you leave it default, most likely (depending on how the DB was configured as well) you will never close your sessions, eventually causing problems. Here you can set any value in millis, which will tell NiFi to drop the connection after that time has passed. For example, I am using 1440000 millis.
b) Max Idle Connections = The maximum number of connection that can remain idle without extra ones being released. I personally am working with a lower value than the one default.
c) Minimum Evictable Idle Time = The minimum time a connection may sit idle in the pool, before being marked for release. Here the value is 30 mins default but I am using 10 mins mostly, especially for the queries which have to be executed every 15mins.

View solution in original post

4 REPLIES 4

avatar
Community Manager

@MihaiMaranduca, Welcome to our community! To help you get the best possible answer, I have tagged in our NiFi experts @SAMSAL@cotopaul @MattWho @steven-matison  who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.

 



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
Super Guru

Hi @MihaiMaranduca ,

 

You can use the "Run Schedule" Property to control how often a given processor runs. You can access this property by opening process Configuration, then select the  Scheduling tab:

SAMSAL_0-1690811943934.png

 

The Run Schedule can be configured using the Scheduling Strategy:

 

1- Timer Driven (Default):  Run processor based on interval. the Zero value indicates that its running continuously. 5 secs mean every 5 seconds, 5 mins mean every 5 minutes and so on.

2- CRON Driven: Processor runs based on a schedule specified by CRON string.

More info on Scheduling: https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab

 

If you find this is helpful please accept solution.

Thanks

 

 

 

avatar

@MihaiMaranduca,

In addition to what @SAMSAL already said, I would add the followings:
- the NiFi processors WORK and DO NOT work as triggers. Let's remember that the triggers (in any sql database) are basically some database objects, whereas NiFi is a standalone application, connected to that Database. Using the Maximum-value Columns you will somehow mimic the trigger like functionality but in the end, NiFi will only execute an SQL Statement on your database, based on the column you have defined in the mentioned property.

When it comes to performance optimization, I would personally recommend you the following:
- Within QueryDatabaseTable, define the scheduling as per your requirements. If you know that your data gets loaded once per day, switch to an once per day execution .... if you need hourly, get an hourly execution and so on.

- Within your DBCPConnectionPool, pay attention to the following properties, as they are pretty important:
a) Time Between Eviction Runs = if you leave it default, most likely (depending on how the DB was configured as well) you will never close your sessions, eventually causing problems. Here you can set any value in millis, which will tell NiFi to drop the connection after that time has passed. For example, I am using 1440000 millis.
b) Max Idle Connections = The maximum number of connection that can remain idle without extra ones being released. I personally am working with a lower value than the one default.
c) Minimum Evictable Idle Time = The minimum time a connection may sit idle in the pool, before being marked for release. Here the value is 30 mins default but I am using 10 mins mostly, especially for the queries which have to be executed every 15mins.

avatar
New Contributor

Good morning @cotopaul @SAMSAL First of all thank you very much for your time

Ok Ok so I see that I missunderstood how QueryDatabaseTable is "triggered". @SAMSAL I knew about those configuration but as I said I thought this processor only do the query when a new record is inserted in the database taking reference the maximum column value. Makes sense now the postgresql log...the querys executed every second since my RunSchedule is on default (0). 

What I use minifi for is to load sales from a supermarket then send it via API (and others transformations), so I need this to be in real time as soon as the client paid. I guess I have to define Run Schedule every few seconds. @cotopaul I will take a look at my DBCPConnectionPool, this info it´s helpfull

Thank you all.