Created 07-31-2023 01:36 AM
Hello,
I´m using QueryDatabaseTable with time_stamp max value column to get in real time +2k records per day. Despite I read all docs that I found I can not understand how it works.
It supose to do a select every time a new record is inserted in the table but if I take a look in the postgres log I can see that the query is executed every second in search for new records.
I though that this processors works like a trigger but it seems like it doesn´t and I´m afraid that this can affect database performance.
Can somebody explain how this processor internally works?
Thank you in advance
Created 07-31-2023 07:09 AM
Hi @MihaiMaranduca ,
You can use the "Run Schedule" Property to control how often a given processor runs. You can access this property by opening process Configuration, then select the Scheduling tab:
The Run Schedule can be configured using the Scheduling Strategy:
1- Timer Driven (Default): Run processor based on interval. the Zero value indicates that its running continuously. 5 secs mean every 5 seconds, 5 mins mean every 5 minutes and so on.
2- CRON Driven: Processor runs based on a schedule specified by CRON string.
More info on Scheduling: https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab
If you find this is helpful please accept solution.
Thanks
Created 07-31-2023 11:59 PM
@MihaiMaranduca,
In addition to what @SAMSAL already said, I would add the followings:
- the NiFi processors WORK and DO NOT work as triggers. Let's remember that the triggers (in any sql database) are basically some database objects, whereas NiFi is a standalone application, connected to that Database. Using the Maximum-value Columns you will somehow mimic the trigger like functionality but in the end, NiFi will only execute an SQL Statement on your database, based on the column you have defined in the mentioned property.
When it comes to performance optimization, I would personally recommend you the following:
- Within QueryDatabaseTable, define the scheduling as per your requirements. If you know that your data gets loaded once per day, switch to an once per day execution .... if you need hourly, get an hourly execution and so on.
- Within your DBCPConnectionPool, pay attention to the following properties, as they are pretty important:
a) Time Between Eviction Runs = if you leave it default, most likely (depending on how the DB was configured as well) you will never close your sessions, eventually causing problems. Here you can set any value in millis, which will tell NiFi to drop the connection after that time has passed. For example, I am using 1440000 millis.
b) Max Idle Connections = The maximum number of connection that can remain idle without extra ones being released. I personally am working with a lower value than the one default.
c) Minimum Evictable Idle Time = The minimum time a connection may sit idle in the pool, before being marked for release. Here the value is 30 mins default but I am using 10 mins mostly, especially for the queries which have to be executed every 15mins.
Created 07-31-2023 02:34 AM
@MihaiMaranduca, Welcome to our community! To help you get the best possible answer, I have tagged in our NiFi experts @SAMSAL, @cotopaul @MattWho @steven-matison who may be able to assist you further.
Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur,Created 07-31-2023 07:09 AM
Hi @MihaiMaranduca ,
You can use the "Run Schedule" Property to control how often a given processor runs. You can access this property by opening process Configuration, then select the Scheduling tab:
The Run Schedule can be configured using the Scheduling Strategy:
1- Timer Driven (Default): Run processor based on interval. the Zero value indicates that its running continuously. 5 secs mean every 5 seconds, 5 mins mean every 5 minutes and so on.
2- CRON Driven: Processor runs based on a schedule specified by CRON string.
More info on Scheduling: https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab
If you find this is helpful please accept solution.
Thanks
Created 07-31-2023 11:59 PM
@MihaiMaranduca,
In addition to what @SAMSAL already said, I would add the followings:
- the NiFi processors WORK and DO NOT work as triggers. Let's remember that the triggers (in any sql database) are basically some database objects, whereas NiFi is a standalone application, connected to that Database. Using the Maximum-value Columns you will somehow mimic the trigger like functionality but in the end, NiFi will only execute an SQL Statement on your database, based on the column you have defined in the mentioned property.
When it comes to performance optimization, I would personally recommend you the following:
- Within QueryDatabaseTable, define the scheduling as per your requirements. If you know that your data gets loaded once per day, switch to an once per day execution .... if you need hourly, get an hourly execution and so on.
- Within your DBCPConnectionPool, pay attention to the following properties, as they are pretty important:
a) Time Between Eviction Runs = if you leave it default, most likely (depending on how the DB was configured as well) you will never close your sessions, eventually causing problems. Here you can set any value in millis, which will tell NiFi to drop the connection after that time has passed. For example, I am using 1440000 millis.
b) Max Idle Connections = The maximum number of connection that can remain idle without extra ones being released. I personally am working with a lower value than the one default.
c) Minimum Evictable Idle Time = The minimum time a connection may sit idle in the pool, before being marked for release. Here the value is 30 mins default but I am using 10 mins mostly, especially for the queries which have to be executed every 15mins.
Created 08-01-2023 11:35 PM
Good morning @cotopaul @SAMSAL First of all thank you very much for your time
Ok Ok so I see that I missunderstood how QueryDatabaseTable is "triggered". @SAMSAL I knew about those configuration but as I said I thought this processor only do the query when a new record is inserted in the database taking reference the maximum column value. Makes sense now the postgresql log...the querys executed every second since my RunSchedule is on default (0).
What I use minifi for is to load sales from a supermarket then send it via API (and others transformations), so I need this to be in real time as soon as the client paid. I guess I have to define Run Schedule every few seconds. @cotopaul I will take a look at my DBCPConnectionPool, this info it´s helpfull
Thank you all.