NiFi: Will QueryDatabaseTable reset the Maximum-value Columns every time I restart the processor?

Hi guys,

According to the documentation quoted below, it seems that restarting the processor will reset the maximum column value it has stored, so that it starts fetching data from the beginning again.

Document Link: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apach...

A comma-separated list of column names. The processor will keep track of the maximum value for each column that has been returned since the processor started running.
  • However, I tested this behavior, and even after restarting the processor I only get an incremental load. Is there a mistake in the documentation, or have I missed something?
  • What would happen if I re-deploy the job, that is, delete it and re-create it from the template?
  • The code says the value is stored with Scope.CLUSTER. Could someone please explain what that is, and under which conditions the state is cleared?
@Stateful(scopes = Scope.CLUSTER, description = "After performing a query on the specified table, the maximum values for "
        + "the specified column(s) will be retained for use in future executions of the query. This allows the Processor "
        + "to fetch only those records that have max values greater than the retained values. This can be used for "
        + "incremental fetching, fetching of newly added rows, etc. To clear the maximum values, clear the state of the processor "
        + "per the State Management documentation")

1 ACCEPTED SOLUTION

Short answers to your bullets!
  • However, I tested this behavior, and even after restarting the processor I only get an incremental load. Is there a mistake in the documentation, or have I missed something?
    • No mistake in the document. The processor counts as "started" from the first time you started it. Stopping a processor and then starting it again does not clear its state.
  • What would happen if I re-deploy the job, that is, delete it and re-create it from the template?
    • Everything will happen from scratch! The re-created processor has no stored state, so it fetches from the beginning.
  • The code says the value is stored with Scope.CLUSTER. Could someone please explain what that is, and under which conditions the state is cleared?
    • NiFi provides two basic operating modes: standalone and clustered. When operating in a cluster, processors often need some mechanism for coordinating state between nodes, so in clustered mode NiFi stores the state centrally in ZooKeeper. Hence the "cluster" scope for the state.
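
To make the scope idea concrete, here is a toy sketch in plain Java. It is not NiFi code: `ScopeSketch`, `Node`, and the shared map are hypothetical stand-ins (the shared map plays the role of the ZooKeeper-backed store), and only the `LOCAL`/`CLUSTER` names mirror NiFi's `Scope` enum.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of LOCAL versus CLUSTER state scope; not NiFi's implementation. */
final class ScopeSketch {
    enum Scope { LOCAL, CLUSTER }

    /** One node of a hypothetical cluster. */
    static final class Node {
        // Visible to this node only.
        private final Map<String, String> local = new HashMap<>();
        // Shared by every node; stands in for the central ZooKeeper-backed store.
        private final Map<String, String> shared;

        Node(Map<String, String> shared) {
            this.shared = shared;
        }

        void put(Scope scope, String key, String value) {
            select(scope).put(key, value);
        }

        String get(Scope scope, String key) {
            return select(scope).get(key);
        }

        private Map<String, String> select(Scope scope) {
            return scope == Scope.CLUSTER ? shared : local;
        }
    }

    public static void main(String[] args) {
        Map<String, String> clusterStore = new HashMap<>();
        Node a = new Node(clusterStore);
        Node b = new Node(clusterStore);

        a.put(Scope.CLUSTER, "id", "500");
        a.put(Scope.LOCAL, "id", "500");

        System.out.println(b.get(Scope.CLUSTER, "id")); // prints 500: node b sees the shared value
        System.out.println(b.get(Scope.LOCAL, "id"));   // prints null: node a's local value stays on node a
    }
}
```

With cluster scope, whichever node runs the query next picks up the same maximum value, which is what an incremental fetch needs.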

Now QueryDatabaseTable processor behavior in detail.

QueryDatabaseTable is a "stateful" processor. That is, it stores certain information about the processor's own "processing". Get it? Processor's processing! No? Never mind!

Whenever this processor queries the database, it stores the maximum value seen for each column you listed under Maximum-value Columns when configuring it. To check it, right-click your processor and open its state!

[Screenshot: 64482-screen-shot-2018-03-04-at-111425-pm.png]

Clicking on that option presents you with a window like the one shown below.

[Screenshot: 64483-screen-shot-2018-03-04-at-111440-pm.png]

There you can see whatever value your processor had persisted at that point in time. Stopping and starting the processor again (a restart, in your terms) simply makes no difference to the state.

That window also has a "Clear State" option. Clicking it clears the state for you: it calls the clear() method of StateManager and resets your processor to square one!
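
The three cases from the question (restart, re-create from a template, Clear State) can be walked through with a small model. This is only a sketch under assumptions, not NiFi's implementation: `MaxValueStateSketch` and its methods are hypothetical, the state is assumed to be keyed by the processor's id, a processor re-created from a template is assumed to get a new id, and the SQL strings are simplified.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Simplified model of per-processor max-value state; not NiFi's implementation. */
final class MaxValueStateSketch {
    // Hypothetical stand-in for the state store: processor id -> (column -> max value).
    private final Map<String, Map<String, Long>> store = new HashMap<>();

    /** Builds the query a processor with this id would issue next. */
    String nextQuery(String processorId, String table, String column) {
        Long max = store
                .getOrDefault(processorId, Collections.<String, Long>emptyMap())
                .get(column);
        return max == null
                ? "SELECT * FROM " + table
                : "SELECT * FROM " + table + " WHERE " + column + " > " + max;
    }

    /** Records the largest value seen so far for the column. */
    void recordMax(String processorId, String column, long value) {
        store.computeIfAbsent(processorId, id -> new HashMap<>())
             .merge(column, value, Long::max);
    }

    /** Models the "Clear State" action for one processor. */
    void clear(String processorId) {
        store.remove(processorId);
    }

    public static void main(String[] args) {
        MaxValueStateSketch state = new MaxValueStateSketch();
        String original = UUID.randomUUID().toString();

        // First run: no stored maximum, so everything is fetched.
        System.out.println(state.nextQuery(original, "orders", "id"));
        state.recordMax(original, "id", 500L);

        // Stop and start: nothing touches the store, so the fetch stays incremental.
        System.out.println(state.nextQuery(original, "orders", "id"));

        // Re-created from a template: a different id, so no stored maximum.
        String recreated = UUID.randomUUID().toString();
        System.out.println(state.nextQuery(recreated, "orders", "id"));

        // Clear State on the original processor: back to a full fetch.
        state.clear(original);
        System.out.println(state.nextQuery(original, "orders", "id"));
    }
}
```

The point of the model is that stopping and starting never touches the store; only a new processor id or an explicit clear removes the stored maximum.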

Hope that helps!

3 REPLIES

@Rahul Soni

Thanks for the great explanation. Just a quick question: is there a way I can modify that value, in case I need to restart the flow from a specific point?

You may want to look at this answer.