Support Questions

frioua_hamza · ‎04-05-2016

If it's possible, does NIFI automatically store the "last-value" of the checked column like in Sqoop job ?

mburgess · ‎04-05-2016

As of NiFi 0.6.0, there is a processor called QueryDatabaseTable that does something like this:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.QueryDatabaseT...

You can enter the columns for which you'd like the processor to keep track of, it will store the maximum value it has "seen" so far for each of the specified columns. When the processor runs, it will generate a SQL query to return rows whose values in those columns are greater than the observed maximum values. So if you have a unique (and increasing) ID field, for example, you can specify that in the QueryDatabaseTable processor, then the first execution will return all rows, and subsequent executions will not return any rows until one (or more) have an value for ID greater than the max.

This can be used to do incremental fetching like Sqoop does for only added/updated rows, using columns that contain timestamps, IDs, or other values that are increasing.

View solution in original post

mburgess · ‎04-05-2016

As of NiFi 0.6.0, there is a processor called QueryDatabaseTable that does something like this:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.QueryDatabaseT...

You can enter the columns for which you'd like the processor to keep track of, it will store the maximum value it has "seen" so far for each of the specified columns. When the processor runs, it will generate a SQL query to return rows whose values in those columns are greater than the observed maximum values. So if you have a unique (and increasing) ID field, for example, you can specify that in the QueryDatabaseTable processor, then the first execution will return all rows, and subsequent executions will not return any rows until one (or more) have an value for ID greater than the max.

This can be used to do incremental fetching like Sqoop does for only added/updated rows, using columns that contain timestamps, IDs, or other values that are increasing.

frioua_hamza · ‎04-08-2016

Thx a lot for answering ^^.

speedbump1452 · ‎07-06-2017

@Matt Burgess

I am using QueryDatabaseTable to fetch incrementally. However it generates the same data over and over again. To test I left only one record in the table and the processor generates that one record repeatedly. What I am doing wrong? I did not change any other processor configuration but only Max value columns (which has only one) and Table name properties. Thanks!

max_boreback · ‎12-08-2017

What if you need to run a few joins to be able to get all the data?

Is there then another way of doing this wish a llows a more complex query but still allows a watermark column to be used?

mburgess · ‎12-09-2017

You could consider a non-materialized view that does all the joins. Also there is a Jira case (NIFI-1706) and GitHub Pull Request to allow custom queries with QueryDatabaseTable.

krishna123 · ‎12-09-2021

Any other option for this question?

VidyaSargur · ‎12-10-2021

@krishna123 as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.

Regards,

Vidya Sargur,
Community Manager

Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:
Community Guidelines
How to use the forum

Support Questions

Is it possible to perform an incremental Import using Nifi ?