Created 04-05-2016 12:49 PM
If it's possible, does NIFI automatically store the "last-value" of the checked column like in Sqoop job ?
Created 04-05-2016 01:13 PM
As of NiFi 0.6.0, there is a processor called QueryDatabaseTable that does something like this:
You can enter the columns for which you'd like the processor to keep track of, it will store the maximum value it has "seen" so far for each of the specified columns. When the processor runs, it will generate a SQL query to return rows whose values in those columns are greater than the observed maximum values. So if you have a unique (and increasing) ID field, for example, you can specify that in the QueryDatabaseTable processor, then the first execution will return all rows, and subsequent executions will not return any rows until one (or more) have an value for ID greater than the max.
This can be used to do incremental fetching like Sqoop does for only added/updated rows, using columns that contain timestamps, IDs, or other values that are increasing.
Created 04-05-2016 01:13 PM
As of NiFi 0.6.0, there is a processor called QueryDatabaseTable that does something like this:
You can enter the columns for which you'd like the processor to keep track of, it will store the maximum value it has "seen" so far for each of the specified columns. When the processor runs, it will generate a SQL query to return rows whose values in those columns are greater than the observed maximum values. So if you have a unique (and increasing) ID field, for example, you can specify that in the QueryDatabaseTable processor, then the first execution will return all rows, and subsequent executions will not return any rows until one (or more) have an value for ID greater than the max.
This can be used to do incremental fetching like Sqoop does for only added/updated rows, using columns that contain timestamps, IDs, or other values that are increasing.
Created 04-08-2016 10:00 AM
Thx a lot for answering ^^.
Created 07-06-2017 05:12 PM
I am using QueryDatabaseTable to fetch incrementally. However it generates the same data over and over again. To test I left only one record in the table and the processor generates that one record repeatedly. What I am doing wrong? I did not change any other processor configuration but only Max value columns (which has only one) and Table name properties. Thanks!
Created 12-08-2017 07:41 AM
What if you need to run a few joins to be able to get all the data?
Is there then another way of doing this wish a llows a more complex query but still allows a watermark column to be used?
Created 12-09-2017 02:28 AM
You could consider a non-materialized view that does all the joins. Also there is a Jira case (NIFI-1706) and GitHub Pull Request to allow custom queries with QueryDatabaseTable.
Created 12-09-2021 11:35 PM
Any other option for this question?
Created 12-10-2021 12:31 AM
@krishna123 as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.
Regards,
Vidya Sargur,