Support Questions
Find answers, ask questions, and share your expertise

Is it possible to perform an incremental Import using Nifi ?

If it's possible, does NIFI automatically store the "last-value" of the checked column like in Sqoop job ?

1 ACCEPTED SOLUTION

Accepted Solutions

Super Guru

As of NiFi 0.6.0, there is a processor called QueryDatabaseTable that does something like this:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.QueryDatabaseT...

You can enter the columns for which you'd like the processor to keep track of, it will store the maximum value it has "seen" so far for each of the specified columns. When the processor runs, it will generate a SQL query to return rows whose values in those columns are greater than the observed maximum values. So if you have a unique (and increasing) ID field, for example, you can specify that in the QueryDatabaseTable processor, then the first execution will return all rows, and subsequent executions will not return any rows until one (or more) have an value for ID greater than the max.

This can be used to do incremental fetching like Sqoop does for only added/updated rows, using columns that contain timestamps, IDs, or other values that are increasing.

View solution in original post

7 REPLIES 7

Super Guru

As of NiFi 0.6.0, there is a processor called QueryDatabaseTable that does something like this:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.QueryDatabaseT...

You can enter the columns for which you'd like the processor to keep track of, it will store the maximum value it has "seen" so far for each of the specified columns. When the processor runs, it will generate a SQL query to return rows whose values in those columns are greater than the observed maximum values. So if you have a unique (and increasing) ID field, for example, you can specify that in the QueryDatabaseTable processor, then the first execution will return all rows, and subsequent executions will not return any rows until one (or more) have an value for ID greater than the max.

This can be used to do incremental fetching like Sqoop does for only added/updated rows, using columns that contain timestamps, IDs, or other values that are increasing.

View solution in original post

Thx a lot for answering ^^.

New Contributor

@Matt Burgess

I am using QueryDatabaseTable to fetch incrementally. However it generates the same data over and over again. To test I left only one record in the table and the processor generates that one record repeatedly. What I am doing wrong? I did not change any other processor configuration but only Max value columns (which has only one) and Table name properties. Thanks!

What if you need to run a few joins to be able to get all the data?

Is there then another way of doing this wish a llows a more complex query but still allows a watermark column to be used?

Super Guru

You could consider a non-materialized view that does all the joins. Also there is a Jira case (NIFI-1706) and GitHub Pull Request to allow custom queries with QueryDatabaseTable.

New Contributor

Any other option for this question?

Community Manager

@krishna123 as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community: