Files Duplicating using QueryDatabaseTable and PutFile

New Contributor

I have a very basic flow that I put together to test a concept for a presentation. (Btw, I'm relatively new to NiFi.) The flow is set up like this: QueryDatabaseTable->SplitAvro->ConvertAvroToJSON->PutFile->LogAttribute

The query in the QueryDatabaseTable processor is very simple: "select * from test2", which returns 8 rows of data. When I start the flow, it keeps generating files over and over, each containing the same 8 rows of data.

I know this is probably something very simple to fix, but I just can't seem to find the setting where it will just run once...

I looked around on the boards but did not find anything similar (it's entirely possible I missed it).

Does anyone have a suggestion?

Thanks!

♣km


4 Replies

Expert Contributor (accepted solution)

Please verify what you have configured for "Maximum-Value Column" in the QueryDatabaseTable processor. Also, how is your table structured? NiFi needs an incrementing key column so it can determine the maximum value it fetched previously.

The following article will help:

https://community.hortonworks.com/articles/51902/incremental-fetch-in-nifi-with-querydatabasetable.h...
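
The idea behind Maximum-Value Column can be sketched outside NiFi. The Python snippet below is a minimal illustration under stated assumptions, not NiFi's implementation: it assumes `test2` has an increasing integer column named `id` (hypothetical, since the thread doesn't show the schema), and a plain dict stands in for the state NiFi keeps between runs. The first call returns every row; later calls return only rows whose `id` is above the stored maximum.

```python
import sqlite3


def fetch_new_rows(conn, state, table="test2", max_col="id"):
    """Return rows whose max_col exceeds the value stored in `state`.

    `state` is a plain dict standing in for processor state (an assumption
    for this sketch). Table and column names are interpolated directly into
    the SQL, so pass only trusted identifiers.
    """
    last_max = state.get(max_col)
    if last_max is None:
        # First run: no stored maximum yet, so fetch the whole table.
        cur = conn.execute(f"SELECT * FROM {table} ORDER BY {max_col}")
    else:
        # Later runs: only rows beyond the previously seen maximum.
        cur = conn.execute(
            f"SELECT * FROM {table} WHERE {max_col} > ? ORDER BY {max_col}",
            (last_max,),
        )
    rows = cur.fetchall()
    if rows:
        # Locate max_col in the result and remember the largest value seen.
        idx = [d[0] for d in cur.description].index(max_col)
        state[max_col] = max(row[idx] for row in rows)
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test2 (id INTEGER PRIMARY KEY, name TEXT)")
    # 8 rows, matching the table size mentioned in the question.
    conn.executemany(
        "INSERT INTO test2 (id, name) VALUES (?, ?)",
        [(i, f"row{i}") for i in range(1, 9)],
    )
    state = {}
    print(len(fetch_new_rows(conn, state)))  # 8
    print(len(fetch_new_rows(conn, state)))  # 0
    conn.execute("INSERT INTO test2 (id, name) VALUES (9, 'row9')")
    print(len(fetch_new_rows(conn, state)))  # 1
```

With an empty Maximum-Value Column there is no stored maximum to compare against, which is why every run of the processor returned the same rows.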

New Contributor

Thank you! That was it. I had no values in my Maximum-Value Column.

@Kelvin Mitchell

Your QueryDatabaseTable processor queries your table continuously and generates a FlowFile with each query result. Right-click the QueryDatabaseTable processor, choose Configure, and open the Scheduling tab: you will see that Run Schedule is set to 0 sec by default, which means the processor fires query after query with no pause. You can increase the interval so that the next query fires later. Keep in mind that your query can also be made smart enough to pick up only new records. Is your query doing that?
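
The difference between a longer interval and a smarter query can be shown with a small simulation. This is a hypothetical Python sketch, not NiFi code: each loop iteration stands in for one scheduled trigger of the processor, and it compares re-running the full-table query against fetching only rows past the last seen `id` (an assumed key column). A longer Run Schedule only spaces the triggers out; the incremental query is what stops the same rows from being emitted again.

```python
import sqlite3


def make_table(n_rows=8):
    """Create an in-memory test2 table with n_rows rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test2 (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO test2 (id, name) VALUES (?, ?)",
        [(i, f"row{i}") for i in range(1, n_rows + 1)],
    )
    return conn


def simulate(conn, runs, incremental):
    """Return the total number of rows emitted over `runs` scheduled triggers."""
    emitted = 0
    last_id = 0  # assumes ids start at 1
    for _ in range(runs):
        if incremental:
            # Only rows newer than the last id seen on a previous trigger.
            rows = conn.execute(
                "SELECT id, name FROM test2 WHERE id > ? ORDER BY id", (last_id,)
            ).fetchall()
            if rows:
                last_id = rows[-1][0]
        else:
            # Same full-table query on every trigger, with no max-value tracking.
            rows = conn.execute("SELECT id, name FROM test2").fetchall()
        emitted += len(rows)
    return emitted


print(simulate(make_table(), runs=5, incremental=False))  # 40: the same 8 rows, five times
print(simulate(make_table(), runs=5, incremental=True))   # 8: each row only once
```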

+++

If this helped, please vote/accept best answer.

New Contributor

@Constantin Stanca Thanks for the follow-up. This was very helpful in addition to Umair's answer. I changed the query to pick up the max rowid for the new records. Again, a very helpful tip!
