Files Duplicating using QueryDatabaseTable and PutFile

New Contributor

I have a very basic flow that I put together to test a concept for a presentation. (Btw, I'm relatively new to NiFi.) The flow is set up like this: QueryDatabaseTable->SplitAvro->ConvertAvroToJSON->PutFile->LogAttribute

The query is very simple for the Query processor: "select * from test2" which has 8 rows of data. When I fire off the flow, basically it keeps generating files over and over containing the same 8 rows of data in each file.

I know this is probably something very simple to fix, but I just can't seem to find the setting where it will just run once...

I looked around on the boards but did not find anything similar (or it's entirely possible I missed it).

Does anyone have a suggestion?

Thanks!

♣km

1 ACCEPTED SOLUTION

Expert Contributor

Please verify what you have configured for "Maximum-Value Column" in the QueryDatabaseTable processor. Also, how is your table structured? NiFi needs an incrementing key to determine the maximum id previously fetched.

The following article will help:

https://community.hortonworks.com/articles/51902/incremental-fetch-in-nifi-with-querydatabasetable.h...
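To illustrate the idea behind a maximum-value column, here is a minimal Python sketch using an in-memory SQLite table. It is not how NiFi implements QueryDatabaseTable; it only mimics the concept of remembering the largest key seen so far and asking for rows above it. The table name `test2` comes from the question, while the `id` and `name` columns, the function `fetch_new_rows`, and the helper `make_demo_table` are assumptions made up for this example.

```python
import sqlite3


def fetch_new_rows(conn, last_max_id):
    """Fetch rows from test2 whose id is greater than last_max_id.

    Returns (rows, new_max_id). The hypothetical id column stands in for
    the maximum-value column; last_max_id stands in for the saved state.
    """
    if last_max_id is None:
        # First run: nothing fetched yet, so take every row.
        rows = conn.execute(
            "SELECT id, name FROM test2 ORDER BY id"
        ).fetchall()
    else:
        # Later runs: only rows added since the previous fetch.
        rows = conn.execute(
            "SELECT id, name FROM test2 WHERE id > ? ORDER BY id",
            (last_max_id,),
        ).fetchall()
    # Rows are ordered by id, so the last row carries the new maximum.
    new_max_id = rows[-1][0] if rows else last_max_id
    return rows, new_max_id


def make_demo_table():
    """Create an in-memory test2 table with 8 rows (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test2 (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO test2 (id, name) VALUES (?, ?)",
        [(i, f"row{i}") for i in range(1, 9)],
    )
    conn.commit()
    return conn


conn = make_demo_table()
first, state = fetch_new_rows(conn, None)    # initial load
second, state = fetch_new_rows(conn, state)  # nothing new yet
conn.execute("INSERT INTO test2 (id, name) VALUES (9, 'row9')")
third, state = fetch_new_rows(conn, state)   # only the new row
print(len(first), len(second), len(third))   # prints: 8 0 1
```

With the key tracked, the second run returns nothing and the third returns only the newly inserted row, which is the behavior the original poster was after.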


4 REPLIES

New Contributor

Thank you! That was it. I had no values in my Maximum-Value Column.

Super Guru

@Kelvin Mitchell

Your QueryDatabaseTable processor queries your table continuously and generates a FlowFile for each query result. Right-click the QueryDatabaseTable processor, choose Configure, and open the Scheduling tab; you will see that Run Schedule defaults to 0 sec. That means the processor fires one query after another with no pause. You can increase the interval so that the next query fires later. Keep in mind that the query can also be made smart enough to pick up only new records. Is your query doing that?
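As a rough illustration of why the files keep repeating, the Python sketch below polls an in-memory SQLite table three times with a query that keeps no state between runs. It is only a simulation, not NiFi code; the `id` and `name` columns and the function `poll_without_state` are assumptions made up for this example. Each poll returns the same 8 rows, so a longer Run Schedule would slow the duplication down but would not stop it on its own.

```python
import sqlite3


def poll_without_state(conn):
    """Run the same full-table query every time, remembering nothing."""
    return conn.execute("SELECT * FROM test2").fetchall()


# Build a small test2 table with 8 rows (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test2 (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO test2 (id, name) VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 9)],
)
conn.commit()

# Three scheduled runs, however far apart, each produce the full result.
runs = [poll_without_state(conn) for _ in range(3)]
total_rows = sum(len(result) for result in runs)
unique_rows = len({row for result in runs for row in result})
print(total_rows, unique_rows)  # prints: 24 8
```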

+++

If this helped, please vote/accept best answer.

New Contributor

@Constantin Stanca Thanks for the follow-up. This was very helpful in addition to Umair's answer. I changed the query to pick up the max rowid for the new records. Again, very helpful tip!