Created 03-01-2017 06:09 PM
Hi All, I am using a QueryDatabase Table processor to get data from a sql server database.My requirement is to get the data from the sql server source and put this data in HDFS. The table in sqlserver is a small table which has 240 rows so I was expecting it would generate not more than 240 flow files. But when I run this processor I see that It keeps on producing a lot more than 240 flowfiles. How do I make sure it only grabs the 240 rows and not continuously get the data over and over again. I used a merge content to merge the flowfiles, used a convertAvroToJSON processor and used PutHDFS processor. When I checked the file in HDFS it seems like it is repeating data.
Created 03-01-2017 06:24 PM
QueryDatabaseTable is usually used for "incremental" fetching, meaning it will only grab "new" rows. This is based on the "Maximum Value Columns" property, which is usually set to an ID or timestamp field in the database. That is what allows the processor to only grab "new" rows, as it will keep track of the maximum value it has seen so far for the column, and the next time it runs, it will fetch only those rows whose value is greater than the last max it saw.
If you are not setting a Maximum-Value Column, then QueryDatabaseTable acts much like ExecuteSQL, in the sense that it will keep repeating the same query and thus give duplicate rows. So for your use case I recommend setting a Maximum Value Column for that processor. If there is no such field in the table, then you're really looking at more of a "run-once" scenario, which is not currently supported.
Created 03-01-2017 06:24 PM
QueryDatabaseTable is usually used for "incremental" fetching, meaning it will only grab "new" rows. This is based on the "Maximum Value Columns" property, which is usually set to an ID or timestamp field in the database. That is what allows the processor to only grab "new" rows, as it will keep track of the maximum value it has seen so far for the column, and the next time it runs, it will fetch only those rows whose value is greater than the last max it saw.
If you are not setting a Maximum-Value Column, then QueryDatabaseTable acts much like ExecuteSQL, in the sense that it will keep repeating the same query and thus give duplicate rows. So for your use case I recommend setting a Maximum Value Column for that processor. If there is no such field in the table, then you're really looking at more of a "run-once" scenario, which is not currently supported.
Created 03-01-2017 06:25 PM
Created 03-01-2017 10:19 PM
@Matt Thank you it was very helpful.