Support Questions

Find answers, ask questions, and share your expertise

Fetch LAST MODIFIED data using NiFi

avatar
Contributor

Hi,

I am trying to look at different processor but somehow couldn't find the best match to my Use Case.

My Use Case is I want to fetch the table from the Oracle SQL and put this data into Hive. And I want to run a customized SQL query against the OracleDB and fetched the data but it is fetching the complete data but I want just the LAST MODIFIED data only. For This I am using GenerateTableFetch-- > ExecuteSQL --> PutHiveStreaming. But Execute SQL is pulling the complete data from OracleDB but not just the LAST MODIFIED. Is there any way where I am doing wrong ? My idea is to ExecuteSQL --> SplitAvro --> PutHiveStreaming not sure if this works or not.

1 REPLY 1

avatar
Master Guru

You will want to set whatever column has your "LAST MODIFIED" values as the Maximum Value Column in GenerateTableFetch. The first time it will still generate SQL to pull the complete data, but it will also keep track of the maximum observed value from your Maximum Value Column. The next time GenerateTableFetch runs, it will only generate SQL to fetch the rows whose value for LAST MODIFIED is greater than the last observed maximum.

If you want the first generation to start at a particular value (for the Maximum Value Column), you can add a user-defined property called "initial.maxvalue.<maxvaluecol>", where "<maxvaluecol>" is the name of the column you specified as the Maximum Value Column. This allows you to "skip ahead", and from then on GenerateTableFetch will continue in normal operation, keeping track of the current maximum and only generating SQL to fetch rows whose values are larger than the current max.

If you need a custom query (or, more correctly, you want to add a custom WHERE clause), you can do that by setting the Custom WHERE Clause property of GenerateTableFetch. If you need completely arbitrary queries, then in NiFi 1.7.0 (via NIFI-1706) you can use QueryDatabaseTable to provide arbitrary queries. This capability does not exist for GenerateTableFetch, but we can investigate adding it as an improvement, please feel free to file a Jira for this.