NiFi DatabaseTableSchemaRegistry - PutDatabaseRecord errors when target table structure is altered
Labels: Apache NiFi
Created on 02-27-2025 01:31 PM - edited 02-27-2025 01:38 PM
Hi,
My one-line question:
How can we ensure that the DatabaseTableSchemaRegistry recognizes changes in the table structure and updates the corresponding table schema definition?
Details:
Our PutDatabaseRecord processors' readers are configured to use the DatabaseTableSchemaRegistry to parse incoming JSON. We have observed multiple instances where the NiFi flow fails when the target table structure is altered, such as when columns are added or removed.
When new columns are added to the target table, the PutDatabaseRecord reader does not recognize the change and continues to generate insert statements without the new columns. In some instances the problem resolved itself after about a day, though we are not sure why; we suspect an internal cache that is rebuilt periodically. We have no reliable fix for this issue and would greatly appreciate any assistance with it.
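To illustrate the symptom, here is a minimal Python sketch (the table and column names are made up, and this only mimics the behavior we observe; it is not NiFi code):

```python
# Minimal sketch of the stale-cache symptom (illustration only, not NiFi internals).
# Hypothetical table "orders" gains a "discount" column after its schema is cached.

cached_columns = ["id", "amount"]  # table schema as cached before the ALTER TABLE
incoming_record = {"id": 1, "amount": 9.99, "discount": 0.5}  # JSON arriving afterwards

# The INSERT is built from the stale cached schema, so the new "discount"
# field is silently left out of the generated statement:
cols = [c for c in cached_columns if c in incoming_record]
placeholders = ", ".join("?" for _ in cols)
sql = f"INSERT INTO orders ({', '.join(cols)}) VALUES ({placeholders})"
print(sql)  # INSERT INTO orders (id, amount) VALUES (?, ?)
```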
Our NiFi system is a 3-node cluster (version 2.1) that uses Kubernetes-based leader election and state management, without an external ZooKeeper.
Thanks
Created 02-28-2025 06:16 AM
@shiva239
The PutDatabaseRecord processor has a Table Schema Cache Size property that specifies how many table schemas should be cached; this cache is used to improve performance. You could try setting it to 0 from its default of 100, but I am not sure how this will impact your specific overall performance.
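If you would rather script that change than edit it in the UI, the property can be updated through the NiFi REST API. A rough sketch, with the base URL, token, and processor id as placeholders; note that the processor must be stopped before its configuration can be changed, and the property key shown here is the display name, so verify the exact key against the processor entity:

```python
import requests

NIFI = "https://localhost:8443/nifi-api"       # placeholder base URL
PROC_ID = "<putdatabaserecord-processor-id>"   # placeholder processor id
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder access token

# Updates must echo back the processor's current revision, so fetch it first.
entity = requests.get(f"{NIFI}/processors/{PROC_ID}", headers=HEADERS).json()

update = {
    "revision": entity["revision"],
    "component": {
        "id": PROC_ID,
        # Property key assumed to match the display name; check the
        # descriptors in the fetched entity for the exact key.
        "config": {"properties": {"Table Schema Cache Size": "0"}},
    },
}
requests.put(f"{NIFI}/processors/{PROC_ID}", json=update, headers=HEADERS).raise_for_status()
```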
Please help our community grow and thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.
Thank you,
Matt
Created on 02-28-2025 08:52 AM - edited 02-28-2025 08:53 AM
Hi @MattWho
Thanks for the info about the PutDatabaseRecord cache. We would like to keep the PutDatabaseRecord Table Schema cache to reduce DB hits.
Is there any option in NiFi to force a refresh of this Table Schema cache?
Created 02-28-2025 10:21 AM
@shiva239
The schemas are fetched when a FlowFile is processed by the PutDatabaseRecord processor. There is no option to schedule a refresh of the existing cache.
There is no REST API endpoint or other option available to flush the cache on demand, aside from perhaps stopping and starting the processor. Doing so means every schema will be cached again as new FlowFiles are processed by the PutDatabaseRecord processor, so it is not an ideal solution/workaround.
The issue you are having is related to the existing Apache NiFi improvement Jira NIFI-12027 for PutDatabaseRecord. I suggest you add a comment to that Jira explaining your use case and the impact this has. Perhaps someone in the community, or you yourself, can contribute to this improvement.
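If you do end up scripting that stop/start cycle as a stopgap, the processor run-status endpoint can drive it. A rough sketch, with the base URL, token, and processor id as placeholders:

```python
import time
import requests

NIFI = "https://localhost:8443/nifi-api"       # placeholder base URL
PROC_ID = "<putdatabaserecord-processor-id>"   # placeholder processor id
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder access token

def set_run_status(state: str) -> None:
    # Every run-status change must include the processor's current revision.
    revision = requests.get(f"{NIFI}/processors/{PROC_ID}", headers=HEADERS).json()["revision"]
    body = {"revision": revision, "state": state, "disconnectedNodeAcknowledged": False}
    resp = requests.put(f"{NIFI}/processors/{PROC_ID}/run-status", json=body, headers=HEADERS)
    resp.raise_for_status()

set_run_status("STOPPED")  # stop the processor...
time.sleep(5)              # ...give active threads a moment to complete
set_run_status("RUNNING")  # restart; schemas are re-fetched and re-cached as FlowFiles arrive
```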
Please help our community grow and thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.
Thank you,
Matt
Created 02-28-2025 02:03 PM
Thank you @MattWho for the details.
As you suggested, I will post my use case in the Jira. Thanks for your help!
