Created 10-22-2020 09:14 AM
HDF 3.4.1 / NiFi 1.9 - NiFi provenance repository filling a 500 GB disk
I have a requirement to retain provenance for 5 days and have made the necessary changes,
but provenance is being retained for barely 2 days or less.
content_repo - 500GB utilization 10%
provenance_repo - 500GB utilization 98%
flowfile_repo - 500GB utilization 10%
below are the configs
@MattWho @TimothySpann Please advise.
Created 10-22-2020 09:23 AM
You need to restart your servers after making the setting changes. Do you have space?
Created 10-22-2020 09:32 AM
@TimothySpann Thanks for the update. I restarted the cluster 2 weeks ago after making the changes,
but the provenance repo is still piling up. I'm surprised to see such a large disk being filled by provenance.
Created 10-22-2020 09:47 AM
Provenance records events for every step of every flowfile in every flow.
I recommend rolling those logs.
Created 10-22-2020 10:56 AM
Hi Tim,
Are you recommending rolling the NiFi provenance? Could you provide more pointers?
Created 10-22-2020 11:12 AM
https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
Maybe shrink your provenance repo and send everything you need via:
https://www.datainmotion.dev/2020/04/sql-reporting-task-for-cloudera-flow.html
Or via Atlas
Created 10-22-2020 11:18 AM
Thanks, Tim. My whole idea is that developers should be able to replay messages from provenance for at least 5 days, as per the requirements.
I'm assuming the only solution is to bump up the provenance storage to achieve that replay capability.
Please let me know your thoughts!
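
As a rough sanity check on that assumption, the figures from the first post (about 98% of 500 GB used for roughly 2 days of provenance) can be extrapolated linearly to 5 days. This is only a back-of-the-envelope sketch: the function name is hypothetical, and it assumes the event rate stays constant and that the "about 2 days" figure is accurate.

```python
def required_storage_gb(used_gb: float, days_retained: float, target_days: float) -> float:
    """Linearly extrapolate the storage needed to hold `target_days` of provenance.

    Assumes (hypothetically) a steady event rate, so usage scales with time.
    """
    if days_retained <= 0:
        raise ValueError("days_retained must be positive")
    daily_gb = used_gb / days_retained  # observed growth per day
    return daily_gb * target_days


# Numbers taken from the thread: 500 GB disk at ~98% after ~2 days of retention.
used_gb = 0.98 * 500
estimate = required_storage_gb(used_gb, days_retained=2, target_days=5)
print(f"Estimated provenance storage for 5 days: ~{estimate:.0f} GB")
```

If the estimate holds, the size cap (`nifi.provenance.repository.max.storage.size`) and the time cap (`nifi.provenance.repository.max.storage.time`) in nifi.properties would both need to allow it, since hitting either limit triggers cleanup.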