We need to archive provenance data indefinitely. A simple file-based archive on HDFS would meet our needs.
From reading questions on this site, I know we can use the NiFi SiteToSiteProvenanceReportingTask to send provenance events as flowfiles via NiFi site-to-site. The obvious but probably wrong solution would be to point the reporting task at its own NiFi cluster, catch the flowfiles, and do a MergeContent -> PutHDFS. This is probably wrong because that flow would itself generate more provenance events... which would generate more provenance events... forever.
I'd really like to avoid the administrative burden of running another NiFi instance, even a MiNiFi instance.
Has anyone come up with a good solution for archiving provenance without using another NiFi cluster?
I am going to attempt using the Component ID to Exclude property of SiteToSiteProvenanceReportingTask and just list the UUID of every processor in my flow that writes the events to HDFS. It will be a hassle to keep all those UUIDs up to date if I end up with a big flow, though.
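For illustration, here's roughly what I mean by the exclusion approach. This is a sketch, not verified against my version of NiFi: the property names and the comma-separated format are my assumption from the reporting task's documentation, and the UUIDs are placeholders for the MergeContent and PutHDFS processors in the archival flow.

```
# SiteToSiteProvenanceReportingTask settings (sketch, property names assumed)
Destination URL           = https://nifi-host:8443/nifi        # same cluster, not a second one
Input Port Name           = provenance-archive
Component ID to Exclude   = 015a1b2c-0000-1000-aaaa-111111111111,015a1b2c-0000-1000-bbbb-222222222222
                            # ^ placeholder UUIDs for the MergeContent and PutHDFS
                            #   processors, so their own events aren't re-reported
```

The downside is exactly what I described above: every time I add a processor to the archival path, I have to remember to append its UUID here, or the feedback loop comes back.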