Is there a practical way to indefinitely archive nifi provenance without a second nifi cluster?

We need to archive provenance data indefinitely. A simple file-based archive on HDFS would meet our needs.

From reading questions on this site, I know we can use the NiFi site-to-site reporting task to send provenance events as flowfiles via NiFi site-to-site. The obvious but probably wrong solution would be to point the reporting task at its own NiFi cluster, catch the flowfiles, and do a MergeContent -> PutHDFS. This is probably wrong because that flow would itself generate provenance events, which would be reported and generate still more provenance events, forever.

I'd really like to avoid the administrative burden of running another NiFi instance, even a MiNiFi instance.

Has anyone come up with a good solution for archiving provenance without using another NiFi cluster?

Re: Is there a practical way to indefinitely archive nifi provenance without a second nifi cluster?

I am going to try the Component ID to Exclude property of SiteToSiteProvenanceReportingTask and list the UUID of every processor in the part of my flow that writes the events to HDFS. That way the archiving flow's own events are not reported, which avoids the feedback loop. The downside is that keeping the UUID list up to date will be a hassle if that flow grows large.
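
To reduce the hassle of maintaining that UUID list by hand, something like the following might help. This is only a rough sketch under several assumptions: that the archiving processors all sit directly in one process group, that the NiFi REST API's process-group processors listing is reachable without authentication, and that Component ID to Exclude accepts a comma-separated list of UUIDs. The function names, base URL, and group ID are hypothetical, so check the endpoint and the property format against your NiFi version before relying on it.

```python
import json
import urllib.request
from typing import List


def processor_ids(listing: dict) -> List[str]:
    """Return the sorted processor UUIDs found in a parsed processors listing.

    Assumes the listing looks like {"processors": [{"id": "<uuid>", ...}, ...]}.
    """
    return sorted(entity["id"] for entity in listing.get("processors", []))


def exclude_property_value(listing: dict) -> str:
    """Join the UUIDs into one comma-separated string.

    Assumption: this is the format expected by Component ID to Exclude.
    """
    return ",".join(processor_ids(listing))


def fetch_listing(base_url: str, group_id: str) -> dict:
    """Fetch the processors of one process group from the NiFi REST API.

    Assumption: an unsecured NiFi instance; a secured one would also need
    credentials or a token, which this sketch does not handle.
    """
    url = f"{base_url}/nifi-api/process-groups/{group_id}/processors"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Calling `exclude_property_value(fetch_listing(base_url, group_id))` with your own base URL and the ID of the archiving process group would give a string to paste into the reporting task. Keeping the archiving processors in a dedicated process group is what makes regenerating the list after a flow change a single call.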
