How to extract NiFi provenance?

New Contributor

I would like to get all of the NiFi provenance data in my project and store it in a file using a custom NiFi processor, but I cannot find a working solution. Does anyone know how I can get this done? The code should go in the custom processor's onTrigger method.

Accepted solution

Mentor

@Alexander Aolaritei

NiFi can produce a lot of provenance data. The solution you are looking for will be coming in Apache NiFi 1.0 in the form of a NiFi Reporting Task. This "SiteToSiteProvenanceReportingTask" will use the NiFi Site-to-Site (S2S) protocol to send provenance events to another NiFi instance in configurable batches. Of course, that target NiFi instance could be the same instance; however, that would just produce even more provenance events locally as you handle those messages. So it may be wise to stand up another NiFi instance just for provenance event handling. Upon receiving those provenance events via an S2S input port, you can use standard NiFi processors to split/merge them, route them, and store them in your desired endpoint (whether that is local file(s), an external DB, etc.).

I am not a developer, so I cannot help with the custom solution you are working on, but I want to share what is coming as another viable solution to your needs.

Thanks,

Matt
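To make the "configurable batches" idea concrete, here is a minimal, self-contained Java sketch of the cursor-based batching loop a provenance reporting task can follow: remember the ID of the next unsent event, fetch at most N events starting there, then advance the cursor. `ProvenanceEvent`, `InMemoryProvenanceRepo`, and `ProvenanceBatcher` are hypothetical stand-ins, not NiFi classes and not the actual SiteToSiteProvenanceReportingTask source. Inside NiFi, a reporting task (not a processor) would read events through its `ReportingContext`'s `EventAccess.getProvenanceEvents(firstEventId, maxRecords)` and would persist the cursor so it survives restarts.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of cursor-based batching over provenance events.
// HYPOTHETICAL: ProvenanceEvent and InMemoryProvenanceRepo are stand-ins,
// not NiFi API classes; a real reporting task would use EventAccess instead.
class ProvenanceBatcher {

    /** Minimal stand-in for a provenance event (hypothetical). */
    static final class ProvenanceEvent {
        final long eventId;
        final String eventType;
        final String componentId;

        ProvenanceEvent(long eventId, String eventType, String componentId) {
            this.eventId = eventId;
            this.eventType = eventType;
            this.componentId = componentId;
        }
    }

    /** Stand-in repository holding events in ascending eventId order. */
    static final class InMemoryProvenanceRepo {
        private final List<ProvenanceEvent> events = new ArrayList<>();

        void add(ProvenanceEvent event) {
            events.add(event);
        }

        /** Returns up to maxRecords events whose id is >= firstEventId. */
        List<ProvenanceEvent> getEvents(long firstEventId, int maxRecords) {
            List<ProvenanceEvent> batch = new ArrayList<>();
            for (ProvenanceEvent event : events) {
                if (batch.size() >= maxRecords) {
                    break;
                }
                if (event.eventId >= firstEventId) {
                    batch.add(event);
                }
            }
            return batch;
        }
    }

    private final InMemoryProvenanceRepo repo;
    private final int batchSize;
    // Cursor: id of the first event not yet sent. A real task would persist
    // this (e.g. via NiFi's state management) so restarts do not resend events.
    private long nextEventId = 0;

    ProvenanceBatcher(InMemoryProvenanceRepo repo, int batchSize) {
        this.repo = repo;
        this.batchSize = batchSize;
    }

    /** One scheduled run: fetch the next batch and advance the cursor past it. */
    List<ProvenanceEvent> nextBatch() {
        List<ProvenanceEvent> batch = repo.getEvents(nextEventId, batchSize);
        if (!batch.isEmpty()) {
            nextEventId = batch.get(batch.size() - 1).eventId + 1;
        }
        return batch;
    }

    public static void main(String[] args) {
        InMemoryProvenanceRepo repo = new InMemoryProvenanceRepo();
        for (long id = 0; id < 5; id++) {
            repo.add(new ProvenanceEvent(id, "RECEIVE", "processor-" + id));
        }
        ProvenanceBatcher batcher = new ProvenanceBatcher(repo, 2);
        List<ProvenanceEvent> batch;
        while (!(batch = batcher.nextBatch()).isEmpty()) {
            // A real task would serialize each batch and send it over Site-to-Site here.
            System.out.println("sending " + batch.size() + " events starting at id " + batch.get(0).eventId);
        }
    }
}
```

With five events and a batch size of 2, the loop sends batches of 2, 2, and 1 events, then stops because the cursor has passed the last event.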

@mclark

Can you give us an approximate month when NiFi 1.0 will be available to the community?

Thanks.

Mentor

NiFi 1.0 is deep into development right now. Expect to see it up for a vote in August. NiFi 1.0 has considerable rework across the board (a new UI, no more NiFi Cluster Manager (NCM) for clustering, etc.). Very exciting stuff.

Explorer

@philg

Hello,

I would like to log all the data transformations done in my dataflow, processor by processor.

Data provenance and the SiteToSiteProvenanceReportingTask seem to be the right things to investigate, as well as the NiFi REST API (provenance and provenance events), but I do not know how to proceed, for example how to call the REST API (the parameters are not very clear).

Any help?

phil

best regards
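For the REST API route, a minimal Java 11 sketch of the asynchronous provenance query flow in the NiFi 1.x REST API: submit a query with `POST /nifi-api/provenance`, poll `GET /nifi-api/provenance/{id}` until the response reports it finished, read the events from its results, then release the query with `DELETE /nifi-api/provenance/{id}` (individual events can also be fetched with `GET /nifi-api/provenance-events/{id}`). The host and port, the unsecured (no token) setup, the class name `NiFiProvenanceQuery`, and sending only a `maxResults` parameter are assumptions; check the request fields against the REST API documentation for your NiFi version.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of the NiFi 1.x provenance query flow (submit, poll, delete).
// ASSUMPTIONS: unsecured NiFi at BASE_URL; JSON handled as plain strings
// (no JSON library); NiFiProvenanceQuery is a hypothetical helper class.
class NiFiProvenanceQuery {
    static final String BASE_URL = "http://localhost:8080/nifi-api"; // hypothetical host/port

    /** Body for POST /provenance asking for at most maxResults events. */
    static String submitBody(int maxResults) {
        return "{\"provenance\":{\"request\":{\"maxResults\":" + maxResults + "}}}";
    }

    /** Step 1: submit the query; the server answers with a query id. */
    static HttpRequest submitRequest(int maxResults) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/provenance"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(submitBody(maxResults)))
                .build();
    }

    /** Step 2: poll the query until its "finished" flag is true. */
    static HttpRequest pollRequest(String queryId) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/provenance/" + queryId))
                .GET()
                .build();
    }

    /** Step 3: delete the finished query so the server can release it. */
    static HttpRequest deleteRequest(String queryId) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/provenance/" + queryId))
                .DELETE()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> submitted =
                client.send(submitRequest(100), HttpResponse.BodyHandlers.ofString());
        // The JSON response should carry provenance.id and provenance.finished.
        // A real client would parse it, poll pollRequest(id) until finished,
        // read provenance.results.provenanceEvents, then send deleteRequest(id).
        System.out.println(submitted.statusCode());
        System.out.println(submitted.body());
    }
}
```

Each event in the results identifies the component that produced it, so grouping the returned events by component is one way to get a processor-by-processor log.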

Please don't post to old threads which are done, create a new question. I will lock this one now.