Data Provenance Storage in Apache NiFi

New Contributor

Hello everyone,

I have a question regarding data provenance. I read the NiFi documentation about the provenance repository, which can store the provenance data, but I want to store this provenance data in a database for the long term. How can I do that in NiFi 2.2.0?

1 ACCEPTED SOLUTION

Master Mentor

@0tto 

Welcome to the community.

NiFi does not provide a way to configure the Provenance Repository to store provenance events in an external DB that the Provenance UI could then read from. However, there are a couple of provenance reporting tasks available within NiFi that can send provenance events to another destination. These events are sent in addition to the local provenance repository, which still exists.

For sending the provenance events to a DB, your option is the SiteToSiteProvenanceReportingTask combined with a dataflow built on a dedicated NiFi instance.

You would add this reporting task to the NiFi instance/cluster generating the provenance events you want to keep for long-term storage. You would set up another NiFi instance/cluster for processing the large volume of provenance events. The reporting task would be configured to send the provenance events to a Remote Input Port on the other NiFi via NiFi's Site-To-Site capability.

Once these provenance events are received by that other NiFi, they become the content of FlowFiles, which you can route through a NiFi dataflow on the canvas and send to the storage destination of your choice.
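To illustrate what the receiving side has to do with each FlowFile, here is a minimal Python sketch. It is not NiFi code, and inside NiFi a record-oriented processor such as PutDatabaseRecord would normally do this job. The sketch assumes the FlowFile content is a JSON array of event objects and that each event carries fields named eventId, eventType, timestampMillis, componentId, componentName, and entityId; check these against a real payload from your reporting task. The function names, the provenance_events table, and the use of SQLite as a stand-in for your database are all hypothetical.

```python
import json
import sqlite3

# ASSUMPTION: these field names are a guess at the JSON emitted by the
# reporting task; verify them against a real payload before relying on this.
COLUMNS = ("eventId", "eventType", "timestampMillis",
           "componentId", "componentName", "entityId")


def flatten_events(payload: str) -> list[tuple]:
    """Turn one FlowFile's content (a JSON array of events) into DB rows."""
    events = json.loads(payload)
    # Missing fields become None so a schema difference does not raise KeyError.
    return [tuple(event.get(col) for col in COLUMNS) for event in events]


def store_events(conn: sqlite3.Connection, rows: list[tuple]) -> int:
    """Insert rows into a hypothetical table; return how many rows were new."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS provenance_events ("
        "event_id TEXT PRIMARY KEY, event_type TEXT, timestamp_millis INTEGER, "
        "component_id TEXT, component_name TEXT, entity_id TEXT)"
    )
    before = conn.total_changes
    # INSERT OR IGNORE keeps a redelivered batch from creating duplicates.
    conn.executemany(
        "INSERT OR IGNORE INTO provenance_events VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn.total_changes - before
```

Keying the table on the event ID is a design choice for this sketch: if the same batch of events is delivered twice, the second insert adds nothing.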

Note: You typically do not want to receive and process the provenance events on the same NiFi where the reporting task runs. The received events also produce provenance events as they are routed through the dataflow, so you would have endless provenance events being produced. Sending to a dedicated provenance NiFi instance/cluster makes sure that your DB contains only the provenance events of the dataflow(s) of interest.

Please help our community grow and thrive. If any of the suggestions/solutions provided helped you solve your issue or answer your question, please take a moment to log in and click "Accept as Solution" on one or more of them.

Thank you,
Matt

2 REPLIES

New Contributor

Thank you for the detailed answer.