Created 02-16-2017 09:48 PM
Hi guys,
Would appreciate your input on the following NiFi Provenance questions:
Thank you.
Created 02-21-2017 06:49 PM
1. The main intent of NiFi Provenance is for data governance. The ability to look back at the life of a FlowFile. It can tell you where a FlowFile originated, from what parent FlowFile it was part of, how many parents FlowFiles where used to create it, What changes were made to it, where it was sent, when it was terminated from NiFi, etc... NiFi Provenance also provides a means to view or replay FlowFile's that are no longer anywhere in your dataflow (Provided the FlowFiles content still exists in the content repositories archive) at any point in your dataflow.
Examples:
- Some downstream system expected to receive file "ABC" over the weekend from NiFi. You can use NiFi's data provenance to see exactly when file "ABC" was received by NiFi and exactly what NiFi did to file "ABC" as it traversed your dataflows.
- A FlowFile "XYZ" was expected to route through your dataflow to some destination "G". Upon searching Provenance it was discovered "XYZ" was routed down the wrong path. You could correct you dataflow routing issues and use data provenance to replay "XYZ" just prior to the dataflow correction.
2. NiFi's Provenance repository retains all Provenance events generated via your dataflow up until either retention time or max disk usage properties are met. When either of those conditions are met, the oldest provenance events are deleted first. There is no way to selectively decide which provenance events are retained in the repository. Using the
3. The Provenance API provides a means for running queries directly against the Provenance data stored local to a particular NiF instance. The SiteToSiteProvenanceReportingTask provides a way of sending provenance events to another system for perhaps longer term storage. Since provenance events do not contain any FlowFile content, only provenance events stored locally within a NiFi instance can be used to view or replay any content.
Thanks,
Matt
Created 02-21-2017 06:49 PM
1. The main intent of NiFi Provenance is for data governance. The ability to look back at the life of a FlowFile. It can tell you where a FlowFile originated, from what parent FlowFile it was part of, how many parents FlowFiles where used to create it, What changes were made to it, where it was sent, when it was terminated from NiFi, etc... NiFi Provenance also provides a means to view or replay FlowFile's that are no longer anywhere in your dataflow (Provided the FlowFiles content still exists in the content repositories archive) at any point in your dataflow.
Examples:
- Some downstream system expected to receive file "ABC" over the weekend from NiFi. You can use NiFi's data provenance to see exactly when file "ABC" was received by NiFi and exactly what NiFi did to file "ABC" as it traversed your dataflows.
- A FlowFile "XYZ" was expected to route through your dataflow to some destination "G". Upon searching Provenance it was discovered "XYZ" was routed down the wrong path. You could correct you dataflow routing issues and use data provenance to replay "XYZ" just prior to the dataflow correction.
2. NiFi's Provenance repository retains all Provenance events generated via your dataflow up until either retention time or max disk usage properties are met. When either of those conditions are met, the oldest provenance events are deleted first. There is no way to selectively decide which provenance events are retained in the repository. Using the
3. The Provenance API provides a means for running queries directly against the Provenance data stored local to a particular NiF instance. The SiteToSiteProvenanceReportingTask provides a way of sending provenance events to another system for perhaps longer term storage. Since provenance events do not contain any FlowFile content, only provenance events stored locally within a NiFi instance can be used to view or replay any content.
Thanks,
Matt
Created 02-28-2017 03:23 PM
Thanks @Matt Clarke