Can I access event provenance metadata using expression language?
Labels: Apache NiFi
Created on 08-16-2016 02:10 PM - edited 08-19-2019 05:02 AM
Events in my flow have the pictured metadata. Can I access these provenance fields as part of the flowfile using expression language?
Created 08-16-2016 04:10 PM
Provenance fields are not available in Expression Language per se. Only the flow file's attributes (including the core attributes, some of which are pictured) are available to Expression Language (EL). Provenance event metadata is retrieved by a separate query, which is likely too expensive to run during EL evaluation. Also, if EL is used in a processor, which provenance event should it be bound to? The event(s) generated by the processor acting on the flow file have likely not been generated yet, and retrieving the previous event could be difficult (and slow).
Having said that, this information is certainly available to data flows. IMO the best way to get provenance event data is to use the SiteToSiteProvenanceReportingTask to send provenance events over Site-to-Site to your flow. There you can parse the events (they are JSON), filter on the flow file UUID if you want, and/or extract fields such as Transit Uri into attributes (using EvaluateJsonPath, for example).
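A minimal Python sketch of the parse-and-filter step, for illustration only. It assumes each Site-to-Site payload is a JSON array of event objects and that the flow file UUID and Transit Uri appear under the keys `entityId` and `transitUri`; those key names, the function name, and the sample URIs are assumptions, so check them against the events your NiFi version actually emits.

```python
import json


def transit_uris_for_flowfile(payload: str, flowfile_uuid: str) -> list[str]:
    """Return the Transit Uri of every provenance event for one flow file.

    Assumes `payload` is a JSON array of provenance events, each carrying the
    flow file UUID under "entityId" and the Transit Uri under "transitUri".
    These key names are assumptions; check them against your own events.
    """
    uris = []
    for event in json.loads(payload):
        # Keep only events about the flow file we care about.
        if event.get("entityId") != flowfile_uuid:
            continue
        # Not every event type carries a Transit Uri, so skip empty ones.
        uri = event.get("transitUri")
        if uri:
            uris.append(uri)
    return uris


if __name__ == "__main__":
    # Hypothetical sample batch; real events contain many more fields.
    sample = json.dumps([
        {"entityId": "abc-123", "eventType": "RECEIVE",
         "transitUri": "nifi://edge-01.example.com:10443/abc-123"},
        {"entityId": "abc-123", "eventType": "ATTRIBUTES_MODIFIED"},
        {"entityId": "def-456", "eventType": "RECEIVE",
         "transitUri": "nifi://edge-02.example.com:10443/def-456"},
    ])
    print(transit_uris_for_flowfile(sample, "abc-123"))
    # prints ['nifi://edge-01.example.com:10443/abc-123']
```

Inside a NiFi flow you would more likely split the array into one event per flow file (for example with SplitJson) and let EvaluateJsonPath pull the Transit Uri into an attribute; the sketch only shows the logic.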
Created 08-16-2016 05:04 PM
More details would probably help.
I'm using minifi-cpp to produce data and push to NiFi via Site2Site. minifi-cpp does not give me the ability to include additional metadata in the flowfiles. Specifically, I need to access Transit Uri so I can extract the domain name and use it to route in my flow.
I believe I'm pushing too many events through Site2Site to try issuing a Provenance query for every flowfile.
So you're suggesting exporting Provenance records in bulk via the reporting task, and then, downstream (say, in Spark), joining my raw flowfiles to their provenance metadata on FlowFile Uuid?
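The join described here can be sketched in plain Python as a stand-in for a Spark job; it is not Spark code. The record shapes, the `entityId`/`transitUri` keys, the `source.host` output key, and the rule of keeping the first event that has a Transit Uri are all assumptions for illustration. `urllib.parse.urlparse` extracts the hostname you would route on.

```python
from urllib.parse import urlparse


def join_with_provenance(flowfiles, events):
    """Left-join flow files to provenance events on the flow file UUID.

    `flowfiles` is a list of dicts with a "uuid" key; `events` is a list of
    provenance event dicts with "entityId" and an optional "transitUri".
    Both record shapes and the "source.host" output key are hypothetical.
    Flow files with no matching event get source.host = None.
    """
    # Build side of a hash join: first Transit Uri seen per flow file UUID.
    uri_by_uuid = {}
    for event in events:
        uuid = event.get("entityId")
        uri = event.get("transitUri")
        if uuid and uri and uuid not in uri_by_uuid:
            uri_by_uuid[uuid] = uri

    joined = []
    for ff in flowfiles:
        uri = uri_by_uuid.get(ff["uuid"])
        # .hostname drops the scheme, port and path and lowercases the host.
        host = urlparse(uri).hostname if uri else None
        joined.append({**ff, "source.host": host})
    return joined


if __name__ == "__main__":
    flowfiles = [{"uuid": "abc-123", "size": 42}, {"uuid": "zzz-999", "size": 7}]
    events = [{"entityId": "abc-123",
               "transitUri": "nifi://Edge-01.example.com:10443/abc-123"}]
    for row in join_with_provenance(flowfiles, events):
        print(row)
```

In Spark the equivalent would be a left outer join of the two datasets on the UUID column plus a derived hostname column; the dictionary here plays the role of the join's hash table.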
Created 08-16-2016 05:47 PM
I don't have much experience with the site-to-site implementation, but it seems like it wouldn't be too difficult to support adding the transit.uri as an attribute when receiving flow files over site-to-site (if that's all we are talking about):
Alternatively, maybe minifi-cpp should have the ability to send metadata since NiFi already supports receiving attributes over site-to-site.
Created 08-16-2016 06:03 PM
That attribute would be a great addition to Site2Site traffic. Today, the upstream flow is responsible for including attributes that specify the data's origin. The downstream flow should be able to access the data origin regardless of how the upstream flow is configured.
Created 08-16-2016 06:31 PM
What do you think about having it automatically add two attributes, such as "remote.host" and "remote.address", where remote.host has just the hostname and remote.address has hostname:port?
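To make the proposal concrete, here is a small Python sketch of how the two attributes could be built from a peer's host and port. It is illustrative only and not NiFi's implementation; the helper name is hypothetical, and bracketing IPv6 literals in remote.address is an added assumption so the port separator stays unambiguous.

```python
def remote_attributes(peer_host: str, peer_port: int) -> dict[str, str]:
    """Build the proposed remote.host / remote.address attributes.

    Hypothetical helper: remote.host holds only the host, remote.address
    holds host:port. IPv6 literals are bracketed in remote.address so the
    port separator stays unambiguous.
    """
    if not peer_host:
        raise ValueError("peer_host must be non-empty")
    if not 0 < peer_port < 65536:
        raise ValueError(f"invalid port: {peer_port}")
    # A colon in the host means an IPv6 literal, e.g. "::1" -> "[::1]:8080".
    address_host = f"[{peer_host}]" if ":" in peer_host else peer_host
    return {
        "remote.host": peer_host,
        "remote.address": f"{address_host}:{peer_port}",
    }


if __name__ == "__main__":
    print(remote_attributes("edge-01.example.com", 10443))
    # prints {'remote.host': 'edge-01.example.com', 'remote.address': 'edge-01.example.com:10443'}
```

With attributes like these on every received flow file, a downstream RouteOnAttribute could route on remote.host directly instead of parsing the Transit Uri.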
Created 08-17-2016 01:39 AM
@Bryan Bende sounds better to me than parsing the Transit Uri 😃
Created 08-17-2016 01:43 AM
@Randy Gelhausen NiFi JIRA to capture this idea: https://issues.apache.org/jira/browse/NIFI-2585
