Can I access event provenance metadata using expression language?

Events in my flow have the pictured metadata. Can I access these provenance fields as part of the flowfile using expression language?

[Image: 6686-screen-shot-2016-08-16-at-100925-am.png]

1 ACCEPTED SOLUTION

Master Guru

Provenance fields are not available in Expression Language per se. Only the flow file's attributes (including the core attributes, some of which are pictured) are available to Expression Language (EL). The provenance event metadata is retrieved by a separate query, which is likely too expensive to run during Expression Language evaluation. Also, if EL is used in a processor, which provenance event would be bound to it? The event(s) generated by the processor acting on the flow file have likely not been generated yet, and getting the previous event could be difficult (and slow).

Having said that, the above information is certainly available to data flows. IMO the best way to get the provenance event data is to use the SiteToSiteProvenanceReportingTask to send provenance events over Site-to-Site to your flow. There you can parse the events (they are in JSON), filter for the flow file UUID if you want, and/or extract fields such as Transit Uri into attributes (using EvaluateJsonPath, for example).
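
To make the parse/filter/extract idea concrete, here is a minimal sketch in Python of what that logic amounts to once the events are in hand. It is not NiFi configuration, just an illustration. It assumes the reporting task delivers a JSON array of event objects and that the flow file UUID and transit URI appear under the field names `entityId` and `transitUri`; check those names against the events your NiFi version actually emits. The function name `transit_hosts` and the sample hostnames are made up for the example.

```python
import json
from urllib.parse import urlsplit


def transit_hosts(events_json, flowfile_uuid=None):
    """Return (uuid, transit URI, hostname) for events that carry a transit URI.

    Assumes a JSON array of provenance events with "entityId" and
    "transitUri" fields (assumed names, verify against real events).
    """
    results = []
    for event in json.loads(events_json):
        uri = event.get("transitUri")
        if not uri:
            continue  # most event types have no transit URI
        if flowfile_uuid is not None and event.get("entityId") != flowfile_uuid:
            continue  # optional filter on a single flow file
        results.append((event.get("entityId"), uri, urlsplit(uri).hostname))
    return results


# Hypothetical sample batch, shaped like the assumed event format.
sample = json.dumps([
    {"eventType": "RECEIVE", "entityId": "uuid-1",
     "transitUri": "nifi://edge01.example.com:8081/uuid-1"},
    {"eventType": "ATTRIBUTES_MODIFIED", "entityId": "uuid-1"},
    {"eventType": "RECEIVE", "entityId": "uuid-2",
     "transitUri": "nifi://edge02.example.com:8081/uuid-2"},
])

for row in transit_hosts(sample, flowfile_uuid="uuid-1"):
    print(row)
```

Inside NiFi itself the same extraction would be done with processors such as EvaluateJsonPath rather than a script.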

7 REPLIES

More details would probably help.

I'm using minifi-cpp to produce data and push to NiFi via Site2Site. minifi-cpp does not give me the ability to include additional metadata in the flowfiles. Specifically, I need to access Transit Uri so I can extract the domain name and use it to route in my flow.

I believe I'm pushing too many events through Site2Site to try issuing a Provenance query for every flowfile.

So you're suggesting I export provenance records in bulk via the reporting task, then downstream (say, in Spark) join my raw flowfiles to their provenance metadata on FlowFile UUID?
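
A rough sketch of the join I have in mind, in plain Python rather than Spark to keep it short. It assumes each exported provenance event is a dict with `eventType`, `entityId` (the FlowFile UUID), and `transitUri` fields, and that each raw flowfile record carries a `uuid` key; all of those names, and the helper `join_on_uuid`, are assumptions for illustration.

```python
def join_on_uuid(flowfiles, provenance_events):
    """Left-join flowfile records to the transit URI of their RECEIVE event."""
    transit_by_uuid = {}
    for event in provenance_events:
        uuid = event.get("entityId")
        uri = event.get("transitUri")
        if event.get("eventType") == "RECEIVE" and uuid and uri:
            # Keep the first RECEIVE event seen for each flowfile.
            transit_by_uuid.setdefault(uuid, uri)
    # Flowfiles without a matching event get transit_uri=None.
    return [dict(ff, transit_uri=transit_by_uuid.get(ff["uuid"]))
            for ff in flowfiles]


# Hypothetical sample data.
flowfiles = [{"uuid": "uuid-1", "payload": "a"},
             {"uuid": "uuid-9", "payload": "b"}]
events = [{"eventType": "RECEIVE", "entityId": "uuid-1",
           "transitUri": "nifi://edge01.example.com:8081/uuid-1"}]

joined = join_on_uuid(flowfiles, events)
print(joined)
```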

Master Guru

I don't have much experience with the site-to-site implementation, but it seems like it wouldn't be too difficult to support adding the transit.uri as an attribute when receiving flow files over site-to-site (if that's all we are talking about):

https://github.com/apache/nifi/blob/e23b2356172e128086585fe2c425523c3628d0e7/nifi-nar-bundles/nifi-f...

Alternatively, maybe minifi-cpp should have the ability to send metadata since NiFi already supports receiving attributes over site-to-site.

That attribute would be a great addition to Site2Site traffic. Today the upstream flow is responsible for including attributes specifying data origin. Downstream should be able to access data origin regardless of how the upstream flow is configured.

Master Guru

What do you think about having it automatically add two attributes like "remote.host" and "remote.address" where remote.host has just the hostname and remote.address has hostname:port?
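
To illustrate what the two values would look like, here is a small Python sketch that derives them from a transit URI. This is only an illustration of the proposed attribute contents, not how NiFi would implement it; the function name `remote_attributes` and the example URI are hypothetical, and the `scheme://host:port/...` shape of the URI is an assumption.

```python
from urllib.parse import urlsplit


def remote_attributes(transit_uri):
    """Build the proposed remote.host / remote.address values from a URI."""
    parts = urlsplit(transit_uri)
    host = parts.hostname
    if host is None:
        return {}  # no host to report, so no attributes
    # remote.address is hostname:port; fall back to the bare host if no port.
    address = f"{host}:{parts.port}" if parts.port is not None else host
    return {"remote.host": host, "remote.address": address}


attrs = remote_attributes("nifi://edge01.example.com:8081/some-id")
print(attrs)
```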

@Bryan Bende sounds better to me than parsing the Transit Uri 😃
