
Can I access event provenance metadata using expression language?

Events in my flow have the pictured metadata. Can I access these provenance fields as part of the flowfile using expression language?

[Screenshot of the provenance event metadata, 2016-08-16]

7 REPLIES


More details would probably help.

I'm using minifi-cpp to produce data and push to NiFi via Site2Site. minifi-cpp does not give me the ability to include additional metadata in the flowfiles. Specifically, I need to access Transit Uri so I can extract the domain name and use it to route in my flow.
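To make the goal concrete, here is a minimal sketch of the domain extraction described above. It is only an illustration: the `nifi://host:port/...` shape of the example URI is an assumption, the helper names `transit_host` and `domain_of` are hypothetical, and the "last two labels" rule is a naive stand-in that ignores multi-part public suffixes and does not apply to bare IP addresses.

```python
from urllib.parse import urlsplit


def transit_host(transit_uri):
    """Return the lowercased hostname of a transit URI, or None if it has none."""
    return urlsplit(transit_uri).hostname


def domain_of(hostname, labels=2):
    """Keep the last `labels` dot-separated labels of a DNS name (naive rule)."""
    return ".".join(hostname.split(".")[-labels:])


# Hypothetical transit URI; the real format may differ.
uri = "nifi://edge-01.site-a.example.com:8081/some-transaction-id"
host = transit_host(uri)
domain = domain_of(host)
```

The resulting `domain` value is what a routing step would match on.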

I believe I'm pushing too many events through Site2Site to try issuing a Provenance query for every flowfile.

So you're suggesting that I export provenance records in bulk via the reporting task? Then downstream (say, in Spark), I would join my raw flowfiles to their provenance metadata on the FlowFile UUID?
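A rough sketch of that downstream join, written in plain Python rather than Spark to keep it self-contained. The record layout is an assumption: the field names `uuid` and `transit_uri` and the function `join_on_uuid` are hypothetical, not the actual schema of exported provenance events.

```python
def join_on_uuid(flowfiles, provenance_events):
    """Attach to each flowfile record the provenance events sharing its UUID."""
    events_by_uuid = {}
    for event in provenance_events:
        events_by_uuid.setdefault(event["uuid"], []).append(event)
    # Left join: flowfiles without any matching event get an empty list.
    return [
        {**flowfile, "provenance": events_by_uuid.get(flowfile["uuid"], [])}
        for flowfile in flowfiles
    ]


# Hypothetical sample records.
flowfiles = [{"uuid": "a1", "payload": "x"}, {"uuid": "b2", "payload": "y"}]
events = [{"uuid": "a1", "transit_uri": "nifi://edge-01.example.com:8081/t1"}]
joined = join_on_uuid(flowfiles, events)
```

A flowfile can have several provenance events, so each joined record carries a list rather than a single event.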

Master Guru

I don't have much experience with the site-to-site implementation, but it seems like it wouldn't be too difficult to support adding the transit.uri as an attribute when receiving flow files over site-to-site (if that's all we are talking about):

https://github.com/apache/nifi/blob/e23b2356172e128086585fe2c425523c3628d0e7/nifi-nar-bundles/nifi-f...

Alternatively, maybe minifi-cpp should have the ability to send metadata since NiFi already supports receiving attributes over site-to-site.

That attribute would be a great addition to Site2Site traffic. Today the upstream flow is responsible for including attributes specifying data origin. Downstream should be able to access data origin regardless of how the upstream flow is configured.

Master Guru

What do you think about having it automatically add two attributes like "remote.host" and "remote.address" where remote.host has just the hostname and remote.address has hostname:port?
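As a sketch of what the two proposed attributes would contain (the attribute names come from the proposal above; the function `remote_attributes` and the sample host and port are hypothetical, and this does not describe existing NiFi behavior):

```python
def remote_attributes(host, port):
    """Build the two proposed attributes from the sender's hostname and port."""
    return {
        "remote.host": host,                  # hostname only
        "remote.address": f"{host}:{port}",   # hostname:port
    }


# Hypothetical sender.
attrs = remote_attributes("edge-01.example.com", 8081)
```

With attributes like these, a downstream flow could route on `remote.host` directly instead of parsing a URI.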

@Bryan Bende sounds better to me than parsing the Transit Uri 😃
