
Nifi Provenance events for a specific flow

Expert Contributor

Hi

I want to extract provenance event data for a specific flow. I have NiFi running locally on my laptop, and there are a number of flows within my main flow. I am interested in extracting provenance information for one specific flow and ignoring all the others. After a bit of googling I found that there is a reporting task and the provenance REST API, but I couldn't find much information on the REST API anywhere.

I want to extract the provenance events of a flow and write them to a file to process later. Do I also get the data along with the events? I want to track this information for a flow that delivers streaming data to Spark Streaming. I might only need the metadata and not the data itself, because saving the data could add a huge overhead for streaming jobs that run over a long period of time.
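If the events are fetched as JSON (for example through the provenance REST API), keeping only the metadata for one flow can be as simple as filtering and writing JSON Lines. The sketch below is a minimal illustration under stated assumptions: the field names are taken from NiFi's `ProvenanceEventDTO` as returned by the REST API and may differ between versions, and `flow_component_ids` and `write_flow_events` are hypothetical names for the processor IDs that make up the flow of interest and the helper itself.

```python
import json
from typing import Iterable, Set

# Metadata fields to keep from each provenance event dict. The names follow
# NiFi's ProvenanceEventDTO as returned by the REST API; treat this list as an
# assumption and adjust it to what your NiFi version actually returns.
METADATA_FIELDS = (
    "eventId", "eventType", "eventTime", "flowFileUuid",
    "componentId", "componentName", "parentUuids", "childUuids",
    "lineageDuration", "fileSize",
)


def write_flow_events(events: Iterable[dict], flow_component_ids: Set[str], path: str) -> int:
    """Write metadata-only events for one flow to a JSON Lines file.

    flow_component_ids is a hypothetical set of processor IDs belonging to the
    flow of interest; events from any other component are skipped.
    Returns the number of events written.
    """
    written = 0
    with open(path, "w", encoding="utf-8") as out:
        for event in events:
            if event.get("componentId") not in flow_component_ids:
                continue  # event belongs to a different flow
            # Copy only metadata; content references and any other fields are dropped.
            record = {k: event[k] for k in METADATA_FIELDS if k in event}
            out.write(json.dumps(record) + "\n")
            written += 1
    return written
```

One JSON object per line keeps the file appendable and easy to stream into a later processing job without holding everything in memory.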

I want to check an SLA requirement (say, end-to-end delay). I would like to know what time the data entered NiFi and what time it left NiFi. For example, I might ingest a file that is split into multiple records before I send them to Spark Streaming, where the records are processed in batches. I track how long each batch took. Now I want to find out when a record in a given Spark Streaming batch entered and left NiFi.

End-to-end time = (NiFi exit time - NiFi entry time) + Spark batch processing time
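The formula above translates directly into code. This is a minimal sketch; `end_to_end_time` is a hypothetical helper name and the timestamps are made-up values, not measurements.

```python
from datetime import datetime, timedelta


def end_to_end_time(nifi_entry: datetime, nifi_exit: datetime,
                    spark_batch_processing: timedelta) -> timedelta:
    """End-to-end time = (NiFi exit time - NiFi entry time) + Spark batch processing time."""
    if nifi_exit < nifi_entry:
        raise ValueError("NiFi exit time precedes NiFi entry time")
    return (nifi_exit - nifi_entry) + spark_batch_processing


# Made-up example: 2.5 s inside NiFi plus a 1.2 s Spark batch.
entry = datetime(2024, 1, 1, 12, 0, 0)
exit_ = datetime(2024, 1, 1, 12, 0, 2, 500000)
total = end_to_end_time(entry, exit_, timedelta(seconds=1.2))
print(total.total_seconds())  # 3.7
```

As written, the formula does not count any time a record spends between leaving NiFi and being picked up by a Spark batch, so that gap would need its own measurement if it matters for the SLA.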

I might also need lineage for backtracking.
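For the split-file case, each record leaving NiFi has its own FlowFile UUID, so its entry time has to be found by walking back through the lineage to the original file. The sketch below shows one way to do that over a list of already-fetched events. It assumes that in FORK/CLONE events `flowFileUuid` identifies the parent and `childUuids` lists the new FlowFiles, that RECEIVE/CREATE mark entry and SEND marks exit, and that `eventTime` has been converted to epoch milliseconds under the hypothetical key `eventTimeMillis`. JOIN (merge) lineage is not handled, and `nifi_residence_ms` is a hypothetical name.

```python
from typing import Dict, List, Optional

# Assumption: each event is a dict from the provenance REST API whose eventTime
# has already been converted to epoch milliseconds under "eventTimeMillis".
ENTRY_TYPES = {"RECEIVE", "CREATE"}  # events treated as data entering NiFi
EXIT_TYPE = "SEND"                   # event treated as data leaving NiFi


def build_parent_map(events: List[dict]) -> Dict[str, str]:
    """Map each child FlowFile UUID to the UUID it was split or cloned from."""
    parent_of: Dict[str, str] = {}
    for event in events:
        # FORK/CLONE: the event's flowFileUuid is the parent, childUuids the new FlowFiles.
        for child in event.get("childUuids", []):
            if child != event["flowFileUuid"]:
                parent_of[child] = event["flowFileUuid"]
    return parent_of


def nifi_residence_ms(events: List[dict], record_uuid: str) -> Optional[int]:
    """Milliseconds from the earliest ancestor entry event to the record's first SEND."""
    parent_of = build_parent_map(events)
    lineage = {record_uuid}
    current = record_uuid
    # Walk up the lineage until reaching a FlowFile with no known parent (cycle-safe).
    while current in parent_of and parent_of[current] not in lineage:
        current = parent_of[current]
        lineage.add(current)
    entry_times = [e["eventTimeMillis"] for e in events
                   if e["flowFileUuid"] in lineage and e["eventType"] in ENTRY_TYPES]
    exit_times = [e["eventTimeMillis"] for e in events
                  if e["flowFileUuid"] == record_uuid and e["eventType"] == EXIT_TYPE]
    if not entry_times or not exit_times:
        return None  # lineage is incomplete in the events we have
    return min(exit_times) - min(entry_times)
```

The result is the NiFi part of the end-to-end formula for one record; using the first SEND is a design choice, and the last one could be used instead if a record is sent more than once.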

Any help, including pointers to existing links, is greatly appreciated.

Thanks


Re: Nifi Provenance events for a specific flow

New Contributor

Yes, you can extract provenance events for a specific FlowFile. To do this, search provenance by FlowFile UUID: log in to the NiFi UI, click the menu in the upper right-hand corner, select Data Provenance, click the search button, enter the FlowFile's UUID in the "FlowFile UUID" text box, and click Search.
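A scripted version of the same search through the REST API might look like the sketch below. It assumes an unsecured NiFi reachable at `http://localhost:8080/nifi-api` (adjust for HTTPS and authentication) and the asynchronous query flow of submitting `POST /provenance`, polling `GET /provenance/{id}` until `finished` is true, then deleting the query. The search-term key `FlowFileUUID` and the plain-string value format are assumptions; depending on your NiFi version, search-term values may need to be objects such as `{"value": ..., "inverse": false}`, so check the REST API documentation bundled with your instance. The helper names are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: unsecured local NiFi on port 8080.
NIFI_API = "http://localhost:8080/nifi-api"


def build_uuid_query(flowfile_uuid: str, max_results: int = 1000) -> dict:
    """Request body for POST /provenance searching by FlowFile UUID (format is an assumption)."""
    return {"provenance": {"request": {
        "maxResults": max_results,
        "searchTerms": {"FlowFileUUID": flowfile_uuid},
    }}}


def _call(method: str, url: str, body: dict = None):
    """Send a JSON request and decode the JSON response, if any."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        payload = resp.read()
    return json.loads(payload) if payload else None


def search_by_flowfile_uuid(flowfile_uuid: str, poll_seconds: float = 0.5) -> list:
    """Submit a provenance query, wait for it to finish, and return its events."""
    query = _call("POST", f"{NIFI_API}/provenance", build_uuid_query(flowfile_uuid))
    query_id = query["provenance"]["id"]
    try:
        while not query["provenance"]["finished"]:
            time.sleep(poll_seconds)
            query = _call("GET", f"{NIFI_API}/provenance/{query_id}")
        return query["provenance"]["results"]["provenanceEvents"]
    finally:
        # Provenance queries are held server-side until deleted.
        _call("DELETE", f"{NIFI_API}/provenance/{query_id}")
```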

The same thing can be done via the REST API. To open the REST API documentation from the NiFi UI, click the menu in the upper right-hand corner and choose Help. When the help documentation opens, scroll to the bottom of the left pane and select REST Api under the "Developer" section.

You can also access the data along with the event by using the REST API endpoints GET /provenance-events/{id}/content/input and GET /provenance-events/{id}/content/output.
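Downloading that content for a given event could be sketched as below, again assuming an unsecured NiFi at `http://localhost:8080/nifi-api`; `content_url` and `download_content` are hypothetical helper names. Content can only be retrieved while NiFi still retains it in its content repository archive, so this is something to fetch selectively rather than for every event in a long-running stream.

```python
import urllib.request

# Assumption: unsecured local NiFi on port 8080.
NIFI_API = "http://localhost:8080/nifi-api"


def content_url(event_id: int, direction: str = "output") -> str:
    """Build the URL for an event's input or output content."""
    if direction not in ("input", "output"):
        raise ValueError("direction must be 'input' or 'output'")
    return f"{NIFI_API}/provenance-events/{event_id}/content/{direction}"


def download_content(event_id: int, direction: str, dest_path: str) -> None:
    """Stream an event's content to a local file."""
    with urllib.request.urlopen(content_url(event_id, direction)) as resp, \
            open(dest_path, "wb") as out:
        out.write(resp.read())
```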
