I want to extract provenance event data for a specific flow. I have NiFi running locally on my laptop, and my main flow contains a number of flows. I am interested in extracting provenance information for one specific flow and ignoring all the others. After a bit of googling I know there are reporting tasks and the provenance API, but I couldn't find much information on the REST API anywhere.
I want to extract the provenance events of a flow and write them to a file so I can process them later. Do I also get the data along with the events? I want to track this information for a flow that delivers streaming data to Spark Streaming. I might only need the metadata and not the data itself, because saving the data could add a huge overhead for streaming jobs that run over a long period of time.
I want to check an SLA requirement (say, end-to-end delay), so I would like to know at what time the data entered NiFi and at what time it left NiFi. For example, I might ingest a file that is split into multiple records before I send it to Spark Streaming. These records are then processed in batches in Spark, and I track how long each batch took. I now want to find out when a record in a Spark Streaming batch entered and left NiFi.
End-to-end time = (NiFi exit time - NiFi enter time) + Spark batch processing time
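To make the calculation concrete, here is a minimal sketch of what I have in mind. The function name and the timestamps are made up for illustration. In practice the enter and exit times would have to come from the provenance events, which is the part I don't know how to get.

```python
from datetime import datetime


def end_to_end_seconds(nifi_enter, nifi_exit, spark_batch_seconds):
    """(NiFi exit time - NiFi enter time) + Spark batch processing time."""
    return (nifi_exit - nifi_enter).total_seconds() + spark_batch_seconds


# Hypothetical values: a record split from a file inherits the time at which
# the file entered NiFi; its exit time is when the record was sent to Spark.
file_enter_time = datetime(2017, 1, 1, 12, 0, 0)
record_exit_time = datetime(2017, 1, 1, 12, 0, 3)

total = end_to_end_seconds(file_enter_time, record_exit_time, spark_batch_seconds=2.5)
print(total)  # 5.5
```

Here a record that spent 3 seconds in NiFi and landed in a Spark batch that took 2.5 seconds gets an end-to-end time of 5.5 seconds.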
I might also need lineage for backtracking.
Any help, including pointers to existing links, is greatly appreciated.