NiFi Provenance Timestamp

When looking at the lineage of a flowfile in provenance, there is a Time field for each node in the lineage.

It has the following format:

Time

06/13/2017 14:18:47.678 EDT

When is this field set with regard to the processor's task?

I ask because I am trying to track the time each processor takes. The event duration is not always set, so I am simply taking the difference between the timestamps of consecutive processors. So far I've assumed the Time is taken as the flowfile 'enters' the processor, so the difference between the times of node 2 and node 1 gives the time node 1 took to handle the flowfile, but I was wondering if someone can confirm this.

1 ACCEPTED SOLUTION

Master Guru

There should be two time fields, event time and event duration...

Event time is the time at which the event was generated. Usually the event is generated at the end of a processor's execution, after it has successfully processed the flow file and is ready to report an event about what happened.

For two provenance events you could take the difference between the event times to see how much time passed between those events, but that doesn't guarantee all of that time was spent in a processor. Let's say processor A emits a flow file, which produces a CREATE event, and then processor B writes to it, which produces a CONTENT_MODIFIED event. The flow file could have sat in the queue between these two processors for several minutes due to back-pressure or some other reason, and then been processed by processor B in a second or two, but the time difference between those two events would still be several minutes.
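As a rough illustration of that calculation, here is a minimal Python sketch that diffs two Time values in the format shown in the question. The helper names and the second timestamp are made up for the example, and it assumes both values are displayed in the same time zone, so the trailing zone abbreviation is simply dropped.

```python
from datetime import datetime, timedelta

# Assumed display format, matching "06/13/2017 14:18:47.678 EDT"
# once the trailing zone abbreviation has been removed.
EVENT_TIME_FORMAT = "%m/%d/%Y %H:%M:%S.%f"


def parse_event_time(text: str) -> datetime:
    """Parse a provenance Time string, ignoring the zone abbreviation."""
    # Zone abbreviations such as "EDT" are ambiguous, so this sketch drops
    # them and assumes both events were displayed in the same time zone.
    stamp, _zone = text.rsplit(" ", 1)
    return datetime.strptime(stamp, EVENT_TIME_FORMAT)


def elapsed_between(earlier: str, later: str) -> timedelta:
    """Wall-clock time between two events: queue wait plus processing."""
    return parse_event_time(later) - parse_event_time(earlier)


create_time = "06/13/2017 14:18:47.678 EDT"    # value from the question
modified_time = "06/13/2017 14:23:49.178 EDT"  # made-up value for illustration
gap = elapsed_between(create_time, modified_time)
print(gap.total_seconds())
```

The result covers everything between the two events, so any time the flow file spent queued in front of processor B is included in it.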

Event duration is not guaranteed to be set; whether it is depends on the processor. Typically a processor will calculate the time it took to perform some operation, for example transferring the content of a flow file to an external system, and then report a provenance event with that duration in it, for example a SEND event.
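If you want to combine the two, one possible approach is to prefer the reported duration and fall back to the event-time difference only when it is missing. The sketch below assumes each event has already been exported into a plain dict; the keys `event_time_ms` and `duration_ms` are hypothetical names chosen for this example, not NiFi API fields.

```python
from typing import Optional, Tuple


def time_for_event(event: dict, previous: Optional[dict]) -> Tuple[Optional[int], str]:
    """Return (milliseconds, source) for one provenance event.

    Hypothetical record layout: "event_time_ms" is the event time as epoch
    milliseconds, "duration_ms" is the event duration or None when unset.
    """
    duration = event.get("duration_ms")
    if duration is not None and duration >= 0:
        # The processor reported how long its operation took.
        return duration, "reported"
    if previous is not None:
        # Fallback: difference of event times, which also includes queue time.
        return event["event_time_ms"] - previous["event_time_ms"], "estimated"
    return None, "unknown"


# Made-up sample events for illustration only.
create_event = {"event_time_ms": 1_000, "duration_ms": None}
send_event = {"event_time_ms": 4_000, "duration_ms": 250}
modify_event = {"event_time_ms": 9_000, "duration_ms": None}

print(time_for_event(send_event, create_event))
print(time_for_event(modify_event, send_event))
```

Returning the source alongside the value keeps the "estimated" numbers, which may include time spent waiting in a queue, separate from the durations a processor actually reported.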

2 REPLIES

Thank you, this was very helpful. For my specific purposes and scenario, it would seem that using the difference in time should be adequate. I will keep in mind your warnings regarding time in queue, which was also the reason I didn't want to use lineage duration, as we have events sitting in front of a ControlRate processor.