Created 06-14-2017 02:45 PM
When looking at the lineage of a flowfile in provenance, there is a Time field for each node in the lineage.
It looks like this:
Time
06/13/2017 14:18:47.678 EDT
When is this field set relative to the processor's task?
I ask because I am trying to track how long each processor takes. The event duration is not always set, so I am simply taking the difference between the timestamps of consecutive processors. To date I've assumed the Time is taken as the flowfile 'enters' the processor, so the difference between node 2 and node 1 gives the time node 1 took to handle the flowfile, but I was wondering if someone can confirm this.
Created 06-14-2017 05:03 PM
There should be two time fields, event time and event duration...
Event time is the time at which the event was generated. Usually the event is generated at the end of a processor executing, after it has successfully processed the flow file and is ready to report an event about what happened.
For two provenance events you could take the difference between the event times to see how much time passed between those events, but that doesn't guarantee all of that time was spent in a processor. Let's say processor A emits a flow file, which produces a CREATE event, and then processor B writes to it, which produces a CONTENT_MODIFIED event. The flow file could have sat in the queue between these two processors for several minutes due to back-pressure or some other reason, and then been processed by processor B in a second or two, but the time difference between those two events would still be several minutes.
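As a rough illustration of the point above, here is a minimal Python sketch that diffs the Time values of consecutive lineage events. The event list, processor names, and helper names are made up for the example, and it assumes every timestamp uses the format shown in the question and the same time zone. The number it produces is queue wait plus processing time, not processing time alone.

```python
from datetime import datetime

# Hypothetical lineage events: (processor, event type, Time as displayed in the UI).
events = [
    ("Processor A", "CREATE", "06/13/2017 14:18:47.678 EDT"),
    ("Processor B", "CONTENT_MODIFIED", "06/13/2017 14:23:49.178 EDT"),
]


def parse_event_time(text):
    """Parse a displayed Time value, ignoring the trailing zone abbreviation.

    Assumption: all events being compared are shown in the same time zone.
    """
    stamp, _zone = text.rsplit(" ", 1)
    return datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S.%f")


def gaps_between_events(events):
    """Return (processor, seconds since the previous event) for each later event.

    Each gap covers the time spent waiting in the queue AND the time the
    processor spent working, so it is only an upper bound on processing time.
    """
    times = [parse_event_time(time_text) for _, _, time_text in events]
    return [
        (events[i][0], (times[i] - times[i - 1]).total_seconds())
        for i in range(1, len(events))
    ]


for processor, seconds in gaps_between_events(events):
    print(f"{processor}: {seconds:.3f} s since previous event")
```

In this made-up case Processor B shows a gap of roughly five minutes even if it only worked on the flow file for a second or two.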
Event duration is not guaranteed to be set and is dependent on the processor. Typically a processor will calculate the time it took to perform some operation, for example transferring the content of a flow file to an external system, and then report a provenance event with that duration in it, for example a SEND event.
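One way to combine the two fields, sketched in Python under some assumptions: each event is a plain dict with hypothetical keys `event_time_ms` and `duration_ms`, and an unset duration is represented as `None`. Those names are for illustration only and are not NiFi's own record format. The idea is to trust the reported duration when the processor set one, and otherwise fall back to the event-time gap while flagging that it may include queue time.

```python
def elapsed_ms(event, previous_event):
    """Return (milliseconds, source) for one provenance event.

    Uses the processor-reported duration when present. Otherwise falls back
    to the gap between event times, which may include time spent queued.
    """
    duration = event.get("duration_ms")
    if duration is not None:
        return duration, "event duration"
    gap = event["event_time_ms"] - previous_event["event_time_ms"]
    return gap, "event-time gap (may include queue time)"


# Hypothetical events: the SEND event reports a duration, the other does not.
created = {"type": "CREATE", "event_time_ms": 1_000, "duration_ms": None}
modified = {"type": "CONTENT_MODIFIED", "event_time_ms": 181_000, "duration_ms": None}
sent = {"type": "SEND", "event_time_ms": 183_500, "duration_ms": 1_200}

print(elapsed_ms(modified, created))
print(elapsed_ms(sent, modified))
```

Keeping the source label next to each number makes it easier to tell later which values are real processing times and which are only upper bounds.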
Created 06-14-2017 05:10 PM
Thank you, this was very helpful. For my specific scenario, it seems that using the difference in event times should be adequate. I will keep in mind your warning about time in queue, which was also the reason I didn't want to use lineage duration, as we have flowfiles sitting in a queue in front of a ControlRate processor.