Nifi Provenance Timestamp
Labels: Apache NiFi
Created 06-14-2017 02:45 PM
When looking at the lineage of a flowfile in provenance, there is a Time field for each node in the lineage.
It has this format:
Time
06/13/2017 14:18:47.678 EDT
When is this field set relative to the processor's task?
I ask because I am trying to track how long each processor takes, and the event duration is not always set, so I am simply taking the timestamps of each processor and computing the difference between them. So far I've assumed the Time is taken as the flowfile 'enters' the processor, so that the difference between node 2 and node 1 gives the time node 1 took to handle the flowfile, but I was wondering if someone can confirm this.
Created 06-14-2017 05:03 PM
There should be two time fields: event time and event duration.
Event time is the time at which the event was generated. Usually the event is generated at the end of a processor's execution, after it has successfully processed the flow file and is ready to report an event about what happened.
For two provenance events you could take the difference between the event times to see how much time passed between those events, but that doesn't guarantee all of that time was spent in a processor. Let's say processor A emits a flow file, which produces a CREATE event, and then processor B writes to it, which produces a CONTENT_MODIFIED event. The flow file could have sat in the queue between these two processors for several minutes due to back-pressure or some other reason, and then been processed by processor B in a second or two, but the time difference between those two events would still be several minutes.
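To make the point concrete, here is a minimal Python sketch of the timestamp difference described above. It parses Time values in the format shown in the question and subtracts them. The second timestamp and the helper names are made up for illustration, and the zone abbreviation is simply dropped, so this assumes both values were rendered in the same time zone.

```python
from datetime import datetime


def parse_event_time(text):
    """Parse a lineage Time value such as '06/13/2017 14:18:47.678 EDT'.

    The trailing zone abbreviation is ignored, so all timestamps being
    compared are assumed to be in the same time zone.
    """
    date_part, time_part = text.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%m/%d/%Y %H:%M:%S.%f")


def gap_seconds(earlier, later):
    """Seconds between two event times. Includes any time spent queued."""
    return (parse_event_time(later) - parse_event_time(earlier)).total_seconds()


create_time = "06/13/2017 14:18:47.678 EDT"    # from the question
modified_time = "06/13/2017 14:21:49.178 EDT"  # hypothetical later event

gap = gap_seconds(create_time, modified_time)
print(f"{gap} seconds between events")
```

The result is only an upper bound on processor B's work: it cannot tell a slow processor from a flow file that waited in the queue.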
Event duration is not guaranteed to be set and is dependent on the processor. Typically a processor will calculate the time it took to perform some operation, for example transferring the content of a flow file to an external system, and then report a provenance event with that duration in it, for example a SEND event.
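One way to combine the two fields, sketched in Python: use the event duration when the processor reported one, and otherwise fall back to the difference in event times, flagged as an upper bound. The function and its parameters are hypothetical and work on plain millisecond values, not on any NiFi API object.

```python
from typing import Optional, Tuple


def time_estimate_ms(
    prev_event_ms: int, event_ms: int, duration_ms: Optional[int]
) -> Tuple[int, str]:
    """Estimate the time attributable to the processor that emitted an event.

    prev_event_ms / event_ms: event times in epoch milliseconds (hypothetical inputs).
    duration_ms: the reported event duration, or None when it was not set.
    """
    if duration_ms is not None and duration_ms >= 0:
        # The processor measured its own operation, e.g. for a SEND event.
        return duration_ms, "reported"
    # No duration available: the gap also contains any queue time.
    return event_ms - prev_event_ms, "upper_bound"


# A SEND-style event that reported its own duration.
print(time_estimate_ms(1_000, 181_500, 1_200))
# An event with no duration set.
print(time_estimate_ms(1_000, 181_500, None))
```

Keeping the label next to the number makes it harder to mistake a queue-inflated gap for measured processing time.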
Created 06-14-2017 05:10 PM
Thank you, this was very helpful. For my specific scenario, using the difference in event times should be adequate. I will keep in mind your warnings about time spent in the queue, which is also why I didn't want to use lineage duration, as we have events sitting in front of a ControlRate processor.
