Find the NiFi Lineage Duration programmatically for each processor in a process group
Labels: Apache NiFi
Created 11-01-2017 09:15 AM
I am trying to find out how long each processor in my data pipeline takes to process data, either by reading nifi-app.log or through REST API calls. Is there a way to find how long each processor took to process its incoming flow files, and the volume of data it processed?
Created on 11-01-2017 11:52 AM - edited 08-18-2019 01:08 AM
Hi @Tanmoy
If you are referring to the lineage duration shown in the NiFi provenance UI (see below pic), you can use SiteToSiteProvenanceReportingTask to send provenance data to a NiFi cluster. Once NiFi receives the data, it can store it wherever you want (file, Solr index, database, etc.).
To test it, open the hamburger menu at the top right, go to Controller Settings, then Reporting Tasks, and add a SiteToSiteProvenanceReportingTask. Configure it to send data to the same cluster, like below:
Notice the Input Port Name property. I called it prov, so I need to add an input port with that name to the NiFi flow. From this input port I can then decide what to do with the provenance data (for example, store it somewhere).
Data you will be receiving looks like this:
{
  "eventId": "5e5acd4a-46e5-4bb7-b957-22b89b7c4bb5",
  "eventOrdinal": 67,
  "eventType": "ATTRIBUTES_MODIFIED",
  "timestampMillis": 1509535134626,
  "timestamp": "2017-11-01T11:18:54.626Z",
  "durationMillis": -1,
  "lineageStart": 1509535134620,
  "componentId": "3829068d-015f-1000-b540-41c80254f8c7",
  "componentType": "UpdateAttribute",
  "componentName": "UpdateAttribute",
  "entityId": "e72bed81-0c26-4dc8-85e2-b4ac7940fcfe",
  "entityType": "org.apache.nifi.flowfile.FlowFile",
  "entitySize": 258,
  "previousEntitySize": 258,
  "updatedAttributes": {
    "project_id": "project_1"
  },
  "previousAttributes": {
    "path": "./",
    "uuid": "e72bed81-0c26-4dc8-85e2-b4ac7940fcfe",
    "filename": "721201919566618"
  },
  "actorHostname": "abdelkrjidjmbp2",
  "contentURI": "http://abdelkrjidjmbp2:8080/nifi-api/provenance-events/67/content/output",
  "previousContentURI": "http://abdelkrjidjmbp2:8080/nifi-api/provenance-events/67/content/input",
  "parentIds": [],
  "childIds": [],
  "platform": "nifi",
  "application": "NiFi Flow"
},
You can see the name of the processor (componentName: UpdateAttribute), its type (componentType: UpdateAttribute), its ID (componentId: 3829068d-015f-1000-b540-41c80254f8c7), the flow file ID (entityId: e72bed81-0c26-4dc8-85e2-b4ac7940fcfe) and the provenance event ID (eventId: 5e5acd4a-46e5-4bb7-b957-22b89b7c4bb5). To get the lineage duration, compute timestampMillis - lineageStart. In my example that is 1509535134626 - 1509535134620 = 6 ms, as in the first screenshot.
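As a rough illustration of that subtraction, here is a minimal Python sketch that computes the lineage duration per event and rolls it up per processor, together with the bytes seen (entitySize). It assumes the events arriving on the input port have been collected into a JSON array shaped like the sample above. The function names lineage_duration_ms and summarize_by_component are made up for this example and are not part of NiFi. Keep in mind that this value measures the time since the flow file's lineage started, not the time spent inside that one processor.

```python
import json
from collections import defaultdict


def lineage_duration_ms(event):
    """Lineage duration as described in the answer: timestampMillis - lineageStart."""
    return event["timestampMillis"] - event["lineageStart"]


def summarize_by_component(events):
    """Aggregate event count, bytes (entitySize) and lineage duration per componentId."""
    stats = defaultdict(
        lambda: {"name": None, "events": 0, "bytes": 0, "lineage_ms_total": 0}
    )
    for event in events:
        entry = stats[event["componentId"]]
        entry["name"] = event["componentName"]
        entry["events"] += 1
        # entitySize may be missing or null on some events; treat that as 0 bytes.
        entry["bytes"] += event.get("entitySize") or 0
        entry["lineage_ms_total"] += lineage_duration_ms(event)
    for entry in stats.values():
        entry["lineage_ms_avg"] = entry["lineage_ms_total"] / entry["events"]
    return dict(stats)


# Trimmed copy of the sample event from the answer, wrapped in an array.
sample_events = json.loads("""
[
  {
    "eventId": "5e5acd4a-46e5-4bb7-b957-22b89b7c4bb5",
    "eventType": "ATTRIBUTES_MODIFIED",
    "timestampMillis": 1509535134626,
    "lineageStart": 1509535134620,
    "componentId": "3829068d-015f-1000-b540-41c80254f8c7",
    "componentType": "UpdateAttribute",
    "componentName": "UpdateAttribute",
    "entityId": "e72bed81-0c26-4dc8-85e2-b4ac7940fcfe",
    "entitySize": 258
  }
]
""")

summary = summarize_by_component(sample_events)
for component_id, entry in summary.items():
    print(component_id, entry["name"], entry["events"], entry["bytes"], entry["lineage_ms_avg"])
```

For the single sample event this gives a lineage duration of 6 ms and 258 bytes for the UpdateAttribute processor. With real data you would feed in all events received on the prov input port.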
Created 01-05-2023 09:39 PM
Then what does nifi_average_lineage_duration do?
Created 01-06-2023 01:17 AM
@shekabhi, as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.
Regards,
Vidya Sargur, Community Manager
