Find the NiFi Lineage Duration programmatically for each processor in a process group

Contributor

I am trying to find out how long each processor in my data pipeline takes to process data, either by reading nifi-app.log or through REST API calls. Is there a way to find how long each processor took to process its incoming flow files, and the volume of data it processed?

3 REPLIES

Hi @Tanmoy

If you are referring to the lineage duration shown in the NiFi provenance UI (see the screenshot below), you can use SiteToSiteProvenanceReportingTask to send provenance data to a NiFi cluster. Once NiFi receives it, you can store it wherever you want (a file, a Solr index, a database, etc.).

42401-screen-shot-2017-11-01-at-125035-pm.png

To test it, open the hamburger menu at the top right, go to Controller Settings, then Reporting Tasks, and add a SiteToSiteProvenanceReportingTask. Configure it to send data to the same cluster, as shown below:

42400-screen-shot-2017-11-01-at-124305-pm.png

Notice the Input Port Name property. I called it prov, so I need to add an input port named prov to the NiFi flow. From this input port I can then decide what to do with the provenance data (for example, store it somewhere).

Data you will be receiving looks like this:

{
  "eventId": "5e5acd4a-46e5-4bb7-b957-22b89b7c4bb5",
  "eventOrdinal": 67,
  "eventType": "ATTRIBUTES_MODIFIED",
  "timestampMillis": 1509535134626,
  "timestamp": "2017-11-01T11:18:54.626Z",
  "durationMillis": -1,
  "lineageStart": 1509535134620,
  "componentId": "3829068d-015f-1000-b540-41c80254f8c7",
  "componentType": "UpdateAttribute",
  "componentName": "UpdateAttribute",
  "entityId": "e72bed81-0c26-4dc8-85e2-b4ac7940fcfe",
  "entityType": "org.apache.nifi.flowfile.FlowFile",
  "entitySize": 258,
  "previousEntitySize": 258,
  "updatedAttributes": {
    "project_id": "project_1"
  },
  "previousAttributes": {
    "path": "./",
    "uuid": "e72bed81-0c26-4dc8-85e2-b4ac7940fcfe",
    "filename": "721201919566618"
  },
  "actorHostname": "abdelkrjidjmbp2",
  "contentURI": "http://abdelkrjidjmbp2:8080/nifi-api/provenance-events/67/content/output",
  "previousContentURI": "http://abdelkrjidjmbp2:8080/nifi-api/provenance-events/67/content/input",
  "parentIds": [],
  "childIds": [],
  "platform": "nifi",
  "application": "NiFi Flow"
}

You can see the name of the processor (UpdateAttribute), its type (UpdateAttribute), its ID (3829068d-015f-1000-b540-41c80254f8c7), the event ID (5e5acd4a-46e5-4bb7-b957-22b89b7c4bb5), and the flow file ID (entityId, e72bed81-0c26-4dc8-85e2-b4ac7940fcfe). To get the lineage duration, compute timestampMillis - lineageStart. In my example that is 1509535134626 - 1509535134620 = 6 ms, as in the first screenshot.
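
If it helps, here is a minimal Python sketch of that calculation, assuming the events arriving on the input port have already been written somewhere as a JSON array. The sample below is a trimmed copy of the event shown earlier; the function names lineage_duration_ms and summarize_by_component are hypothetical helpers, not part of NiFi. Note that timestampMillis - lineageStart measures the time since the flow file's lineage began, so it is cumulative up to that event rather than the time spent inside a single processor.

```python
import json
from collections import defaultdict

# Trimmed copy of the sample event from this thread; a real feed would hold many events.
SAMPLE = json.loads("""
[
  {
    "eventId": "5e5acd4a-46e5-4bb7-b957-22b89b7c4bb5",
    "eventType": "ATTRIBUTES_MODIFIED",
    "timestampMillis": 1509535134626,
    "lineageStart": 1509535134620,
    "componentId": "3829068d-015f-1000-b540-41c80254f8c7",
    "componentName": "UpdateAttribute",
    "entitySize": 258
  }
]
""")


def lineage_duration_ms(event):
    """Lineage duration as described above: timestampMillis - lineageStart."""
    return event["timestampMillis"] - event["lineageStart"]


def summarize_by_component(events):
    """Group events by componentId; total the event count, bytes, and lineage durations."""
    totals = defaultdict(lambda: {"name": None, "events": 0, "bytes": 0, "lineage_ms": 0})
    for event in events:
        entry = totals[event["componentId"]]
        entry["name"] = event["componentName"]
        entry["events"] += 1
        # Treat a missing or null entitySize as 0 bytes (an assumption of this sketch).
        entry["bytes"] += event.get("entitySize") or 0
        entry["lineage_ms"] += lineage_duration_ms(event)
    # Add the average lineage duration per component.
    return {
        cid: {**entry, "avg_lineage_ms": entry["lineage_ms"] / entry["events"]}
        for cid, entry in totals.items()
    }


if __name__ == "__main__":
    for component_id, stats in summarize_by_component(SAMPLE).items():
        print(component_id, stats)
```

Grouping by componentId rather than componentName keeps two processors with the same name apart, and summing entitySize gives a rough volume of data per processor.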


screen-shot-2017-11-01-at-123414-pm.png

New Contributor

Then what does nifi_average_lineage_duration do?

Community Manager

@shekabhi, as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.



Regards,

Vidya Sargur,
Community Manager

