Created on 02-03-2017 04:56 PM - edited 08-17-2019 05:08 AM
I was looking for a way to easily forward and analyze the provenance data that is available in NiFi. There were a couple of options available.
Option 1 is a very techy option: you could point your UI directly at the REST API and present a nice provenance visual with bulk replay capabilities. But that makes the developer responsible for keeping up with changes in the NiFi REST API. It would be nice not to have that direct dependency. Also, you might want to lock down the REST API in production.
Option 2 is very easy, but it is limited in where I can send those provenance events.
The Apache NiFi engineering team resolved this situation with the ScriptedReportingTask reporting task. It gives you an easy way of setting up provenance reporting in NiFi and forwarding the events to an endpoint of your choice. You also do not have a direct dependency between your application and the NiFi REST API. You can use ScriptedReportingTask to massage the events into a format that works with your application/endpoint.
I chose Groovy as the language for my script, but there are options for Python, JavaScript and a few others.
Once you are logged in to NiFi, click the menu in the top right corner and select the Controller Settings option.
On the Controller Settings dialog, choose the Reporting Tasks tab. Click the + in the top right corner to create a new reporting task.
On the Add Reporting Task dialog, search for ScriptedReportingTask. Double-click the ScriptedReportingTask option in the results, or select the row and click Add.
You will see a new ScriptedReportingTask in the reporting tasks list. Click the pencil icon to edit the reporting task.
You will see a reporting task window. Select Groovy as the Script Engine and paste the script below into Script Body. Make sure to change the location of the file where your events will be written.
import groovy.json.*
import org.apache.nifi.components.state.StateManager
import org.apache.nifi.reporting.ReportingContext
import org.apache.nifi.reporting.EventAccess
import org.apache.nifi.provenance.ProvenanceEventRepository
import org.apache.nifi.provenance.ProvenanceEventRecord
import org.apache.nifi.provenance.ProvenanceEventType

final StateManager stateManager = context.getStateManager()
final EventAccess access = context.getEventAccess()
final ProvenanceEventRepository provenance = access.getProvenanceRepository()

log.info("starting event id: " + Long.toString(1))
final List<ProvenanceEventRecord> events = provenance.getEvents(1, 100)
log.info("ending event id: " + events.size())

def outFile = new File("/tmp/provenance.txt")
outFile.withWriter('UTF-8') { writer ->
    events.each { event ->
        writer.writeLine(new JsonBuilder(event).toPrettyString())
    }
}
Click OK and Apply. Click the "Play" button to activate the reporting task. I had set the scheduling frequency for my task to 10 secs so I could see the results right away. You can set it to a higher value as needed.
You should see the events appear in /tmp/provenance.txt in JSON format. You could use other formats if needed, and you may want to skip the pretty-printing for better performance.
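As a rough sketch of the "not prettified" variant, the helper below writes one compact JSON object per line and picks a handful of fields explicitly instead of serializing the whole event. The helper name toJsonLine and the choice of fields are my own assumptions for illustration; the property names rely on Groovy mapping event.eventId to getEventId() and so on. The sample map is only a stand-in so the snippet runs outside NiFi.

```groovy
import groovy.json.JsonOutput

// Hypothetical helper: serialize a few selected fields as compact, single-line JSON.
// With a real ProvenanceEventRecord, event.eventId resolves to event.getEventId(), etc.
def toJsonLine(event) {
    JsonOutput.toJson([
        eventId     : event.eventId,
        eventType   : event.eventType.toString(),
        componentId : event.componentId,
        flowFileUuid: event.flowFileUuid,
        eventTime   : event.eventTime
    ])
}

// Stand-in for a provenance event (made-up values), used only to exercise the helper.
def sample = [eventId: 1L, eventType: 'CREATE', componentId: 'abc',
              flowFileUuid: 'f-1', eventTime: 1486158960000L]
println toJsonLine(sample)
```

In the reporting script, the body of the writer loop would then become writer.writeLine(toJsonLine(event)), which gives a file that downstream tools can read line by line.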
The ScriptedReportingTask provides the ReportingContext, which is available to your script as the context object. You can log information to the NiFi log using the ComponentLog log object, which is also passed to you by the reporting task.
If you need any other variables to be set from the NiFi task, you can define them as dynamic properties.
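For example, the output file location could come from a dynamic property instead of being hard-coded. This is a minimal sketch, assuming a dynamic property named outputPath (a name I made up) has been added to the task, and assuming dynamic properties are bound into the script as PropertyValue objects, the way ExecuteScript does it. Please verify that against your NiFi version.

```groovy
// Assumption: a dynamic property called "outputPath" is defined on the reporting task
// and NiFi binds it as a PropertyValue, so getValue() returns its String value.
// If the property is not defined, fall back to the path used in the script above.
def path = binding.hasVariable('outputPath') ? outputPath.getValue() : '/tmp/provenance.txt'

def outFile = new File(path)
println "provenance events will be written to ${outFile.path}"
```

The fallback keeps the script working when the property is missing, which also makes it possible to try the snippet outside NiFi.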
My script is very simple: it looks at up to 100 provenance events, starting from the first provenance event. You can use the StateManager to keep track of the last provenance event that you received. Look at the implementation by @jfrazee to see how to collect provenance events incrementally.
https://github.com/jfrazee/nifi-provenance-reporting-bundle
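The incremental idea can be sketched roughly as follows. This is not the implementation from the bundle linked above, just a minimal illustration under a few assumptions: the helper name collectBatch and the state key lastEventId are hypothetical, event IDs are assumed to start at 0, and state is only updated after the batch has been handed to the sink. The helper is duck-typed so it can be tried with stand-in objects outside NiFi.

```groovy
// Hypothetical helper: fetch the next batch of events after the last one recorded in state.
//   stateManager - something like context.getStateManager()
//   provenance   - something like context.getEventAccess().getProvenanceRepository()
//   scope        - the state scope, e.g. org.apache.nifi.components.state.Scope.LOCAL
//   sink         - closure called once per event (write to a file, post to an endpoint, ...)
def collectBatch(stateManager, provenance, scope, int batchSize, Closure sink) {
    final String key = 'lastEventId'   // assumed state key, not a NiFi convention

    // Resume after the last reported ID, or start at 0 on the very first run (assumption).
    def stored = stateManager.getState(scope).get(key)
    long firstId = stored ? Long.parseLong(stored) + 1 : 0L

    def events = provenance.getEvents(firstId, batchSize)
    if (events) {
        events.each(sink)
        // Persist the cursor only after the sink has processed the whole batch.
        stateManager.setState([(key): Long.toString(events.last().eventId)], scope)
    }
    return events.size()
}

// Inside the ScriptedReportingTask script, the call would look something like:
// collectBatch(context.getStateManager(),
//              context.getEventAccess().getProvenanceRepository(),
//              org.apache.nifi.components.state.Scope.LOCAL, 100) { e ->
//     log.info("provenance event " + e.getEventId())
// }
```

Because the cursor is saved after the sink runs, a failure in the sink means the same batch is picked up again on the next run, so the receiving side should tolerate duplicates.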
Thank you to @Matt Burgess for putting together this very useful reporting task component.
Hope this is useful.