Extract the contents of the download. We will refer to this instance as the "first NiFi instance".
Copy the extracted contents to a new directory. We will refer to this copy as the "Provenance Reporting Instance".
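On Linux or macOS this copy is a single `cp -r`. The sketch below does the same step in Python with `shutil.copytree`. The directory names passed to it are hypothetical placeholders for wherever you extracted NiFi, and the `conf/nifi.properties` sanity check is an illustrative addition, not something NiFi requires.

```python
import shutil
from pathlib import Path


def create_reporting_instance(first_instance_dir: str, reporting_instance_dir: str) -> Path:
    """Copy an extracted NiFi install into a second directory (hypothetical helper)."""
    src = Path(first_instance_dir)
    dst = Path(reporting_instance_dir)
    # Guard against pointing at the wrong folder: a NiFi install has conf/nifi.properties
    if not (src / "conf" / "nifi.properties").is_file():
        raise FileNotFoundError(f"{src} does not look like a NiFi install")
    # copytree refuses to overwrite an existing destination, protecting an existing install
    shutil.copytree(src, dst)
    return dst
```

For example, `create_reporting_instance("nifi-first", "nifi-provenance-reporting")` would produce the second install next to the first.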
Configuring the Provenance Reporting Instance
Before starting this NiFi instance, we need to enable Site-to-Site communication so that it can receive the provenance data. We also need to change its listening port so that it does not conflict with the first instance. To do this:
Open <$PROVENANCE_REPORTING_INSTANCE_NIFI_INSTALL_DIR>/conf/nifi.properties in your favorite editor
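As a sketch, assuming the NiFi 1.x property names, the edits amount to setting a few keys in nifi.properties. Remote input must be enabled with a Site-to-Site socket port, it must be unsecured for this local demo, and the web port must move away from the default 8080. The value 8088 matches the URL used later for this instance. The Site-to-Site port 10000 is an arbitrary example, not a NiFi default. The `update_properties` helper is hypothetical: it rewrites matching `key=value` lines and leaves everything else untouched.

```python
from pathlib import Path

# Assumed NiFi 1.x property names; values are example settings for this demo
REPORTING_INSTANCE_SETTINGS = {
    "nifi.remote.input.host": "127.0.0.1",
    "nifi.remote.input.secure": "false",       # plain Site-to-Site for a local demo
    "nifi.remote.input.socket.port": "10000",  # arbitrary free port (assumption)
    "nifi.web.http.port": "8088",              # avoid clashing with the first instance
}


def update_properties(path: str, updates: dict) -> None:
    """Set key=value pairs in a .properties file, keeping other lines intact."""
    file = Path(path)
    remaining = dict(updates)
    out = []
    for line in file.read_text().splitlines():
        key = line.split("=", 1)[0].strip()
        # Only rewrite real assignments, never comments or blank lines
        if "=" in line and not line.lstrip().startswith("#") and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append any keys the file did not already contain
    out.extend(f"{k}={v}" for k, v in remaining.items())
    file.write_text("\n".join(out) + "\n")
```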
We now have both NiFi instances ready. To start them:
Navigate to the directory for the first NiFi instance and start it according to your operating system
Navigate to the directory for the Provenance Reporting Instance and start it according to your operating system
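"According to your operating system" means picking the right launch script. The sketch below assumes the scripts shipped with NiFi 1.x: `bin/nifi.sh start` on Linux and macOS, and `bin/run-nifi.bat` on Windows. The `start_command` helper and the install paths are hypothetical.

```python
import platform
from pathlib import Path


def start_command(install_dir: str, system: str = "") -> list:
    """Return the command that starts NiFi from an install directory (NiFi 1.x scripts assumed)."""
    system = system or platform.system()
    bin_dir = Path(install_dir) / "bin"
    if system == "Windows":
        # run-nifi.bat runs NiFi in the foreground of its console window
        return [str(bin_dir / "run-nifi.bat")]
    # nifi.sh start launches NiFi in the background (Linux, macOS)
    return [str(bin_dir / "nifi.sh"), "start"]
```

Each instance could then be launched with, for example, `subprocess.run(start_command("/opt/nifi-first"), check=True)`, using your own install directories.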
Setting up the first NiFi flow
Now that we have two NiFi instances up and running, the next step is to create our data flow. To keep things simple, we are going to use one of the sample NiFi Dataflow Templates: the DateConversion flow, which can be downloaded from here.
After downloading this template, import it into the first NiFi instance and then create an instance of it. For instructions on importing a template, see the Template section of the NiFi User Guide. After creating an instance of the template, your NiFi canvas should look similar to this:
Figure 1. DateConversion Flow
I have modified the layout on my canvas so it easily fits on the screen.
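On the Provenance Reporting Instance (http://127.0.0.1:8088/nifi), connect the ProvData input port to the LogAttribute processor
Start the ProvData input port, but leave the LogAttribute processor stopped so that flow files queue up for inspection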
Your flow should look similar to the following:
Figure 2. Provenance Reporting Instance Flow
Adding Site-to-Site Provenance Reporting
We are now ready to add the provenance reporting task to the first NiFi instance. To do this:
Open the "hamburger" menu in the top right of the UI and choose "Controller Settings"
Go to the "Reporting Tasks" tab and click the add icon
Choose the SiteToSiteProvenanceReportingTask
Click the pencil icon and edit the SiteToSiteProvenanceReportingTask properties so that they look like this:
NOTE: I set the batch size to 1 for demo purposes only. In a production environment, you would want to adjust this or leave it at the default of 1000.
Adjust the SiteToSiteProvenanceReportingTask settings so that the run schedule is 5 seconds rather than the default 5 minutes.
NOTE: Again, this is for demo purposes only. In a production environment, you may want to leave this at the default or adjust it to suit your needs.
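The settings above can be summarized as data. This sketch builds a reporting-task entity in the shape assumed for NiFi 1.x's REST API (`POST /nifi-api/controller/reporting-tasks`). The entity shape, the class name `org.apache.nifi.reporting.SiteToSiteProvenanceReportingTask`, and the property keys ("Destination URL", "Input Port Name", "Batch Size") are assumptions to verify against the REST API docs for your NiFi version. The destination URL and port name come from the Provenance Reporting Instance configured earlier.

```python
import json

# Assumed fully qualified class name of the reporting task (NiFi 1.x)
TASK_TYPE = "org.apache.nifi.reporting.SiteToSiteProvenanceReportingTask"


def provenance_task_payload(destination_url: str, port_name: str,
                            batch_size: int = 1, period: str = "5 sec") -> dict:
    """Build a reporting-task entity mirroring the demo settings (shape is an assumption)."""
    return {
        "revision": {"version": 0},  # new components start at revision 0
        "component": {
            "type": TASK_TYPE,
            "schedulingPeriod": period,  # demo value; the default is 5 mins
            "properties": {
                "Destination URL": destination_url,
                "Input Port Name": port_name,
                "Batch Size": str(batch_size),  # demo value; the default is 1000
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(provenance_task_payload("http://127.0.0.1:8088/nifi", "ProvData"), indent=2))
```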
Starting the flow
We are now ready to start the DateConversion flow we created earlier. Click the start button on the Operate palette.
Inspecting the Provenance data
To inspect the provenance data, go to the Provenance Reporting Instance (http://127.0.0.1:8088/nifi). With the LogAttribute processor stopped, you should see flow files build up in the queue between the input port and the LogAttribute processor.
To view the provenance data:
Right-click on the queue and choose "List queue"
Pick one of the flow files in the queue
Choose "View" to see the content. An example of a formatted provenance event looks like this:
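Once you have a flow file's content saved locally, a few lines of code can tally what it contains. This sketch assumes the content is a JSON array of event objects, as sent by SiteToSiteProvenanceReportingTask, and that each event carries `componentType` and `eventType` fields. Check those field names against the event you actually see. The helper name is hypothetical.

```python
import json
from collections import Counter


def summarize_provenance(content: str) -> Counter:
    """Count provenance events per (componentType, eventType) in one flow file's content."""
    events = json.loads(content)
    if isinstance(events, dict):
        # Tolerate a single event object instead of an array
        events = [events]
    return Counter(
        (e.get("componentType", "?"), e.get("eventType", "?")) for e in events
    )
```

For example, `summarize_provenance(open("provenance.json").read())` (hypothetical file name) shows at a glance which processors produced which kinds of events in that batch.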