Created on 12-19-2016 09:30 PM
In this tutorial, we will learn how to configure NiFi to send provenance data to a second NiFi instance.
NOTE: In this tutorial we are going to be taking the following shortcuts, in the spirit of understanding the concepts. Specifically we are going to:
These are not the best practices that would be recommended in a production environment.
For a primer on NiFi please refer to the NiFi Getting Started Guide.
Before starting up this NiFi instance, we need to enable Site-to-Site communication so that it can receive the provenance data, and also change the listening port for NiFi so it does not conflict with the first instance. To do that, do the following:
Change:
nifi.remote.input.host=
nifi.remote.input.socket.port=
nifi.remote.input.secure=true
nifi.web.http.port=8080
To:
nifi.remote.input.host=localhost
nifi.remote.input.socket.port=10000
nifi.remote.input.secure=false
nifi.web.http.port=8088
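The same change can be scripted rather than edited by hand. The following is a minimal sketch, not part of the original tutorial: the helper name `apply_overrides` is hypothetical, and it assumes the file uses plain `key=value` lines. It works on the text of the properties file, so you would read the file, pass its contents in, and write the result back yourself.

```python
from typing import Dict

# Values copied from the "To" block above.
OVERRIDES = {
    "nifi.remote.input.host": "localhost",
    "nifi.remote.input.socket.port": "10000",
    "nifi.remote.input.secure": "false",
    "nifi.web.http.port": "8088",
}


def apply_overrides(properties_text: str, overrides: Dict[str, str]) -> str:
    """Return properties_text with the given keys set to the new values.

    Comments, blank lines, and unrelated keys are left untouched.
    Keys that never appear in the input are appended at the end.
    (Hypothetical helper for illustration only.)
    """
    seen = set()
    out = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in overrides:
                line = f"{key}={overrides[key]}"
                seen.add(key)
        out.append(line)
    # Append any override whose key was not present in the file.
    for key, value in overrides.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "nifi.remote.input.host=\n"
        "nifi.remote.input.socket.port=\n"
        "nifi.remote.input.secure=true\n"
        "nifi.web.http.port=8080\n"
    )
    print(apply_overrides(sample, OVERRIDES), end="")
```

Keeping the function free of file I/O makes it easy to try on a copy of the properties text before touching a real installation.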
We now have the two NiFi instances ready. To start them, do the following:
Now that we have two NiFi instances up and running, the next thing to do is to create our data flow. To keep things simple, we are going to use one of the sample NiFi Dataflow Templates. In particular, we are going to use the DateConversion flow, which can be downloaded from here.
After downloading this template, import it and then create an instance of it. For instructions on how to import a template, please see the Template section of the NiFi user guide. After creating an instance of the template, your NiFi canvas should look similar to this:
Figure 1. DateConversion Flow
I have modified the layout on my canvas so it easily fits on the screen.
Your flow should look similar to the following:
Figure 2. Provenance Reporting Instance Flow
We are now ready to add the provenance reporting task to the NiFi flow. To do this, do the following:
Go to the "hamburger menu" in the top right of the UI and choose "Controller Settings"
Go to the "Reporting Tasks" tab and click the icon
Choose the SiteToSiteProvenanceReportingTask
Click on the pencil icon and edit the SiteToSiteProvenanceReportingTask properties so it looks like this:
NOTE: I set the batch size to 1; this is for demo purposes only. In a production environment you would want to adjust this or leave it as the default of 1000.
Adjust the settings for the SiteToSiteProvenanceReportingTask so that the run schedule is 5 seconds and not the default 5 minutes.
NOTE: Again this is for demo purposes only. In a production environment you may want to leave this as the default or adjust it accordingly.
We are now ready to start the DateConversion flow we created before. Go ahead and click the start button on the Operate palette.
To inspect the provenance data, go to the Provenance Reporting Instance (http://127.0.0.1:8088/nifi). With the LogAttribute processor stopped, you should see the flow files build up in the queue between the input port and the LogAttribute processor.
To view the provenance data, do the following:
<code>[{
  "eventId": "07b4693a-20b1-4a4d-9dc3-37d4c8f93e59",
  "eventOrdinal": 0,
  "eventType": "CREATE",
  "timestampMillis": 1482171900667,
  "timestamp": "2016-12-19T18:25:00.667Z",
  "durationMillis": -1,
  "lineageStart": 1482171900657,
  "componentId": "3fde726d-5cc1-4bb6-9e06-35218a9c58a8",
  "componentType": "GenerateFlowFile",
  "componentName": "GenerateFlowFile",
  "entityId": "47160cde-d484-4292-be3d-476cd4fff1cb",
  "entityType": "org.apache.nifi.flowfile.FlowFile",
  "entitySize": 1024,
  "updatedAttributes": {
    "path": "./",
    "uuid": "47160cde-d484-4292-be3d-476cd4fff1cb",
    "filename": "19180888360764"
  },
  "previousAttributes": {},
  "actorHostname": "hw13095.attlocal.net",
  "contentURI": "http://hw13095.attlocal.net:8080/nifi-api/provenance-events/0/content/output",
  "previousContentURI": "http://hw13095.attlocal.net:8080/nifi-api/provenance-events/0/content/input",
  "parentIds": [],
  "childIds": [],
  "platform": "nifi",
  "application": "NiFi Flow"
}]</code>
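To get a feel for the structure of a provenance event, here is a small Python sketch that parses a trimmed copy of the sample above and pulls out a few fields. It is illustrative only: the `summarize` helper is hypothetical, the payload keeps just a subset of the fields shown, and it assumes each flow file's content is a JSON array of event objects like the one above.

```python
import json
from datetime import datetime, timedelta, timezone

# Trimmed copy of the sample event shown above (subset of fields).
payload = """[{
  "eventId": "07b4693a-20b1-4a4d-9dc3-37d4c8f93e59",
  "eventType": "CREATE",
  "timestampMillis": 1482171900667,
  "componentType": "GenerateFlowFile",
  "componentName": "GenerateFlowFile",
  "entityId": "47160cde-d484-4292-be3d-476cd4fff1cb",
  "entitySize": 1024
}]"""


def summarize(event: dict) -> dict:
    """Reduce one provenance event to a few fields (hypothetical helper)."""
    millis = event["timestampMillis"]
    # Split into whole seconds and milliseconds to avoid float rounding.
    when = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    when += timedelta(milliseconds=millis % 1000)
    return {
        "eventType": event["eventType"],
        "component": event["componentName"],
        "flowFile": event["entityId"],
        "sizeBytes": event["entitySize"],
        "time": when.isoformat(),
    }


events = json.loads(payload)
summaries = [summarize(e) for e in events]

if __name__ == "__main__":
    for s in summaries:
        print(s)
```

The `timestampMillis` field is epoch milliseconds in UTC, which is why the converted time matches the `timestamp` string in the sample.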
Now that you have data flowing to your Provenance Reporting NiFi instance, you can take that JSON data and send it to any number of destinations for further analysis.
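As one example of preparing the data for a downstream system, many log and search tools accept newline-delimited JSON (one object per line). The sketch below converts a batch into that shape. It is an assumption about what a destination might want, not something the reporting task does for you, and `to_ndjson` is a hypothetical helper name.

```python
import json
from typing import List


def to_ndjson(batch_text: str) -> List[str]:
    """Turn a JSON array of provenance events into one compact JSON line per event."""
    events = json.loads(batch_text)
    return [json.dumps(e, sort_keys=True, separators=(",", ":")) for e in events]


if __name__ == "__main__":
    # Two made-up, minimal events used only to show the output shape.
    batch = '[{"eventId": "a", "eventType": "CREATE"}, {"eventId": "b", "eventType": "DROP"}]'
    for line in to_ndjson(batch):
        print(line)
```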
Created on 02-27-2017 04:50 PM
Excellent post @apsaltis, thank you.
Do you have any plans for Part 2?
Created on 02-27-2017 05:30 PM
@apsaltis If you want to capture provenance data from a NiFi Cluster, would you please elaborate what would be different in the setup; thanks in advance.
Created on 06-17-2019 11:37 PM
Is this guide still valid for the latest version (1.9.2)? I followed everything exactly and cannot get it to work. Is there any other assistance available for exporting the provenance data?
Created on 07-18-2019 07:43 AM
Hi, I followed the steps as mentioned above but I am not seeing any data on port 8080. Kindly advise me if I am doing something wrong here.
Created on 04-10-2020 03:39 PM
To all the members who are asking if it is still valid: yes, this still seems to be valid.
I have extracted provenance data by connecting to the same instance of NiFi rather than using multiple NiFi instances.
Created on 02-12-2021 02:02 AM
Hello
I tried this with NiFi 1.12.1 and I get nothing on the Provenance Reporting Instance input port. If I use a 'remote process group' to send data to the Provenance Reporting Instance input port, it communicates well. It looks like the reporting task (SiteToSiteProvenanceReportingTask) just sits there and does nothing.
No error anywhere
Any clue on what is wrong? I really need to export the provenance data to external systems...
thanks
best regards
Gilles
Created on 02-23-2021 03:30 AM - edited 02-23-2021 04:17 AM
@gilou3000 have you found any solution to this? I am facing the same problem with NiFi 1.12.1.
Created on 02-23-2021 07:31 PM
I have used this solution with NiFi 1.12.1 and it works correctly. I had messed up some part while doing the tutorial.
Created on 09-19-2021 05:05 AM - edited 09-19-2021 05:26 AM
This article could be improved if the settings for the emitting and receiving NiFi instances were more clearly identified.
Created on 05-28-2024 03:18 AM
Could someone please help me with this?
Fetch Provenance data using SiteToSiteProvenanceRe... - Cloudera Community - 388418
Site-to-Site configuration is not working over HTTP when NiFi is running on HTTPS.