Created on 12-19-2016 09:30 PM
Extracting NiFi Provenance Data using SiteToSiteProvenanceReportingTask Part 1
In this tutorial, we will learn how to configure NiFi to send provenance data to a second NiFi instance:
- Downloading NiFi
- Configure Site-To-Site
- Setting up the first Flow
- Setting up the Provenance Reporting Instance Flow
- Adding Site To Site Provenance Reporting
- Starting the flow
- Inspecting the Provenance data
- Next Steps
NOTE: To keep the focus on the concepts, this tutorial takes the following shortcuts. Specifically, we are going to:
- Run two NiFi instances on the same host.
- Not configure security in either NiFi instance or use a secure transport between NiFi hosts.
These shortcuts are not recommended practice for a production environment.
References
For a primer on NiFi please refer to the NiFi Getting Started Guide.
Downloading and Configuring NiFi
- Download the latest version of NiFi from here
- Extract the contents of the download; we will refer to this instance as the "first NiFi instance"
- Copy the extracted contents to a new directory; we will refer to this as the "Provenance Reporting Instance"
Configuring the Provenance Reporting Instance NiFi
Before starting this NiFi instance, we need to enable Site-to-Site communication so that it can receive the provenance data, and change the NiFi listening port so it does not conflict with the first instance. To do that:
- Open <$PROVENANCEREPORTINGINSTANCENIFIINSTALL_DIR>/conf/nifi.properties in your favorite editor
- Change:
nifi.remote.input.host=
nifi.remote.input.socket.port=
nifi.remote.input.secure=true
nifi.web.http.port=8080
To:
nifi.remote.input.host=localhost
nifi.remote.input.socket.port=10000
nifi.remote.input.secure=false
nifi.web.http.port=8088
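If you would rather script the change than edit the file by hand, the property update can be sketched in Python. This is a minimal sketch, not part of NiFi: the helper names `apply_overrides` and `update_properties_file` are hypothetical, and it assumes each property sits on its own `key=value` line in nifi.properties.

```python
from pathlib import Path

# Values taken from the "To" settings above.
OVERRIDES = {
    "nifi.remote.input.host": "localhost",
    "nifi.remote.input.socket.port": "10000",
    "nifi.remote.input.secure": "false",
    "nifi.web.http.port": "8088",
}


def apply_overrides(text, overrides):
    """Return properties text with the given keys set to the new values."""
    remaining = dict(overrides)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        is_property = "=" in stripped and not stripped.startswith("#")
        key = stripped.split("=", 1)[0].strip() if is_property else None
        if key in remaining:
            # Replace the whole line, keeping the key and swapping the value.
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append any key that was not present in the file at all.
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


def update_properties_file(path, overrides=OVERRIDES):
    """Rewrite the properties file at `path` in place (hypothetical helper)."""
    p = Path(path)
    p.write_text(apply_overrides(p.read_text(), overrides))
```

Call `update_properties_file` with the path to conf/nifi.properties of the Provenance Reporting Instance only; the first NiFi instance keeps its default settings. Back up the file first if you want an easy way to revert.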
Starting up both instances of NiFi
We now have the two NiFi instances ready. To start them, do the following:
- Navigate to the directory for the first NiFi instance and start it according to your operating system
- Navigate to the directory for the Provenance Reporting Instance and start it according to your operating system
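NiFi can take a little while to start, so before moving on it can help to confirm that both instances are listening. The following is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper, and the ports assume the first instance kept the default web port 8080 while the Provenance Reporting Instance uses 8088 (web) and 10000 (Site-to-Site) as configured above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    checks = [
        ("first NiFi instance (web UI)", 8080),
        ("Provenance Reporting Instance (web UI)", 8088),
        ("Provenance Reporting Instance (Site-to-Site)", 10000),
    ]
    for name, port in checks:
        state = "listening" if port_is_open("127.0.0.1", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

An open port only shows that something is listening there; it does not prove the NiFi UI has finished loading, so opening the URL in a browser is still the definitive check.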
Setting up the first Flow for NiFi
Now that we have two NiFi instances up and running, the next step is to create our data flow on the first NiFi instance. To keep things simple, we are going to use one of the sample NiFi Dataflow Templates: the DateConversion flow, which can be downloaded from here
After downloading this template, import it, and then create an instance of it. For instructions on how to import a template please see the Template section of the NiFi user guide. After creating an instance of the template your NiFi canvas should then look similar to this:
Figure 1. DateConversion Flow
I have modified the layout on my canvas so it easily fits on the screen.
Setting up the Provenance Reporting Instance Flow
- Open a browser and go to the Provenance Reporting Instance: http://127.0.0.1:8088/nifi
- Create an input port called "Prov Data"
- Create a LogAttribute Processor
- Connect the input port to the LogAttribute Processor
- Start the "Prov Data" input port
Your flow should look similar to the following:
Figure 2. Provenance Reporting Instance Flow
Adding Site To Site Provenance Reporting
We are now ready to add the provenance reporting task to the first NiFi instance, the one that sends the provenance data. To do this, do the following:
- Go to the "hamburger menu" in the top right of the UI and choose "Controller Settings"
- Go to the "Reporting Tasks" tab and click the icon to add a new reporting task
- Choose the SiteToSiteProvenanceReportingTask
- Click on the pencil icon and edit the SiteToSiteProvenanceReportingTask properties so it looks like this:
NOTE: I set the batch size to 1. This is for demo purposes only; in a production environment you would want to adjust this or leave it as the default of 1000.
- Adjust the settings for the SiteToSiteProvenanceReportingTask so that the run schedule is 5 seconds and not the default 5 minutes.
NOTE: Again, this is for demo purposes only. In a production environment you may want to leave this as the default or adjust it accordingly.
Starting the flow
We are now ready to start the DateConversion flow we created earlier on the first NiFi instance. Click the start button on the Operate palette.
Inspecting the Provenance data
To inspect the provenance data, go to the Provenance Reporting Instance (http://127.0.0.1:8088/nifi). With the LogAttribute processor stopped, you should see the flow files build up in the queue between the input port and the LogAttribute processor.
To view the provenance data, do the following:
- Right click on the queue and choose "List queue"
- Pick one of the flow files in the queue
- Choose "View" to see the content. An example of a formatted provenance event looks like this:
[{
  "eventId": "07b4693a-20b1-4a4d-9dc3-37d4c8f93e59",
  "eventOrdinal": 0,
  "eventType": "CREATE",
  "timestampMillis": 1482171900667,
  "timestamp": "2016-12-19T18:25:00.667Z",
  "durationMillis": -1,
  "lineageStart": 1482171900657,
  "componentId": "3fde726d-5cc1-4bb6-9e06-35218a9c58a8",
  "componentType": "GenerateFlowFile",
  "componentName": "GenerateFlowFile",
  "entityId": "47160cde-d484-4292-be3d-476cd4fff1cb",
  "entityType": "org.apache.nifi.flowfile.FlowFile",
  "entitySize": 1024,
  "updatedAttributes": {
    "path": "./",
    "uuid": "47160cde-d484-4292-be3d-476cd4fff1cb",
    "filename": "19180888360764"
  },
  "previousAttributes": {},
  "actorHostname": "hw13095.attlocal.net",
  "contentURI": "http://hw13095.attlocal.net:8080/nifi-api/provenance-events/0/content/output",
  "previousContentURI": "http://hw13095.attlocal.net:8080/nifi-api/provenance-events/0/content/input",
  "parentIds": [],
  "childIds": [],
  "platform": "nifi",
  "application": "NiFi Flow"
}]
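The content of each flow file is a JSON array of provenance events, so it can be read with any JSON parser. As a minimal sketch, the Python below pulls a few fields out of each event; the function name `summarize_events` and the trimmed `SAMPLE` string are illustrative assumptions, while the field names come from the event shown above.

```python
import json

# A trimmed copy of the event above, kept short for illustration.
SAMPLE = """[{
  "eventId": "07b4693a-20b1-4a4d-9dc3-37d4c8f93e59",
  "eventType": "CREATE",
  "timestamp": "2016-12-19T18:25:00.667Z",
  "componentType": "GenerateFlowFile",
  "componentName": "GenerateFlowFile",
  "entityId": "47160cde-d484-4292-be3d-476cd4fff1cb",
  "entitySize": 1024
}]"""

# Fields to keep in the summary; adjust to whatever your analysis needs.
FIELDS = ("eventType", "componentName", "entityId", "entitySize", "timestamp")


def summarize_events(content):
    """Parse a JSON array of provenance events and keep a few key fields."""
    events = json.loads(content)
    # .get() returns None for a field that an event does not carry.
    return [{field: event.get(field) for field in FIELDS} for event in events]


if __name__ == "__main__":
    for summary in summarize_events(SAMPLE):
        print(summary)
```

With a batch size larger than 1, the array holds several events per flow file, and the same function returns one summary per event.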
Next Steps
Now that you have data flowing to your Provenance Reporting NiFi instance, you can take that JSON data and send it to any number of destinations for further analysis.
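As one small example of further analysis, the events can be counted by type or by component once they have been parsed. This is only a sketch under the assumption that you have already collected the events into a Python list of dictionaries; `count_by` is a hypothetical helper, and the inline events are made-up illustrations that reuse field names from the sample event.

```python
from collections import Counter


def count_by(events, field):
    """Count provenance events by the value of one field, e.g. eventType."""
    # Events that lack the field are grouped under "unknown".
    return Counter(event.get(field, "unknown") for event in events)


if __name__ == "__main__":
    # Illustrative events only; real ones come from the received flow files.
    events = [
        {"eventType": "CREATE", "componentType": "GenerateFlowFile"},
        {"eventType": "ATTRIBUTES_MODIFIED", "componentType": "UpdateAttribute"},
        {"eventType": "CREATE", "componentType": "GenerateFlowFile"},
    ]
    print(count_by(events, "eventType"))
    print(count_by(events, "componentType"))
```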
Created on 02-27-2017 04:50 PM
Excellent post @apsaltis, thank you.
Do you have any plans for Part 2?
Created on 02-27-2017 05:30 PM
@apsaltis If you want to capture provenance data from a NiFi cluster, would you please elaborate on what would be different in the setup? Thanks in advance.
Created on 06-17-2019 11:37 PM
Is this guide still valid for the latest version (1.9.2)? I followed everything exactly and cannot get it to work. Is there any other assistance available for exporting the provenance data?
Created on 07-18-2019 07:43 AM
Hi, I followed the steps as mentioned above, but I am not seeing any data on port 8080. Kindly advise me if I am doing something wrong here.
Created on 04-10-2020 03:39 PM
To all the members who are asking if it is still valid: yes, this still seems to be valid.
I have extracted provenance data by connecting to the same instance of NiFi rather than having multiple NiFi instances.
Created on 02-12-2021 02:02 AM
Hello
I tried this with NiFi 1.12.1 and I get nothing on the Provenance Reporting Instance input port. If I use a 'remote process group' to send data to the Provenance Reporting Instance input port, it communicates well. It looks like the SiteToSiteProvenanceReportingTask just sits there and does nothing.
No error anywhere.
Any clue on what is wrong? I really need to export the provenance data to external systems...
thanks
best regards
Gilles
Created on 02-23-2021 03:30 AM - edited 02-23-2021 04:17 AM
@gilou3000 have you found any solution to this? I am facing the same problem with NiFi 1.12.1.
Created on 02-23-2021 07:31 PM
I have used this solution with NiFi 1.12.1 and it works correctly. I had messed up some part while doing the tutorial.
Created on 09-19-2021 05:05 AM - edited 09-19-2021 05:26 AM
This article could be improved if the settings for the emitting and receiving NiFi instances were more clearly identified.
Created on 05-28-2024 03:18 AM
Could someone please help me with this?
Fetch Provenance data using SiteToSiteProvenanceRe... - Cloudera Community - 388418
The Site-to-Site configuration is not working over HTTP when NiFi is running on HTTPS.