Using SiteToSiteProvenanceReportingTask to Send Provenance to Apache NiFi for Processing

Eating our own provenance food!

It's almost comically easy to do this. On the NiFi server you want to report on, you set up a reporting task that sends its provenance data to a receiver. On the receiving NiFi server, you build a simple flow to ingest and process that data. I stored it in HBase as JSON, since HBase is a good place to put a lot of data fast.

56491-sitetositeprovenancereportingtasksetup.png

56492-sitetositereportingflowfilereceived.png

Send the Data

56490-configuringreportingtask.png

Create a SiteToSiteProvenanceReportingTask under Controller Settings, on the Reporting Tasks tab. It's pretty simple: set the values shown above to your destination NiFi server and the name of an input port that you have already created on it.
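As a rough sketch of what has to be filled in, the snippet below checks a property map before you type it into the reporting task dialog. The property names "Destination URL" and "Input Port Name" are my recollection of the display names in the NiFi UI, and the URL and port name values are hypothetical placeholders, not values from this setup.

```python
from urllib.parse import urlparse

# Assumed display names of the two properties the task cannot work without.
REQUIRED = ("Destination URL", "Input Port Name")


def check_reporting_task_properties(props):
    """Return a list of problems found in the property map (empty if none)."""
    problems = []
    for name in REQUIRED:
        if not props.get(name, "").strip():
            problems.append(f"missing required property: {name}")
    url = props.get("Destination URL", "")
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"Destination URL is not an http(s) URL: {url}")
    return problems


# Hypothetical values: replace with your receiving NiFi and its input port.
example_properties = {
    "Destination URL": "http://destination-nifi.example.com:8080/nifi",
    "Input Port Name": "Provenance In",
}

print(check_reporting_task_properties(example_properties))
```

The input port name must match the port on the receiving NiFi exactly, which is why it is worth creating the port first.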

Receive and Process the Data

56489-provenancereportingingest.png

56493-provenanceprocessingflow.png

An Individual JSON Record

56494-provenanceasjson.png

Split the JSON into Records

$.[*]
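To make the effect of that expression concrete, here is a minimal Python sketch of the same idea: the reporting task delivers a JSON array of events, and `$.[*]` selects every element so each event becomes its own record. The function name `split_records` is made up for illustration, and the two sample events are trimmed down to a couple of fields taken from the scan output in this article.

```python
import json


def split_records(batch_json):
    """Split a JSON array of provenance events into one JSON string per event."""
    events = json.loads(batch_json)
    if not isinstance(events, list):
        raise ValueError("expected a JSON array of events")
    return [json.dumps(event) for event in events]


# A trimmed, two-event batch; real events carry many more fields.
batch = json.dumps([
    {"timestampMillis": 1517099130607,
     "updatedAttributes": {"RouteOnAttribute.Route": "uv"}},
    {"timestampMillis": 1517099130616,
     "updatedAttributes": {"RouteOnAttribute.Route": "humidity"}},
])

for record in split_records(batch):
    print(record)
```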

56495-provenancesplitrecords.png

Save to HBase (PutHBaseJSON)

First, I have to create a table with a single column family, event, in the HBase shell.

hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.6.2.0-205, r5210d2ed88d7e241646beab51e9ac147a973bdcc, Sat Aug 26 09:33:50 UTC 2017
hbase(main):001:0> create 'PROVENANCE', 'event'
0 row(s) in 2.9900 seconds
=> Hbase::Table - PROVENANCE

scan 'PROVENANCE'

 ff91e204-05b0-48aa-a666-7942e3f109ab                     column=event:previousAttributes, timestamp=1517159115042, value={"path":"./","filename":"humidity.583225-583284.log","s2s.address":"192.168.1.197:55032","s2s.host":"192.168.1.197","mime.type":"text/plain","uuid":"9006a1bb-d755-4272-b8d3-76e666c2a7c6","tailfile.original.path":"/opt/demo/logs/humidity.log"}
 ff91e204-05b0-48aa-a666-7942e3f109ab                     column=event:previousContentURI, timestamp=1517159115042, value=http://192.168.1.193:8080/nifi-api/provenance-events/61825/content/input
 ff91e204-05b0-48aa-a666-7942e3f109ab                     column=event:previousEntitySize, timestamp=1517159115042, value=59
 ff91e204-05b0-48aa-a666-7942e3f109ab                     column=event:processGroupId, timestamp=1517159115042, value=01611005-4e82-1491-ae5d-ca64f59491cb
 ff91e204-05b0-48aa-a666-7942e3f109ab                     column=event:processGroupName, timestamp=1517159115042, value=Process MiniFi Creator
 ff91e204-05b0-48aa-a666-7942e3f109ab                     column=event:timestamp, timestamp=1517159115042, value=2018-01-28T00:25:30.616Z
 ff91e204-05b0-48aa-a666-7942e3f109ab                     column=event:timestampMillis, timestamp=1517159115042, value=1517099130616
 ff91e204-05b0-48aa-a666-7942e3f109ab                     column=event:updatedAttributes, timestamp=1517159115042, value={"RouteOnAttribute.Route":"humidity"}
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:actorHostname, timestamp=1517159114898, value=192.168.1.193
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:application, timestamp=1517159114898, value=NiFi Flow
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:childIds, timestamp=1517159114898, value=[]
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:componentId, timestamp=1517159114898, value=3a25cda9-0161-1000-813c-631724a10585
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:componentName, timestamp=1517159114898, value=RouteOnAttribute
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:componentType, timestamp=1517159114898, value=RouteOnAttribute
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:contentURI, timestamp=1517159114898, value=http://192.168.1.193:8080/nifi-api/provenance-events/61701/content/output
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:durationMillis, timestamp=1517159114898, value=-1
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:entityId, timestamp=1517159114898, value=9b017666-7ce9-45c5-9d0a-2f81e56d6fa8
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:entitySize, timestamp=1517159114898, value=16
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:entityType, timestamp=1517159114898, value=org.apache.nifi.flowfile.FlowFile
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:eventOrdinal, timestamp=1517159114898, value=61701
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:eventType, timestamp=1517159114898, value=ROUTE
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:lineageStart, timestamp=1517159114898, value=1517084974341
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:parentIds, timestamp=1517159114898, value=[]
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:platform, timestamp=1517159114898, value=nifi
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:previousAttributes, timestamp=1517159114898, value={"path":"./","filename":"uv.164064-164080.log","s2s.address":"192.168.1.197:55032","s2s.host":"192.168.1.197","mime.type":"text/plain","uuid":"9b017666-7ce9-45c5-9d0a-2f81e56d6fa8","tailfile.original.path":"/opt/demo/logs/uv.log"}
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:previousContentURI, timestamp=1517159114898, value=http://192.168.1.193:8080/nifi-api/provenance-events/61701/content/input
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:previousEntitySize, timestamp=1517159114898, value=16
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:processGroupId, timestamp=1517159114898, value=01611005-4e82-1491-ae5d-ca64f59491cb
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:processGroupName, timestamp=1517159114898, value=Process MiniFi Creator
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:timestamp, timestamp=1517159114898, value=2018-01-28T00:25:30.607Z
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:timestampMillis, timestamp=1517159114898, value=1517099130607
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:updatedAttributes, timestamp=1517159114898, value={"RouteOnAttribute.Route":"uv"}
1830 row(s) in 11.7680 seconds
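The scan output above suggests how each JSON record ends up in the table: one row per event, one column per top-level field in the event family, with nested values such as previousAttributes stored as JSON text. The sketch below mimics that mapping in plain Python. It does not talk to HBase, the function name `to_hbase_cells` is hypothetical, and using eventId as the row key is my assumption, based on the row keys being UUIDs and no event:eventId column appearing in the scan.

```python
import json


def to_hbase_cells(record, row_id_field="eventId", family="event"):
    """Map one flat JSON record to (row_key, {"family:qualifier": text})."""
    row_key = str(record[row_id_field])
    cells = {}
    for name, value in record.items():
        if name == row_id_field:
            continue  # the row key field is not repeated as a column
        # Strings are stored as-is; everything else is stored as JSON text.
        if isinstance(value, str):
            text = value
        else:
            text = json.dumps(value, separators=(",", ":"))
        cells[f"{family}:{name}"] = text
    return row_key, cells


# A trimmed record rebuilt from values shown in the scan output.
record = {
    "eventId": "ffde140c-3053-4b9d-89c6-14b68025384d",  # assumed row key field
    "eventOrdinal": 61701,
    "eventType": "ROUTE",
    "childIds": [],
    "updatedAttributes": {"RouteOnAttribute.Route": "uv"},
}

row_key, cells = to_hbase_cells(record)
print(row_key)
for column, value in sorted(cells.items()):
    print(column, "=", value)
```

Keeping nested fields as JSON text means they can still be parsed later by whatever reads the table, at the cost of not being individually addressable as columns.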


provenancereporting.xml

Learning to Use HBase

https://hortonworks.com/hadoop-tutorial/introduction-apache-hbase-concepts-apache-phoenix-new-backup...
