Created on 10-04-2017 01:52 PM - edited 08-17-2019 10:43 AM
Objective
This tutorial walks you through a NiFi flow that:
- Uses the LookupRecord processor to parse NiFi provenance events in JSON format and add geolocation data
- Uses the PartitionRecord processor to group like records by State
- Publishes records originating from California to Kafka
This article is the first of a two-part series. We will set up the demo environment, including flows, controller services and reporting tasks. The second article will walk through the main flow step by step.
Environment
This tutorial was tested using the following environment and components:
- Mac OS X 10.11.6
- Apache NiFi 1.3.0
- Apache Kafka 0.10.2.1
Environment Configuration
Kafka Setup
In the bin directory of your Kafka install:
Start ZooKeeper:
./zookeeper-server-start.sh ../config/zookeeper.properties
Start Kafka:
./kafka-server-start.sh ../config/server.properties
Create Kafka Topic:
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic California
Start Kafka Consumer:
./kafka-console-consumer.sh --zookeeper localhost:2181 --topic California --from-beginning
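Before starting the consumer, it can help to confirm that the topic was actually created. The following is a minimal sketch, not part of the original tutorial: the KAFKA_BIN and ZK variables and the topic_exists helper are hypothetical names, and the ZooKeeper address is assumed to match the one used above.

```shell
# Sketch: check whether a topic is registered in ZooKeeper.
# KAFKA_BIN and ZK are assumptions; adjust them to your install.
KAFKA_BIN="${KAFKA_BIN:-.}"
ZK="${ZK:-localhost:2181}"

topic_exists() {
  # --list prints one topic name per line; grep -x matches a whole line only.
  "$KAFKA_BIN/kafka-topics.sh" --list --zookeeper "$ZK" | grep -qx "$1"
}

# Usage (from the Kafka bin directory):
# topic_exists California && echo "topic ready" || echo "topic missing"
```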
NiFi Instances Configuration
For this tutorial, you need two NiFi instances running. One instance generates and sends provenance data to the other via the SiteToSiteProvenanceReportingTask.
Instructions on how to set up both instances can be found in the HCC article "Extracting NiFi Provenance Data using SiteToSiteProvenanceReportingTask".
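Once both instances are started, a quick reachability check can save time later. This is a rough sketch assuming the two unsecured HTTP ports used in this tutorial (8080 and 8088); nifi_up is a hypothetical helper name.

```shell
# Sketch: report whether a NiFi UI answers over HTTP.
# The URLs below follow this tutorial; change the ports if yours differ.
nifi_up() {
  # -s silences progress, -L follows redirects, -o discards the body,
  # -w prints only the final HTTP status code ("000" if the connection fails).
  local code
  code=$(curl -s -L -o /dev/null -w '%{http_code}' "$1")
  [ "$code" = "200" ]
}

# Usage:
# for url in http://localhost:8080/nifi http://localhost:8088/nifi; do
#   nifi_up "$url" && echo "$url is up" || echo "$url is not reachable"
# done
```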
Main Dataflow Instance Setup
In the instance that will use the provenance data (http://localhost:8088/nifi), import the following template:
You should see the following flow on your NiFi canvas:
First, let's get the MaxMind Database file that is used to enrich the data. This is done by the flow contained within the "Gather Enrichment Data" process group.
Run the flow and the file GeoLite2-City.mmdb should be downloaded locally into a directory named "enrichment-db" within your NiFi installation.
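To confirm the download succeeded, you can check for the file from a terminal. A minimal sketch follows; NIFI_HOME and enrichment_db_ready are hypothetical names, and the path assumes the "enrichment-db" directory sits directly under the NiFi installation root as described above.

```shell
# Sketch: verify that the GeoLite2 database was downloaded.
# NIFI_HOME is an assumption: point it at your NiFi installation root,
# or pass the root directory as the first argument.
enrichment_db_ready() {
  local db="${1:-${NIFI_HOME:-.}}/enrichment-db/GeoLite2-City.mmdb"
  # -s is true only when the file exists and is not empty.
  [ -s "$db" ]
}

# Usage:
# enrichment_db_ready /path/to/nifi && echo "database present" || echo "database missing"
```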
Now, let's enable the flow's controller services. Select the gear icon from the Operate Palette:
This opens the NiFi Flow Configuration window. Select the Controller Services tab:
Enable AvroSchemaRegistry by selecting the lightning bolt icon/button. This will then allow you to enable the JsonTreeReader and JsonRecordSetWriter controller services. Select the lightning bolt icons for both of these services as well as the IPLookupService controller service. All the controller services should be enabled at this point:
We will step through the main flow in detail in the second article. For now, start only the "Provenance In" input port.
Provenance Event Generating Instance
In the instance that generates provenance data (http://localhost:8080/nifi), import the following template:
The following flow should be on the NiFi canvas:
The two GetHTTP processors are configured as follows:
The UpdateAttribute processor is configured with all default settings:
Now, let's create the SiteToSiteProvenanceReportingTask reporting task.
Select the Global menu and choose "Controller Settings":
Select the Reporting Tasks tab and click the "+" icon:
Select SiteToSiteProvenanceReportingTask and click the "Add" button:
Configure the reporting task as follows:
On the Settings tab, set "Run Schedule" to 5 seconds:
(Note: Some of these settings are for demo purposes only and may need to be adjusted if run in a production environment.)
Start the reporting task:
Return to the NiFi canvas and start the flow to generate provenance data:
Run the flow for about 30 seconds to generate sufficient provenance events, then stop the flow.
Switch to your other NiFi instance. You should see flowfiles queued after the "Provenance In" input port:
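If you prefer to check the queue from a terminal, the sketch below reads the instance-wide queued flowfile count over HTTP. It assumes an unsecured instance and that the NiFi REST API exposes a flow/status resource whose JSON contains a flowFilesQueued field; verify both against the REST API documentation for your version. queued_flowfiles is a hypothetical helper name.

```shell
# Sketch: print the total number of flowfiles queued in a NiFi instance.
# Assumes GET <base>/nifi-api/flow/status returns JSON with "flowFilesQueued".
queued_flowfiles() {
  curl -s "$1/nifi-api/flow/status" \
    | grep -o '"flowFilesQueued" *: *[0-9][0-9]*' \
    | grep -o '[0-9][0-9]*$'
}

# Usage:
# queued_flowfiles http://localhost:8088
```

The count covers every queue in the instance, so it only reflects the "Provenance In" connection when nothing else in the flow is queued.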
We are now ready to geo-enrich the provenance data.
Continue to the second article for a detailed walkthrough of the LookupRecord flow.