Objective

This tutorial walks you through a NiFi flow that:

  • Uses the LookupRecord processor to parse NiFi provenance events in JSON format and add geolocation data
  • Uses the PartitionRecord processor to group like records by State
  • Publishes records originating from California to Kafka
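Before building the flow, it may help to see the record-level logic it performs sketched in plain Python. The field names (`ip`, `geo`, `state`) and the lookup table below are illustrative stand-ins for the real provenance fields and the MaxMind database backing the IPLookupService; they are not taken from the template.

```python
# Plain-Python sketch of what LookupRecord + PartitionRecord do in this flow.
# GEO_LOOKUP stands in for the MaxMind GeoLite2 database; the IPs are
# documentation-reserved addresses, used here purely for illustration.
import json
from collections import defaultdict

GEO_LOOKUP = {
    "203.0.113.10": {"city": "San Francisco", "state": "California"},
    "198.51.100.7": {"city": "Austin", "state": "Texas"},
}

def enrich(record):
    """LookupRecord step: add geolocation fields keyed on an IP field."""
    record["geo"] = GEO_LOOKUP.get(record.get("ip"), {})
    return record

def partition_by_state(records):
    """PartitionRecord step: group like records by the state they resolved to."""
    groups = defaultdict(list)
    for r in records:
        groups[r["geo"].get("state", "unknown")].append(r)
    return groups

events = [
    {"eventType": "RECEIVE", "ip": "203.0.113.10"},
    {"eventType": "SEND", "ip": "198.51.100.7"},
]
groups = partition_by_state(enrich(e) for e in events)

# Only the California partition would be published to the Kafka topic.
california_payload = json.dumps(groups.get("California", []))
```

In the actual flow, the final step is a PublishKafka-style processor sending the California partition to the "California" topic created above.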

This article is the first of a two-part series. In it, we set up the demo environment, including the flows, controller services and reporting tasks. The second article walks through the main flow step by step.

Environment

This tutorial was tested using the following environment and components:

  • Mac OS X 10.11.6
  • Apache NiFi 1.3.0
  • Apache Kafka 0.10.2.1

Environment Configuration

Kafka Setup

In the bin directory of your Kafka install:

Start ZooKeeper: ./zookeeper-server-start.sh ../config/zookeeper.properties

Start Kafka: ./kafka-server-start.sh ../config/server.properties

Create Kafka Topic: ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic California

Start Kafka Consumer: ./kafka-console-consumer.sh --zookeeper localhost:2181 --topic California --from-beginning

NiFi Instances Configuration

For this tutorial, you need two NiFi instances running. One instance generates and sends provenance data to the other via the SiteToSiteProvenanceReportingTask.

Instructions on how to setup both instances can be found in the HCC article "Extracting NiFi Provenance Data using SiteToSiteProvenanceReportingTask".

Main Dataflow Instance Setup

In the instance that will use the provenance data (http://localhost:8088/nifi), import the following template:

lookuprecord-geoenrich.xml

You should see the following flow on your NiFi canvas:

40645-1-lookuprecord-geoenrich-canvas.png

First, let's get the MaxMind Database file that is used to enrich the data. This is done by the flow contained within the "Gather Enrichment Data" process group.

40646-2-gatherenrichmentdata-flow.png

Run the flow and the file GeoLite2-City.mmdb should be downloaded locally into a directory named "enrichment-db" within your NiFi installation.

Now, let's enable the flow's controller services. Select the gear icon from the Operate Palette:

40647-3-operatepalette-configuration.png

This opens the NiFi Flow Configuration window. Select the Controller Services tab:

40648-4-controllerservices-disabled.png

Enable the AvroSchemaRegistry by selecting its lightning bolt icon. This allows you to enable the JsonTreeReader and JSONRecordSetWriter controller services. Select the lightning bolt icons for both of these services, as well as for the IPLookupService controller service. All the controller services should now be enabled:

40649-5-controllerservices-enabled.png
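For orientation, the schema registered in the AvroSchemaRegistry for the readers and writers might look roughly like the simplified sketch below. This is not the template's actual schema; the field names are illustrative, chosen to show how a nested geolocation record can be modeled.

```json
{
  "type": "record",
  "name": "provenanceEvent",
  "fields": [
    { "name": "eventId", "type": "string" },
    { "name": "eventType", "type": "string" },
    { "name": "timestampMillis", "type": "long" },
    { "name": "geo", "type": ["null", {
        "type": "record",
        "name": "geoEnrichment",
        "fields": [
          { "name": "city",  "type": ["null", "string"] },
          { "name": "state", "type": ["null", "string"] }
        ]
      }]
    }
  ]
}
```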

We will step through the main flow in detail in the second article. For now, start only the "Provenance In" input port.

40650-6-provenancein-started.png

Provenance Event Generating Instance

In the instance that generates provenance data (http://localhost:8080/nifi), import the following template:

fetchsites.xml

The following flow should be on the NiFi canvas:

40652-7-proveventflow.png

The two GetHTTP processors are configured as follows:

40653-8a-gethttp-properties.png

40654-8b-gethttp-properties.png

The UpdateAttribute processor is configured with all default settings:

40655-8c-updateattribute-properties.png

Now, let's create the SiteToSiteProvenance reporting task.

Select the Global menu and choose "Controller Settings":

40658-9a-global-controllersettings.png

Select the Reporting Tasks tab and click the "+" icon:

40659-9b-reportingtask-add.png

Select SiteToSiteProvenanceReportingTask and click the "Add" button:

40660-10-s2sprovrpttask-add.png

Configure the reporting task as follows:

40661-11-s2sprovrpttask-properties.png

On the Settings tab, set "Run Schedule" to 5 seconds:

40662-12-s2sprovrpttask-settings.png

(Note: Some of these settings are for demo purposes only and may need to be adjusted if run in a production environment.)
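For readers who cannot see the screenshots, the key reporting task properties in this demo are along these lines. The destination URL and port name follow from this setup (the 8088 instance consumes the events through its "Provenance In" port); the batch size is an assumed demo value, so verify all values against your own instance.

```
Destination URL     : http://localhost:8088/nifi   (the instance that consumes the events)
Input Port Name     : Provenance In                (the input port started earlier)
Batch Size          : 1000                         (assumed demo value; tune for production)
```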

Start the reporting task:

40663-13-s2sprovrpttask-start.png

Return to the NiFi canvas and start the flow to generate provenance data:

40664-14-proveventflow-start.png

Run the flow for about 30 seconds to generate sufficient provenance events, then stop the flow.

Switch to your other NiFi instance. You should see flowfiles queued after the "Provenance In" input port:

40665-15-lookuprecordflow-queue.png
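Each queued flowfile holds a JSON array of provenance events. An individual event has roughly the shape below; the field names are typical of SiteToSiteProvenanceReportingTask output, but the values here are invented for illustration, so inspect a real flowfile via List queue to see your actual data.

```json
{
  "eventId": "a0f4e2d1-0000-1000-8000-000000000001",
  "eventType": "RECEIVE",
  "timestampMillis": 1502901234567,
  "componentType": "GetHTTP",
  "transitUri": "https://example.com/"
}
```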

We are now ready to geo-enrich the provenance data.

Continue to the second article for a detailed walk through of the LookupRecord flow.

Version history: Revision 2 of 2. Last update: ‎08-17-2019 10:43 AM.