
1. Introduction

NiFi is a powerful, easy-to-use technology for building dataflows from diverse sources to diverse targets while transforming and dynamically routing data in between. NiFi is packaged in HDF 2.0, which (in addition to bundling Kafka and Storm for a complete data movement platform) pushes NiFi to enterprise readiness with Ambari management and Ranger security and multitenancy.

One of the hallmarks of NiFi is its awesome drag-and-drop UI, which makes building and configuring dataflows drop-dead easy. However, when the same parts of a flow are used repeatedly across projects within a team or across the organization, the UI can slow development down by forcing developers to repeat the same manual steps to rebuild the same pieces from scratch.

You can overcome this problem by using two features of NiFi (templates, and configurations that use Expression Language references) to build a library of reusable components that serve as pre-built building blocks for new flows. Doing so provides the following advantages to the team and the enterprise:

  • rapid development through component reuse
  • adoption of standards and patterns through component reuse
  • code that can be change-managed and implemented in an SDLC (Software Development Lifecycle) similar to any other software code, including promotion of the same code base across dev, test and production environments.

Note: for enterprise NiFi security, see these two valuable posts:

  • https://community.hortonworks.com/articles/60842/hdf-20-defining-nifi-policies-in-ranger.html
  • https://community.hortonworks.com/articles/60001/hdf-20-integrating-secured-nifi-with-secured-range....

2. Overview of NiFi Reusable Component Technology

Reusable NiFi components center around a template and its configurations as shown below.

[Figure: Reusable component overview (template and configurations)]

Template

  • templates are created from components built in the UI and saved to the NiFi environment (through the UI); they are shared between NiFi clusters by downloading them to a local machine, sharing the file, and uploading it to the new NiFi environment (through the UI or the REST API)
  • templates are XML files
  • templates can be made from any subset of a flow: a single processor, a flattened subflow, or a process group holding a subflow
  • alternatively, templates can be made from full flows
  • templates can be uploaded through the UI and used as the starting point for a flow (all configurations in the downloaded template are retained and can then be changed or kept as-is)
  • resulting flows from templates can themselves be downloaded as templates and deployed across environments (or implemented as reusable components in new flows)
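
Because a template is plain XML, it can be inspected with ordinary tooling before it is shared or committed. The minimal Python sketch below lists the processors in a template. It assumes the NiFi 1.x export layout (each processor as a processors element with name and type children); the trimmed SAMPLE_TEMPLATE and the list_processors helper are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative template. The layout (processors as <processors>
# elements with <name> and <type> children) is assumed from NiFi 1.x exports;
# real templates also carry ids, positions, connections and full configs.
SAMPLE_TEMPLATE = """
<template encoding-version="1.0">
  <name>ingest-transform-archive</name>
  <snippet>
    <processors>
      <name>GetFile</name>
      <type>org.apache.nifi.processors.standard.GetFile</type>
    </processors>
    <processors>
      <name>ReplaceText</name>
      <type>org.apache.nifi.processors.standard.ReplaceText</type>
    </processors>
  </snippet>
</template>
"""


def list_processors(template_xml: str) -> list[tuple[str, str]]:
    """Return (name, type) pairs for every processor in a template.

    root.iter() walks the whole tree, so processors nested inside process
    groups are included too, assuming they use the same element name.
    """
    root = ET.fromstring(template_xml.strip())
    return [
        (proc.findtext("name", default=""), proc.findtext("type", default=""))
        for proc in root.iter("processors")
    ]


if __name__ == "__main__":
    for name, ptype in list_processors(SAMPLE_TEMPLATE):
        print(f"{name}: {ptype}")
```

A check like this can run in a review step to confirm a template contains the processors its name promises.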

Configurations

Configuration properties can be Expression Language (EL) references to

  • system properties
  • OS environment variables
  • custom properties written in a file

But note that:

  • EL references can only be used in Processor properties, and only in those where "Supports expression language: true" (shown by clicking the property's question-mark icon)
  • EL references have the form ${property.name}, e.g. ${MYSQL_PWD} to reference a password set as an environment variable on the NiFi server's operating system, or ${hdfs.zone.landing} to reference an HDFS path written in a custom property file.
  • EL references can be concatenated with literal text, e.g. /data/${hdfs.zone.landing}
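
To make the substitution and concatenation rules concrete, here is a minimal Python sketch of how a plain ${name} reference resolves against a set of properties. It is a simplification that assumes only plain names: real NiFi EL also supports functions and richer syntax. EL_REFERENCE and resolve_references are hypothetical names, not NiFi APIs.

```python
import re

# A plain reference such as ${hdfs.zone.landing} or ${MYSQL_PWD}.
# Real NiFi EL also supports functions (e.g. ${filename:toUpper()});
# this simplified pattern deliberately ignores them.
EL_REFERENCE = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def resolve_references(value: str, properties: dict[str, str]) -> str:
    """Substitute every plain ${name} in value from properties.

    Literal text around a reference is kept, so concatenation such as
    /data/${hdfs.zone.landing} works. Unknown names are left as-is so
    they are easy to spot (a choice made for this sketch).
    """
    def substitute(match: re.Match) -> str:
        return properties.get(match.group(1), match.group(0))

    return EL_REFERENCE.sub(substitute, value)


if __name__ == "__main__":
    props = {"hdfs.zone.landing": "landing/weblogs"}
    print(resolve_references("/data/${hdfs.zone.landing}", props))
```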

Configuration: OS Environment Variables

Export the environment variable on each NiFi server, in the environment from which NiFi is started (best for sensitive values), e.g. export MYSQL_PWD=secretpwd

Configuration: Custom Property Files

To implement a file with custom property name=value pairs to be referenced by EL, do the following:

  1. On each NiFi server in your cluster, open the nifi.properties file.
  2. Set the nifi.variable.registry.properties field to the path of your custom property file (or a comma-separated list of paths to several custom property files).
  3. Restart NiFi so that the change to nifi.properties takes effect.
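
A custom property file is a simple list of name=value lines. The Python sketch below parses such files and merges every file named in a comma-separated nifi.variable.registry.properties value, which can help validate files before they are promoted. It is a simplified parser (no escapes or line continuations), and the rule that a later file wins on duplicate names is an assumption of this sketch, not documented NiFi behavior.

```python
from pathlib import Path


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple name=value lines.

    Blank lines and lines starting with # or ! are skipped. Simplified:
    escapes and line continuations are not supported.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        name, sep, value = line.partition("=")
        if sep:
            props[name.strip()] = value.strip()
    return props


def load_custom_properties(registry_setting: str) -> dict[str, str]:
    """Load every file listed in a comma-separated registry setting.

    If a name appears in more than one file, the later file wins here
    (an assumption of this sketch).
    """
    merged = {}
    for path in registry_setting.split(","):
        path = path.strip()
        if path:
            merged.update(parse_properties(Path(path).read_text(encoding="utf-8")))
    return merged


if __name__ == "__main__":
    sample = "# landing zone\nhdfs.zone.landing=/landing\n"
    print(parse_properties(sample))
```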

Precedence for EL references

Properties referenced through EL should have unique names. When properties share a name, the following precedence applies (i.e. the value found first in the sequence below is used):

Processor attribute -> FlowFile attribute -> custom property file -> system property -> OS environment variable
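
The "first one found wins" rule maps directly onto Python's collections.ChainMap, which searches its mappings left to right. The sketch below chains the five sources in the order listed above. It only models the lookup order: build_lookup is a hypothetical helper, and the JVM system properties NiFi would consult are stood in for by a plain dict.

```python
import os
from collections import ChainMap
from typing import Mapping, Optional


def build_lookup(
    processor_attrs: Mapping[str, str],
    flowfile_attrs: Mapping[str, str],
    custom_props: Mapping[str, str],
    system_props: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> ChainMap:
    """Chain the sources in the precedence order listed above.

    ChainMap searches its maps left to right and returns the first hit,
    i.e. "the value found first in the sequence is used".
    """
    if env is None:
        env = os.environ  # OS environment of the current process
    return ChainMap(processor_attrs, flowfile_attrs, custom_props, system_props, env)


if __name__ == "__main__":
    lookup = build_lookup(
        processor_attrs={},
        flowfile_attrs={"hdfs.zone.landing": "/from/flowfile"},
        custom_props={"hdfs.zone.landing": "/from/custom/file"},
        system_props={},
        env={},
    )
    print(lookup["hdfs.zone.landing"])  # the FlowFile attribute wins
```

This is also why unique names matter: a FlowFile attribute that happens to share a name with a custom property silently shadows it.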

Also see the following for more on using EL to reference properties:

  • https://community.hortonworks.com/articles/57304/supporting-custom-properties-for-expression-langua....
  • https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.0/bk_user-guide/content/Using_Custom_Properti...

3. Building a Reusable Component

Let's say that I want to build the following component to be reused for many different flows.

[Screenshot: the reusable component flow]

My intent is to pick up files from a directory (GetFile), do a simple transformation (ReplaceText), and send the output to any processor in a flow I build in the future. I also want to push the original file content to HDFS for historical analysis. Putting to HDFS is not so simple: I should first use MergeContent so that HDFS does not fill up with small files that each hold a single line, and I should use UpdateAttribute so that the filename in HDFS is the original name from GetFile with a timestamp appended to distinguish files.
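
As one hypothetical UpdateAttribute configuration, the filename attribute could be set to ${filename}.${now():format('yyyyMMddHHmmss')} (now() and format() are NiFi Expression Language functions). The Python sketch below only mimics the resulting name so the naming convention can be checked outside NiFi; the "." separator and the timestamp pattern are assumptions, and archived_filename is a hypothetical helper.

```python
from datetime import datetime


def archived_filename(original: str, when: datetime) -> str:
    """Append a yyyyMMddHHmmss timestamp to the original filename.

    Mirrors the hypothetical UpdateAttribute value
    ${filename}.${now():format('yyyyMMddHHmmss')}.
    """
    return f"{original}.{when.strftime('%Y%m%d%H%M%S')}"


if __name__ == "__main__":
    print(archived_filename("app.log", datetime.now()))
```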

Many configuration properties can be changed each time this reusable component is used in a new flow, e.g. paths and the transformation regex.

Step 1: Build in UI (including configurations)

Build the reusable component in your UI. Configure every property with an empty, hard-coded, or EL value (best practices for this are discussed in section 6).

Note that the component has processor errors: GetFile's input directory is not configured, and ReplaceText's success relationship is neither connected nor auto-terminated. That is OK; whoever uses this component will configure these later according to his or her specific flow.

Step 2: Save and Download Template

  1. Select the processors and connections that will make up your reusable component (select all, or select each processor and connection individually).
  2. Save the selection as a template. After you name it, you can open the Templates list and see it listed among your logged-in user's templates.

    [Screenshots: saving the selection as a template and viewing it in the Templates list]

  3. To share the template with others, download it from the NiFi Templates list (the icon to the left of the trash can). To delete it from the list, click the trash can.
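
The same download can be scripted, which is handy when templates are committed to version control. The Python sketch below assumes an unsecured NiFi 1.x instance (the line HDF 2.0 ships) and its template endpoints, GET /nifi-api/flow/templates and GET /nifi-api/templates/{id}/download; these endpoints were removed along with templates in NiFi 2.0. NIFI_URL and the helper names are hypothetical, and a secured cluster would also need TLS settings and an access token.

```python
import json
import urllib.request
from typing import Optional

NIFI_URL = "http://localhost:8080"  # hypothetical base URL (unsecured NiFi 1.x)


def list_templates_request(base_url: str) -> urllib.request.Request:
    # GET /nifi-api/flow/templates lists the templates in this NiFi instance.
    return urllib.request.Request(f"{base_url}/nifi-api/flow/templates", method="GET")


def find_template_id(listing: dict, name: str) -> Optional[str]:
    """Pick the id of the template with the given name from the listing JSON."""
    for entity in listing.get("templates", []):
        if entity.get("template", {}).get("name") == name:
            return entity.get("id")
    return None


def download_request(base_url: str, template_id: str) -> urllib.request.Request:
    # GET /nifi-api/templates/{id}/download returns the template as XML.
    return urllib.request.Request(
        f"{base_url}/nifi-api/templates/{template_id}/download", method="GET"
    )


def download_template(base_url: str, name: str, dest_path: str) -> None:
    """Save the named template to dest_path, e.g. inside a Git working copy."""
    with urllib.request.urlopen(list_templates_request(base_url)) as resp:
        listing = json.load(resp)
    template_id = find_template_id(listing, name)
    if template_id is None:
        raise LookupError(f"no template named {name!r}")
    with urllib.request.urlopen(download_request(base_url, template_id)) as resp:
        with open(dest_path, "wb") as out:
            out.write(resp.read())
```

For example, download_template(NIFI_URL, "ingest-transform-archive", "ingest-transform-archive.xml") would save a (hypothetically named) template next to the rest of the project's code.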

4. Using Reusable Component

Using a reusable component reverses the steps above.

  1. If the component is not in your list of NiFi Templates, get the template file onto your local file system and upload it.

    [Screenshot: uploading a template]

  2. Drag the template icon onto your canvas and choose the template you want to add.

    [Screenshots: choosing a template to add to the canvas]

    Clicking Add places it on your canvas, just as adding a Processor does. It is added in the same state as when it was downloaded by the person who built it (unless the template XML was changed manually after the original download).

  3. Change any configurations you need to change, and connect to the rest of the flow you want to build.

5. How does this work?

You can reuse (instantiate) a single component as many times as you want in a single flow (including inside and outside of process groups). How does NiFi instantiate each copy separately? When you drag a processor, connection, or process group onto the canvas, each is given a UUID like 9fc758e3-0157-1000-e89d-a6033019f0cf. The first three groups (9fc758e3-0157-1000) form a global id and the last two groups form an instance id. When you download it as a template, the global id is retained but the instance id is set to 0s, e.g. 9fc758e3-0157-1000-0000-000000000000. When you upload a template and drag it to the canvas, the instance id is converted from 0s to a new unique instance id, e.g. 9fc758e3-0157-1000-e17b-1bc0cb0c1921. Simple but powerful.
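
The id scheme described in this section can be illustrated in a few lines of Python. This sketch only models the description above (keep the global id, zero the instance id on export, fill in a fresh random instance id on instantiation); it is not NiFi's own id-generation code, and both helper names are hypothetical.

```python
import secrets
import uuid


def to_template_id(component_id: str) -> str:
    """Keep the 'global id' (first three groups) and zero the
    'instance id' (last two groups), as described above."""
    groups = str(uuid.UUID(component_id)).split("-")
    return "-".join(groups[:3] + ["0000", "000000000000"])


def to_new_instance_id(template_id: str) -> str:
    """Replace the zeroed instance id with fresh random hex digits."""
    groups = str(uuid.UUID(template_id)).split("-")
    return "-".join(groups[:3] + [secrets.token_hex(2), secrets.token_hex(6)])


if __name__ == "__main__":
    original = "9fc758e3-0157-1000-e89d-a6033019f0cf"
    template = to_template_id(original)
    print(template)                      # 9fc758e3-0157-1000-0000-000000000000
    print(to_new_instance_id(template))  # 9fc758e3-0157-1000-<random>
```

Each drag onto the canvas calls for a new random instance part, which is why two copies of the same template never collide.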

6. Software Development Lifecycle (SDLC)

Templates and custom configuration files can be treated as code and thus easily integrated into a typical SDLC.

[Figure: SDLC for reusable NiFi components]

Summary:

  1. Reusable components are added as templates to a central repository, which should be governed. Reusable components are probably best represented as process groups: this makes building new flows simpler by separating the reusable components (and the encapsulation of their details) from the new flow components.
  2. Development groups pull reusable components and upload them to their NiFi environment to build new flows.

    In flow configurations, sensitive values should be configured as Expression Language references to OS environment variables that you set in each environment, e.g. ${MYSQL_PWD}.

    Other environment-specific config values should similarly use Expression Language references. If these values are not sensitive, they should go in a custom properties file.

  3. Developers finalize their flows and submit the template of the flow to version control, e.g. Git (along with any custom property files).
  4. Template and custom property files are promoted to each environment just as source code typically is.
  5. Automation: deploying templates to environments can be done via the NiFi REST API integrated with other automation tools.
  6. Governance bodies decide which configurations can be changed in real time (e.g. ControlRate properties). These changes do not need to go through version control and can be made on the fly by authorized admins. For authorization policies, see: https://community.hortonworks.com/articles/60842/hdf-20-defining-nifi-policies-in-ranger.html
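
The automation step in item 5 can be sketched in Python with only the standard library. This sketch assumes an unsecured NiFi 1.x instance and its REST endpoints POST /nifi-api/process-groups/{id}/templates/upload (a multipart form with a part named template) and POST /nifi-api/process-groups/{id}/template-instance; template endpoints do not exist in NiFi 2.x, and a secured cluster would also need TLS and a token. The helper names are hypothetical.

```python
import json
import urllib.request
import uuid


def upload_request(base_url: str, group_id: str, xml_bytes: bytes,
                   filename: str = "template.xml") -> urllib.request.Request:
    """POST the template XML as multipart/form-data to a process group."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="template"; filename="{filename}"\r\n'
        "Content-Type: application/xml\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/nifi-api/process-groups/{group_id}/templates/upload",
        data=head + xml_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def instantiate_request(base_url: str, group_id: str, template_id: str,
                        origin_x: float = 0.0,
                        origin_y: float = 0.0) -> urllib.request.Request:
    """Ask NiFi to place an uploaded template on the canvas of a process group."""
    payload = {"templateId": template_id, "originX": origin_x, "originY": origin_y}
    return urllib.request.Request(
        f"{base_url}/nifi-api/process-groups/{group_id}/template-instance",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> bytes:
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

A promotion pipeline would call send(upload_request(...)) against each environment, look up the new template's id (for example via GET /nifi-api/flow/templates), and then call send(instantiate_request(...)). The custom property files and environment variables are promoted separately, so the same template picks up environment-specific values through its EL references.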

Shoutouts and links

Many thanks to NiFi SMEs for validation of technical aspects of reuse ideas, particularly @Andy LoPresto, @Andrew Grande, @Andrew Psaltis, @Koji Kamimura, @Matt Burgess.

Useful links:

  • https://nifi.apache.org/docs/nifi-docs/
  • https://nifi.apache.org/docs/nifi-docs/html/getting-started.html
  • https://community.hortonworks.com/content/kbentry/16461/nifi-understanding-how-to-use-process-groups...
  • https://community.hortonworks.com/articles/57304/supporting-custom-properties-for-expression-langua....
  • https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.0/bk_user-guide/content/Using_Custom_Properti...


Comments

Thank you for a great article. Could you discuss the recommended approach for updating a reusable component? Do we need to update every instance of the reusable component?

Thank you!

@Darko Milovanovic

You can update it once (in version control), but unfortunately it has to be re-deployed to each separate instance in your flows. This is because each component is instantiated separately with a different instance id, as described in section 5.

Do note that in HDF 3.0, NiFi keeps versions of each deployed processor, so you can use one version of a processor in one flow and another version in a different flow (all versions are available to choose from).

There is active work on making reusable components shared (instantiated) but that has not been released.


Thanks for the wonderful article! I have one question. When we export a flow/component as a template, all the sensitive values in the processors get cleared out, even when they are set as EL references rather than actual values. This makes it hard to auto-deploy NiFi pipelines by pulling the templates from a Git repo and deploying them to NiFi without manual intervention. Are there any suggestions for a NiFi SDLC that handles processors with sensitive properties? So far I have been able to get it almost working by using external custom properties and the NiFi REST API to deploy and instantiate templates, but because the sensitive values are cleared out of the template, the sensitive properties still have to be populated manually.