Created on 10-11-2016 02:06 PM - edited 08-17-2019 08:54 AM
NiFi is a powerful, easy-to-use technology for building dataflows from diverse sources to diverse targets, transforming and dynamically routing data in between. NiFi is packaged in HDF 2.0, which (in addition to bundling Kafka and Storm for a complete data movement platform) pushes NiFi to enterprise readiness with Ambari management and Ranger security and multitenancy.
One of the hallmarks of NiFi is its awesome drag-and-drop UI, which makes building and configuring dataflows drop-dead easy. However, when the same parts of a flow are used repeatedly across projects within a team or across the organization, the UI can slow development down by forcing the same manual steps to rebuild the same pieces from scratch.
You can overcome this problem by using two features of NiFi (templates, and configurations that use Expression Language references) to build a library of reusable components that can be dropped into new flows as pre-built pieces. Doing so provides the following advantages to the team and the enterprise:
Note: for enterprise NiFi security, see these two valuable posts:
https://community.hortonworks.com/articles/60842/hdf-20-defining-nifi-policies-in-ranger.html
https://community.hortonworks.com/articles/60001/hdf-20-integrating-secured-nifi-with-secured-range....
Reusable NiFi components center around a template and its configurations as shown below.
Configuration properties can be Expression Language (EL) references to
But note that:
Export environment variable (best for sensitive values), e.g. export MYSQL_PWD=secretpwd
To implement a file with custom property name=value pairs to be referenced by EL, do the following:
Properties referenced through EL should have unique names. When properties share the same name, the following precedence applies (i.e. the value of the one found first in the sequence below is used):
Processor attribute -> Flow File attribute -> custom property file attribute -> system property -> OS environment variable
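The precedence order above can be illustrated with a small Python sketch. This is not NiFi code; it only models the lookup order described in this article. The function names (`parse_properties`, `resolve`) and all sample property names and values are hypothetical, and the parser assumes a simple file of name=value lines.

```python
def parse_properties(text):
    """Parse simple name=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        props[name.strip()] = value.strip()
    return props


def resolve(name, processor_attrs, flowfile_attrs, custom_props,
            system_props, env_vars):
    """Return the first value found for `name`, searching in precedence order."""
    for scope in (processor_attrs, flowfile_attrs, custom_props,
                  system_props, env_vars):
        if name in scope:
            return scope[name]
    return None


# Hypothetical custom properties file content.
custom = parse_properties("""
# environment-specific, non-sensitive values
input.dir=/data/in
db.host=custom-host
""")

scopes = dict(
    processor_attrs={},
    flowfile_attrs={"db.host": "flowfile-host"},
    custom_props=custom,
    system_props={"input.dir": "/tmp/sys"},
    env_vars={"MYSQL_PWD": "secretpwd", "db.host": "env-host"},
)

print(resolve("db.host", **scopes))    # flow file attribute is found first
print(resolve("input.dir", **scopes))  # custom file is found before system property
print(resolve("MYSQL_PWD", **scopes))  # defined only in the environment
```

The sketch also shows why unique names matter: `db.host` is defined in three places, and the two lower-precedence values are silently ignored.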
Also see:
See the following for more on using EL to reference properties.
https://community.hortonworks.com/articles/57304/supporting-custom-properties-for-expression-langua....
https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.0/bk_user-guide/content/Using_Custom_Properti...
Let's say that I want to build the following component to be reused for many different flows.
My intent is to tail a file (GetFile), apply a simple transformation (ReplaceText), and output the result to any processor in a flow I build in the future. I also want to push the original file content to HDFS for historical analysis. Putting to HDFS is not so simple: I should first MergeContent so that HDFS files do not each hold a single line of content, and I should UpdateAttribute so that the filename in HDFS is the original name from GetFile with a timestamp appended to distinguish files.
Many configuration attributes can be changed each time this reusable component is used in a new flow, e.g. paths, transformation regex, etc.
Build the reusable component in your UI. Configure everything with either empty, hard-coded, or EL values. (Best practices around this are discussed below.)
Note that the template has Processor errors. In the case of GetFile, the input directory is not configured. In the case of ReplaceText, the Success relationship is not connected or auto-terminated. That is OK: whoever uses this component will configure these later according to their specific flow.
Using a reusable component is the opposite of the above.
Clicking Add will add it to your canvas in the same way you typically add a Processor. It will be added in the same state as it was downloaded by the person who built it (unless the template's XML was changed manually after the original download).
You can reuse (instantiate) a single component as many times as you want in a single flow (including inside and outside of process groups). How does this work? How does NiFi instantiate each separately? When you drag a processor, connection, or process group onto the canvas, each is given a UUID like 9fc758e3-0157-1000-e89d-a6033019f0cf. The first part of this is a global id and the second part is an instance id. When you download it as a template, the global id is retained but the instance id is set to 0s, e.g. 9fc758e3-0157-1000-0000-000000000000. When you upload a template and drag it to the canvas, the instance id is converted from 0s to a new unique instance id, e.g. 9fc758e3-0157-1000-e17b-1bc0cb0c1921. Simple but powerful.
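The id handling described above can be mimicked in a few lines of Python. This is only an illustration of the behaviour as described in this article, not NiFi's actual implementation. The function names are hypothetical, and the sketch assumes the global id is the first three hyphen-separated groups and the instance id is the last two, as in the example ids.

```python
import uuid

ZERO_INSTANCE = "0000-000000000000"


def split_id(component_id):
    """Split an id into (first three groups, last two groups)."""
    parts = component_id.split("-")
    return "-".join(parts[:3]), "-".join(parts[3:])


def to_template_id(component_id):
    """Keep the global part and zero out the instance part (template download)."""
    global_part, _ = split_id(component_id)
    return f"{global_part}-{ZERO_INSTANCE}"


def instantiate(template_id):
    """Keep the global part and attach a fresh instance part (drag to canvas)."""
    global_part, _ = split_id(template_id)
    # uuid4 is used here only as a convenient source of random hex groups.
    _, new_instance = split_id(str(uuid.uuid4()))
    return f"{global_part}-{new_instance}"


on_canvas = "9fc758e3-0157-1000-e89d-a6033019f0cf"
in_template = to_template_id(on_canvas)
first_copy = instantiate(in_template)
second_copy = instantiate(in_template)

print(in_template)   # global part retained, instance part zeroed
print(first_copy)    # same global part, new instance part
print(second_copy)   # same global part, a different new instance part
```

Every copy created from the same template shares the global part and differs only in the instance part, which is what lets the same component sit on one canvas many times.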
Templates and custom configuration files can be considered as code and thus easily integrated into a typical SDLC.
In flow configurations, sensitive values should be configured as Expression Language references to OS environment variables that you set in each environment, e.g. ${MYSQL_PSSWRD}.
Other environment-specific config values should similarly use Expression Language references. If these are not sensitive, they should be placed in a custom properties file.
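One way to support this practice in an SDLC is a pre-deployment check that every plain EL reference in a template has a value in the target environment. The Python sketch below is an assumption-laden illustration, not a NiFi tool: the function names and the sample template fragment are hypothetical, and the regex only recognises plain references such as ${MYSQL_PSSWRD}, not expressions that call EL functions.

```python
import os
import re

# Matches only plain references such as ${MYSQL_PSSWRD} or ${input.dir}.
# Expressions that call EL functions, e.g. ${filename:append('.bak')}, are skipped.
SIMPLE_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.\-]*)\}")


def find_references(template_text):
    """Return the sorted, de-duplicated names of plain EL references."""
    return sorted(set(SIMPLE_REF.findall(template_text)))


def missing_references(template_text, custom_props, env=None):
    """List references defined neither in the custom properties nor in the environment."""
    env = os.environ if env is None else env
    return [name for name in find_references(template_text)
            if name not in custom_props and name not in env]


# Hypothetical fragment of a template; a real template is a full XML document.
template_text = (
    "<value>${MYSQL_PSSWRD}</value>"
    "<value>${input.dir}</value>"
    "<value>${filename:append('.bak')}</value>"
)
custom_props = {"input.dir": "/data/in"}

print(find_references(template_text))
print(missing_references(template_text, custom_props, env={}))
print(missing_references(template_text, custom_props, env={"MYSQL_PSSWRD": "set"}))
```

Note that a plain reference to a flow file attribute (for example ${filename} on its own) would also be reported as missing, because it is only resolved at runtime; a real check would need an allow-list for such names.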
Many thanks to NiFi SMEs for validation of technical aspects of reuse ideas, particularly @Andy LoPresto, @Andrew Grande, @Andrew Psaltis, @Koji Kamimura, @Matt Burgess.
Useful links:
https://nifi.apache.org/docs/nifi-docs/
https://nifi.apache.org/docs/nifi-docs/html/getting-started.html
https://community.hortonworks.com/content/kbentry/16461/nifi-understanding-how-to-use-process-groups...
https://community.hortonworks.com/articles/57304/supporting-custom-properties-for-expression-langua....
https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.0/bk_user-guide/content/Using_Custom_Properti...
Created on 07-26-2017 03:33 PM
Thank you for the great article. Can you discuss a little what the recommended approach would be for updating a reusable component? Do we need to update every instance of the reusable component?
Thank you!
Created on 07-28-2017 08:42 PM
You can update it once (in version control) but unfortunately it has to be re-deployed to each separate instance in your flows. This is because each component is instantiated separately with a different instance id, as described in section 5.
Do note that in HDF 3.0 after you do this NiFi keeps versions of each processor deployed, so you can use one version of a processor in one flow, and another version in a different flow (all versions available to choose from).
There is active work on making reusable components shared (instantiated) but that has not been released.
Created on 10-10-2017 05:45 PM
Thanks for the wonderful article! I have one question. When we export a flow/component as a template, all the sensitive values in the processors get cleared out, even when they are set as EL references and not actual values. This makes it hard to auto-deploy NiFi pipelines by pulling the templates from a Git repo and deploying them to NiFi without manual intervention. Are there any suggestions for a NiFi SDLC that will handle processors with sensitive properties? So far I have been able to get it almost working by using external custom properties and the NiFi REST API for deploying and instantiating templates. But because the sensitive values are cleared out in the template, the sensitive properties would still have to be populated manually.