Member since: 06-20-2016
Posts: 488
Kudos Received: 433
Solutions: 118
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3605 | 08-25-2017 03:09 PM |
| | 2515 | 08-22-2017 06:52 PM |
| | 4197 | 08-09-2017 01:10 PM |
| | 8977 | 08-04-2017 02:34 PM |
| | 8949 | 08-01-2017 11:35 AM |
10-21-2016
01:43 PM
1 Kudo
If your values contain any text characters, Hive will return NULL when you try to cast them to DECIMAL. For example, if there is a trailing whitespace character, the value will look like a decimal but Hive will see the text character (the whitespace) and the cast will fail. It looks like your values have commas after them; if so, Hive will convert them to NULL for the reason explained above.
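If you want to confirm which characters are breaking the cast, a quick check over a sample of the raw values will show them. A minimal sketch in plain Python (the sample values are made up for illustration):

```python
import re

# Made-up sample of raw column values; substitute a few lines from your source data.
samples = ["123.45", "123.45 ", "1,234.56"]

for raw in samples:
    offending = re.findall(r"[^0-9.\-]", raw)   # anything that is not part of a plain decimal
    if offending:
        print(repr(raw), "-> Hive will cast this to NULL; offending characters:", offending)
    else:
        print(repr(raw), "-> castable to DECIMAL")
```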
10-17-2016
07:39 PM
@Jasper See https://community.hortonworks.com/articles/124/atlas-api-tips-create-trait-type-example.html
Hope it helps.
10-14-2016
04:05 PM
4 Kudos
@Frank Maritato I have recently published an article on this topic: https://community.hortonworks.com/content/kbentry/60868/enterprise-nifi-implementing-reusable-components-a.html It is meant to realize the SDLC in the style of conventional code change management and deployment, and is best seen as an alternative to the post by @Andy LoPresto; it ties into an enterprise's existing process for code in general. The main suggestions are:
In flow configurations, sensitive values should be configured as Expression Language references to OS environment variables that you set in each environment, e.g. ${MYSQL_PSSWRD}.
Other environment-specific config values should similarly use Expression Language references; if they are not sensitive, they should go in a custom properties file.
Developers finalize their flows and submit the flow's template (and the custom property file) to Git.
The template and custom property file are promoted to each environment just as source code typically is.
Automation: uploading a finalized template from Git to each environment can be done via the NiFi REST API.
10-14-2016
03:47 PM
1 Kudo
@Roberto Sancho I have it working. I have edited the original answer with the working code. Note the following: I changed \\361 to \\0361 and simplified by removing the () that wrapped the [].
10-14-2016
02:03 PM
@Roberto Sancho Could you share a few lines of the file you are using (including n tilde)? Tx
10-14-2016
01:36 PM
3 Kudos
In the extended ASCII table, n tilde (ñ) is represented as octal 0361, which is written in a regex simply as \0361. So include [^\\0361] in your expression to prevent ñ from being replaced by ' '. This should work for you:
REPLACE($0,'[^a-zA-Z\\0361\\n\\.\\-]+','')
See:
http://web.cs.mun.ca/~michael/c/ascii-table.html
http://www.regular-expressions.info/refcharacters.html
@Roberto Sancho -- I have corrected the code in this answer. The above works for me. If this is what you are looking for, please accept the answer; otherwise let me know what gaps remain.
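As a quick sanity check of the character class (Pig uses Java regular expressions; the snippet below uses Python's re module only to illustrate the same idea, and the input string is made up):

```python
import re

# \xf1 is the same character as octal 0361 (ñ); the class mirrors the Pig REPLACE above.
pattern = re.compile(r"[^a-zA-Z\xf1\n.\-]+")
sample = "Años: 1,234 señal!"        # made-up line containing ñ
print(pattern.sub("", sample))        # -> Añosseñal  (ñ is kept, digits and punctuation are stripped)
```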
10-11-2016
03:15 PM
Hi @Simran Kaur. Edge/client nodes are only for user access to the cluster. Having said that, they are not mandatory for a Hadoop cluster, since users can access it through other means (e.g. Ambari views, Zeppelin, WebHDFS, HDFS mounts, and others), so edge/client nodes are a bit of a distraction here. The main architecture of Hadoop is the master-slave architecture of its services. At the highest level, a service typically has a master that manages a job and slaves that do the work distributed across the cluster. These are never on an edge node (an edge node lets the user communicate with the master services).
10-11-2016
02:48 PM
That Stack Overflow link is not a good reference (oversimplified and incorrect). If you install through the Ambari management console, services will be assigned to masters and slaves automatically. See: https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.1.0/bk_ambari-installation/content/ch_Deploy_and_Configure_a_HDP_Cluster.html If you are interested in what a large cluster with many services looks like, or if you want to do a manual install (not preferable), refer to: https://community.hortonworks.com/articles/16763/cheat-sheet-and-tips-for-a-custom-install-of-horto.html
10-11-2016
02:40 PM
This document states the order in which services should be started: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_reference/content/starting_hdp_services.html
10-11-2016
02:06 PM
18 Kudos
1. Introduction
NiFi is a powerful and easy-to-use technology for building dataflows from diverse sources to diverse targets, transforming and dynamically routing data in between. NiFi is packaged in HDF 2.0, which (in addition to bundling Kafka and Storm for a complete data movement platform) pushes NiFi to enterprise readiness with Ambari management and Ranger security and multitenancy.
One of the hallmarks of NiFi is its drag-and-drop UI, which makes building and configuring dataflows drop-dead easy. However, when the same parts of a flow are used repeatedly across projects, within a team, or across the organization, the UI can slow development down by forcing the same manual steps to rebuild the same pieces from scratch.
You can overcome this problem by using two features of NiFi -- templates and configurations using Expression Language references -- to build a library of reusable components that can serve as pre-built building blocks for new flows. Doing so provides the following advantages to the team and the enterprise:
rapid development through component reuse
adoption of standards and patterns through component reuse
code that can be change-managed and implemented in an SDLC (Software Development Lifecycle) similar to any other software code, including promotion of the same code base across dev, test and production environments.
Note: for enterprise NiFi security, see these two valuable posts:
https://community.hortonworks.com/articles/60842/hdf-20-defining-nifi-policies-in-ranger.html
https://community.hortonworks.com/articles/60001/hdf-20-integrating-secured-nifi-with-secured-range.html
2. Overview of NiFi Reusable Component Technology
Reusable NiFi components center around a template and its configurations as shown below.
Template
templates are created from components built in the UI and saved to the NiFi environment (through the UI); they are shared between NiFi clusters by downloading them to a local machine, sharing them, and uploading them to the new NiFi environment (via the UI or the REST API; see the sketch after this list)
templates are XML
templates can be made from any subset of a flow: a single processor, a flattened subflow, or a process group holding a subflow
alternatively, templates can be made from full flows
templates can be uploaded to the UI and used as a starting point for a flow (all configs from the downloaded template are retained and can then be changed as needed)
flows built from templates can themselves be downloaded as new templates and deployed across environments (or used as reusable components in new flows)
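As an example of the REST route, pulling a template out of one environment without the UI could look like the sketch below (Python with the requests library against an unsecured NiFi; the base URL and template name are placeholders, and the endpoints are from the NiFi 1.x REST API, so verify them against the docs for your version):

```python
import requests

NIFI = "http://nifi-dev.example.com:8080/nifi-api"   # placeholder base URL for the source environment

# List the templates known to this NiFi instance and pick one by name.
templates = requests.get(f"{NIFI}/flow/templates").json()["templates"]
target = next(t for t in templates if t["template"]["name"] == "reusable-get-transform")  # placeholder name

# Download the template XML so it can be shared or committed to version control.
xml = requests.get(f"{NIFI}/templates/{target['id']}/download").text
with open("reusable-get-transform.xml", "w") as f:
    f.write(xml)
```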
Configurations
Configuration properties can be Expression Language (EL) references to
system properties
OS environment variables,
custom properties written in a file
But note that:
EL references can only be used in processor properties, and only in properties that show "Supports expression language: true" (visible by clicking the question mark next to the property)
EL references have the form ${property.name}, e.g. ${MYSQL_PWD} to reference a password set in the NiFi server's operating system, or ${hdfs.zone.landing} to reference an HDFS path defined in a custom property file.
EL references can be concatenated with literal text, e.g. /data/${hdfs.zone.landing}
Configuration: OS Environment Variables
Export the environment variable on each NiFi server (best for sensitive values), e.g. export MYSQL_PWD=secretpwd
Configuration: Custom Property Files
To implement a file with custom property name=value pairs to be referenced by EL, do the following:
On each NiFi server in your cluster, open the nifi.properties file
Set the property nifi.variable.registry.properties to the path of your custom property file (or a comma-separated list of custom property files), e.g. nifi.variable.registry.properties=./conf/custom.properties. A sketch of generating such files per environment follows below.
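Because these are plain name=value files, they are easy to generate per environment as part of your promotion process. A minimal sketch in plain Python (the property names, values, and file names are illustrative assumptions, not anything NiFi ships with):

```python
# Generate one custom property file per environment; point
# nifi.variable.registry.properties at the file for that environment.
# Changes to nifi.properties require a NiFi restart to take effect.
envs = {
    "dev":  {"hdfs.zone.landing": "/data/dev/landing"},
    "test": {"hdfs.zone.landing": "/data/test/landing"},
    "prod": {"hdfs.zone.landing": "/data/prod/landing"},
}

for env, props in envs.items():
    with open(f"custom-{env}.properties", "w") as out:
        for name, value in props.items():
            out.write(f"{name}={value}\n")   # referenced in flows as ${hdfs.zone.landing}
```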
Precedence for EL references
Properties referenced through EL should have unique names. Properties that share a name are resolved with the following precedence (i.e. the value of the one found first in the sequence below is used):
Processor attribute -> FlowFile attribute -> custom property file attribute -> system property -> OS environment variable
For example, if hdfs.zone.landing is defined both as a FlowFile attribute and in a custom property file, the FlowFile attribute's value wins.
Also see the following for more on using EL to reference properties:
https://community.hortonworks.com/articles/57304/supporting-custom-properties-for-expression-langua.html
https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.0/bk_user-guide/content/Using_Custom_Properties.html
3. Building a Reusable Component
Let's say that I want to build the following component to be reused for many different flows.
My intent is to tail a file (GetFile), do a simple transformation (ReplaceText), and output the result to any processor in a flow I build in the future. I also want to push the original file content to HDFS for historical analysis. Putting it to HDFS is not so simple: I first need MergeContent so that files in HDFS do not each hold a single line of content, and UpdateAttribute so that the filename in HDFS is the original name from GetFile with a timestamp appended to distinguish files.
Many configuration attributes can be changed each time this reusable component is used in a new flow, e.g. paths, the transformation regex, etc.
Step 1: Build in UI (including configurations)
Build the reusable component in your UI. Configure everything with either empty, hard-coded, or EL values. (Best practices around this are discussed later below.)
Note that the template has processor errors. In the case of GetFile, the input directory is not configured. In the case of ReplaceText, the Success relationship is neither connected nor auto-terminated. That is OK: whoever uses this component will configure these later according to their specific flow.
Step 2: Save and Download Template
Select the processors and connections that will make up your reusable component (select all, or select each processor and connection separately).
Save the template. After you name it, you can go to Templates and see it listed among your logged-in user's templates.
To share the template with others, download it from the NiFi Templates list (the icon to the left of the trash can). To delete it from the list, click the trash can.
4. Using a Reusable Component
Using a reusable component is the opposite of the above.
If the component is not in your NiFi Templates list, get it onto your local file system and upload it.
Drag the template icon onto your canvas and choose the template you want to add.
Clicking Add will add it to your canvas in the same way as adding a processor. It will be added in the same state as it was downloaded by the person who built it (unless the template's XML was changed manually after the original download).
Change any configurations you need to change, and connect it to the rest of the flow you want to build.
5. How does this work?
You can reuse (instantiate) a single component as many times as you want in a single flow (including inside and outside of process groups). How does this work? How does NiFi instantiate each copy separately? When you drag a processor, connection, or process group onto the canvas, each is given a UUID like 9fc758e3-0157-1000-e89d-a6033019f0cf. The first part of this is a global id and the second part is an instance id. When you download it as a template, the global id is retained but the instance id is set to zeros, e.g. 9fc758e3-0157-1000-0000-000000000000. When you upload a template and drag it to the canvas, the instance id is converted from zeros to a new unique instance id, e.g. 9fc758e3-0157-1000-e17b-1bc0cb0c1921. Simple but powerful.
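Purely as an illustration of the id handling just described (this is not NiFi code, only the string manipulation spelled out with the ids from the example above):

```python
canvas_id   = "9fc758e3-0157-1000-e89d-a6033019f0cf"   # id of a component as built on the canvas
global_part = canvas_id[:18]                            # "9fc758e3-0157-1000" -- the retained global portion
template_id = global_part + "-0000-000000000000"        # instance portion zeroed out in the template XML
print(template_id)                                      # 9fc758e3-0157-1000-0000-000000000000
# When the template is uploaded and dragged onto a canvas, NiFi replaces the zeroed portion
# with a fresh instance id, e.g. 9fc758e3-0157-1000-e17b-1bc0cb0c1921, so each copy is distinct.
```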
6. Software Development Lifecycle (SDLC)
Templates and custom configuration files can be treated as code and thus easily integrated into a typical SDLC.
Summary:
Reusable components are added as templates to a central repository. This should be governed. Reusable components are probably best represented as process groups; this makes building new flows simpler by separating the reusable components (and the encapsulation of their details) from the new flow components.
Development groups pull reusable components and upload them to their NiFi environment to build new flows.
In flow configurations, sensitive values should be configured as Expression Language references to OS environment variables that you set in each environment, e.g. ${MYSQL_PSSWRD}.
Other environment-specific config values should similarly use Expression Language references; if they are not sensitive, they should go in a custom properties file. Developers finalize their flows and submit the flow's template to version control, e.g. Git (and also submit the custom property files).
Template and custom property files are promoted to each environment just as source code typically is.
Automation: deploying templates to environments can be done via the NiFi REST API integrated with other automation tools (see the sketch after this list).
Governance bodies decide which configurations can be changed in real time (e.g. ControlRate properties). These changes do not need to go through version control and can be made by authorized admins on the fly. For authorization policies, see: https://community.hortonworks.com/articles/60842/hdf-20-defining-nifi-policies-in-ranger.html
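As a sketch of what that automation step can look like (Python with the requests library against an unsecured NiFi; the base URL and file name are placeholders, and the endpoints are from the NiFi 1.x REST API, so verify them against the docs for your version):

```python
import requests

NIFI = "http://nifi-test.example.com:8080/nifi-api"     # placeholder base URL for the target environment

# Find the root process group id of the target environment.
root_pg = requests.get(f"{NIFI}/flow/process-groups/root").json()["processGroupFlow"]["id"]

# Upload the template XML that was promoted from version control.
with open("reusable-get-transform.xml", "rb") as f:
    resp = requests.post(f"{NIFI}/process-groups/{root_pg}/templates/upload",
                         files={"template": f})
resp.raise_for_status()
print("Template uploaded to", NIFI)
```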
Shoutouts and links
Many thanks to the NiFi SMEs for validating the technical aspects of these reuse ideas, particularly @Andy LoPresto, @Andrew Grande, @Andrew Psaltis, @Koji Kamimura, @Matt Burgess.
Useful links:
https://nifi.apache.org/docs/nifi-docs/
https://nifi.apache.org/docs/nifi-docs/html/getting-started.html
https://community.hortonworks.com/content/kbentry/16461/nifi-understanding-how-to-use-process-groups-and-r.html
https://community.hortonworks.com/articles/57304/supporting-custom-properties-for-expression-langua.html
https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.0/bk_user-guide/content/Using_Custom_Properties.html