Member since: 06-20-2016
Posts: 488
Kudos Received: 433
Solutions: 118
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3605 | 08-25-2017 03:09 PM |
| | 2515 | 08-22-2017 06:52 PM |
| | 4197 | 08-09-2017 01:10 PM |
| | 8977 | 08-04-2017 02:34 PM |
| | 8949 | 08-01-2017 11:35 AM |
10-21-2016
01:43 PM
1 Kudo
If your values contain any text characters, Hive will return NULL when you try to cast them to DECIMAL. For example, if there is a trailing whitespace character, the value will look like a decimal but Hive will see the text character (the whitespace) and the cast will fail. It looks like your values have commas after them; if so, Hive will convert them to NULL for the reason explained above.
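If you want to confirm which characters are breaking the cast, a quick check over a sample of the raw values will show them. A minimal sketch in plain Python (the sample values are made up for illustration):

```python
import re

# Made-up sample of raw column values; substitute a few lines from your source data.
samples = ["123.45", "123.45 ", "1,234.56"]

for raw in samples:
    offending = re.findall(r"[^0-9.\-]", raw)   # anything that is not part of a plain decimal
    if offending:
        print(repr(raw), "-> Hive will cast this to NULL; offending characters:", offending)
    else:
        print(repr(raw), "-> castable to DECIMAL")
```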
10-17-2016
07:39 PM
@Jasper See https://community.hortonworks.com/articles/124/atlas-api-tips-create-trait-type-example.html
Hope it helps.
10-14-2016
04:05 PM
4 Kudos
@Frank Maritato I have recently published an article on this topic: https://community.hortonworks.com/content/kbentry/60868/enterprise-nifi-implementing-reusable-components-a.html It is meant to realize the SDLC in the style of conventional code change management and deployment, and is best seen as an alternative to the post by @Andy LoPresto; it ties into an enterprise's existing process for code in general. The main suggestions are:
In flow configurations, sensitive values should be configured as Expression Language references to OS environment variables that you set in each environment, e.g. ${MYSQL_PSSWRD}.
Other environment-specific config values should similarly use Expression Language references; if they are not sensitive, they should go in a custom properties file.
Developers finalize their flows and submit the flow's template (and the custom property file) to Git.
The template and custom property file are promoted to each environment just as source code typically is.
Automation: uploading a finalized template from Git to each environment can be done via the NiFi REST API.
10-14-2016
03:47 PM
1 Kudo
@Roberto Sancho I have it working. I have edited the original answer with the working code. Note the following: I changed \\361 to \\0361 and simplified by removing the () that wrapped the [].
10-14-2016
02:03 PM
@Roberto Sancho Could you share a few lines of the file you are using (including n tilde)? Tx
10-14-2016
01:36 PM
3 Kudos
In the extended ASCII table, n tilde (ñ) is represented as octal 0361, which is written in a regex simply as \0361. So include [^\\0361] in your expression to prevent ñ from being replaced by ' '. This should work for you:
REPLACE($0,'[^a-zA-Z\\0361\\n\\.\\-]+','')
See:
http://web.cs.mun.ca/~michael/c/ascii-table.html
http://www.regular-expressions.info/refcharacters.html
@Roberto Sancho -- I have corrected the code in this answer. The above works for me. If this is what you are looking for, please accept the answer; otherwise let me know what gaps remain.
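As a quick sanity check of the character class (Pig uses Java regular expressions; the snippet below uses Python's re module only to illustrate the same idea, and the input string is made up):

```python
import re

# \xf1 is the same character as octal 0361 (ñ); the class mirrors the Pig REPLACE above.
pattern = re.compile(r"[^a-zA-Z\xf1\n.\-]+")
sample = "Años: 1,234 señal!"        # made-up line containing ñ
print(pattern.sub("", sample))        # -> Añosseñal  (ñ is kept, digits and punctuation are stripped)
```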
10-11-2016
03:15 PM
Hi @Simran Kaur. Edge/client nodes are only for user access to the cluster. Having said that, they are not mandatory for a Hadoop cluster, since users can access it through other means (e.g. Ambari views, Zeppelin, WebHDFS, HDFS mounts, and others), so edge/client nodes are a bit of a distraction here. The main architecture of Hadoop is the master-slave architecture of its services. At the highest level, a service typically has a master that manages a job and slaves that do the work distributed across the cluster. These are never on an edge node (an edge node lets the user communicate with the master services).
10-11-2016
02:48 PM
That Stack Overflow link is not a good reference (oversimplified and incorrect). If you install through the Ambari management console, services will be assigned to masters and slaves automatically. See: https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.1.0/bk_ambari-installation/content/ch_Deploy_and_Configure_a_HDP_Cluster.html If you are interested in what a large cluster with many services looks like, or if you want to do a manual install (not preferable), refer to: https://community.hortonworks.com/articles/16763/cheat-sheet-and-tips-for-a-custom-install-of-horto.html
10-11-2016
02:40 PM
This document states the order in which services should be started: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_reference/content/starting_hdp_services.html
10-11-2016
02:06 PM
18 Kudos
1. Introduction
NiFi is a powerful and easy-to-use technology for building dataflows from diverse sources to diverse targets, transforming and dynamically routing data in between. NiFi is packaged in HDF 2.0, which (in addition to bundling Kafka and Storm for a complete data movement platform) pushes NiFi to enterprise readiness with Ambari management and Ranger security and multitenancy.
One of the hallmarks of NiFi is its drag-and-drop UI, which makes building and configuring dataflows drop-dead easy. However, when the same parts of a flow are used repeatedly across projects, within a team, or across the organization, the UI can slow development down by forcing the same manual steps to rebuild the same pieces from scratch.
You can overcome this problem by using two features of NiFi -- templates and configurations using Expression Language references -- to build a library of reusable components that can serve as pre-built building blocks for new flows. Doing so provides the following advantages to the team and the enterprise:
rapid development through component reuse
adoption of standards and patterns through component reuse
code that can be change-managed and implemented in an SDLC (Software Development Lifecycle) similar to any other software code, including promotion of the same code base across dev, test and production environments.
Note: for enterprise NiFi security, see these two valuable posts:
https://community.hortonworks.com/articles/60842/hdf-20-defining-nifi-policies-in-ranger.html
https://community.hortonworks.com/articles/60001/hdf-20-integrating-secured-nifi-with-secured-range.html
2. Overview of NiFi Reusable Component Technology
Reusable NiFi components center around a template and its configurations as shown below.
Template
templates are created from components built in the UI and saved to the NiFi environment (through the UI); they are shared between NiFi clusters by downloading them to a local machine, sharing them, and uploading them to the new NiFi environment (via the UI or the REST API; see the sketch after this list)
templates are XML
templates can be made from any subset of a flow: a single processor, a flattened subflow, or a process group holding a subflow
alternatively, templates can be made from full flows
templates can be uploaded to the UI and used as a starting point for a flow (all configs from the downloaded template are retained and can then be changed as needed)
flows built from templates can themselves be downloaded as new templates and deployed across environments (or used as reusable components in new flows)
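As an example of the REST route, pulling a template out of one environment without the UI could look like the sketch below (Python with the requests library against an unsecured NiFi; the base URL and template name are placeholders, and the endpoints are from the NiFi 1.x REST API, so verify them against the docs for your version):

```python
import requests

NIFI = "http://nifi-dev.example.com:8080/nifi-api"   # placeholder base URL for the source environment

# List the templates known to this NiFi instance and pick one by name.
templates = requests.get(f"{NIFI}/flow/templates").json()["templates"]
target = next(t for t in templates if t["template"]["name"] == "reusable-get-transform")  # placeholder name

# Download the template XML so it can be shared or committed to version control.
xml = requests.get(f"{NIFI}/templates/{target['id']}/download").text
with open("reusable-get-transform.xml", "w") as f:
    f.write(xml)
```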
Configurations
Configuration properties can be Expression Language (EL) references to
system properties
OS environment variables,
custom properties written in a file
But note that:
EL references can only be used in processor properties, and only in properties that show "Supports expression language: true" (visible by clicking the question mark next to the property)
EL references have the form ${property.name}, e.g. ${MYSQL_PWD} to reference a password set in the NiFi server's operating system, or ${hdfs.zone.landing} to reference an HDFS path defined in a custom property file.
EL references can be concatenated with literal text, e.g. /data/${hdfs.zone.landing}
Configuration: OS Environment Variables
Export the environment variable on each NiFi server (best for sensitive values), e.g. export MYSQL_PWD=secretpwd
Configuration: Custom Property Files
To implement a file with custom property name=value pairs to be referenced by EL, do the following:
On each NiFi server in your cluster, open the nifi.properties file
Set the property nifi.variable.registry.properties to the path of your custom property file (or a comma-separated list of custom property files), e.g. nifi.variable.registry.properties=./conf/custom.properties. A sketch of generating such files per environment follows below.
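Because these are plain name=value files, they are easy to generate per environment as part of your promotion process. A minimal sketch in plain Python (the property names, values, and file names are illustrative assumptions, not anything NiFi ships with):

```python
# Generate one custom property file per environment; point
# nifi.variable.registry.properties at the file for that environment.
# Changes to nifi.properties require a NiFi restart to take effect.
envs = {
    "dev":  {"hdfs.zone.landing": "/data/dev/landing"},
    "test": {"hdfs.zone.landing": "/data/test/landing"},
    "prod": {"hdfs.zone.landing": "/data/prod/landing"},
}

for env, props in envs.items():
    with open(f"custom-{env}.properties", "w") as out:
        for name, value in props.items():
            out.write(f"{name}={value}\n")   # referenced in flows as ${hdfs.zone.landing}
```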
Precedence for EL references
Properties referenced through EL should have unique names. Properties that share a name are resolved with the following precedence (i.e. the value of the one found first in the sequence below is used):
Processor attribute -> FlowFile attribute -> custom property file attribute -> system property -> OS environment variable
For example, if hdfs.zone.landing is defined both as a FlowFile attribute and in a custom property file, the FlowFile attribute's value wins.
Also see the following for more on using EL to reference properties:
https://community.hortonworks.com/articles/57304/supporting-custom-properties-for-expression-langua.html
https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.0/bk_user-guide/content/Using_Custom_Properties.html
3. Building a Reusable Component
Let's say that I want to build the following component to be reused for many different flows.
My intent is to tail a file (GetFile), do a simple transformation (ReplaceText), and output the result to any processor in a flow I build in the future. I also want to push the original file content to HDFS for historical analysis. Putting it to HDFS is not so simple: I first need MergeContent so that files in HDFS do not each hold a single line of content, and UpdateAttribute so that the filename in HDFS is the original name from GetFile with a timestamp appended to distinguish files.
Many configuration attributes can be changed each time this reusable component is used in a new flow, e.g. paths, the transformation regex, etc.
Step 1: Build in UI (including configurations)
Build the reusable component in your UI. Configure everything with either empty, hard-coded, or EL values. (Best practices around this are discussed later below.)
Note that the template has processor errors. In the case of GetFile, the input directory is not configured. In the case of ReplaceText, the Success relationship is neither connected nor auto-terminated. That is OK: whoever uses this component will configure these later according to their specific flow.
Step 2: Save and Download Template
Select the processors and connections that will make up your reusable component (select all, or select each processor and connection separately).
Save the template. After you name it, you can go to Templates and see it listed among your logged-in user's templates.
To share the template with others, download it from the NiFi Templates list (the icon to the left of the trash can). To delete it from the list, click the trash can.
4. Using a Reusable Component
Using a reusable component is the opposite of the above.
If the component is not in your NiFi Templates list, get it onto your local file system and upload it.
Drag the template icon onto your canvas and choose the template you want to add.
Clicking Add will add it to your canvas in the same way as adding a processor. It will be added in the same state as it was downloaded by the person who built it (unless the template's XML was changed manually after the original download).
Change any configurations you need to change, and connect it to the rest of the flow you want to build.
5. How does this work?
You can reuse (instantiate) a single component as many times as you want in a single flow (including inside and outside of process groups). How does this work? How does NiFi instantiate each copy separately? When you drag a processor, connection, or process group onto the canvas, each is given a UUID like 9fc758e3-0157-1000-e89d-a6033019f0cf. The first part of this is a global id and the second part is an instance id. When you download it as a template, the global id is retained but the instance id is set to zeros, e.g. 9fc758e3-0157-1000-0000-000000000000. When you upload a template and drag it to the canvas, the instance id is converted from zeros to a new unique instance id, e.g. 9fc758e3-0157-1000-e17b-1bc0cb0c1921. Simple but powerful.
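Purely as an illustration of the id handling just described (this is not NiFi code, only the string manipulation spelled out with the ids from the example above):

```python
canvas_id   = "9fc758e3-0157-1000-e89d-a6033019f0cf"   # id of a component as built on the canvas
global_part = canvas_id[:18]                            # "9fc758e3-0157-1000" -- the retained global portion
template_id = global_part + "-0000-000000000000"        # instance portion zeroed out in the template XML
print(template_id)                                      # 9fc758e3-0157-1000-0000-000000000000
# When the template is uploaded and dragged onto a canvas, NiFi replaces the zeroed portion
# with a fresh instance id, e.g. 9fc758e3-0157-1000-e17b-1bc0cb0c1921, so each copy is distinct.
```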
6. Software Development Lifecycle (SDLC)
Templates and custom configuration files can be treated as code and thus easily integrated into a typical SDLC.
Summary:
Reusable components are added as templates to a central repository. This should be governed. Reusable components are probably best represented as process groups; this makes building new flows simpler by separating the reusable components (and the encapsulation of their details) from the new flow components.
Development groups pull reusable components and upload them to their NiFi environment to build new flows.
In flow configurations, sensitive values should be configured as Expression Language references to OS environment variables that you set in each environment, e.g. ${MYSQL_PSSWRD}.
Other environment-specific config values should similarly use Expression Language references; if they are not sensitive, they should go in a custom properties file. Developers finalize their flows and submit the flow's template to version control, e.g. Git (and also submit the custom property files).
Template and custom property files are promoted to each environment just as source code typically is.
Automation: deploying templates to environments can be done via the NiFi REST API integrated with other automation tools (see the sketch after this list).
Governance bodies decide which configurations can be changed in real time (e.g. ControlRate properties). These changes do not need to go through version control and can be made by authorized admins on the fly. For authorization policies, see: https://community.hortonworks.com/articles/60842/hdf-20-defining-nifi-policies-in-ranger.html
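As a sketch of what that automation step can look like (Python with the requests library against an unsecured NiFi; the base URL and file name are placeholders, and the endpoints are from the NiFi 1.x REST API, so verify them against the docs for your version):

```python
import requests

NIFI = "http://nifi-test.example.com:8080/nifi-api"     # placeholder base URL for the target environment

# Find the root process group id of the target environment.
root_pg = requests.get(f"{NIFI}/flow/process-groups/root").json()["processGroupFlow"]["id"]

# Upload the template XML that was promoted from version control.
with open("reusable-get-transform.xml", "rb") as f:
    resp = requests.post(f"{NIFI}/process-groups/{root_pg}/templates/upload",
                         files={"template": f})
resp.raise_for_status()
print("Template uploaded to", NIFI)
```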
Shoutouts and links
Many thanks to the NiFi SMEs for validating the technical aspects of these reuse ideas, particularly @Andy LoPresto, @Andrew Grande, @Andrew Psaltis, @Koji Kamimura, @Matt Burgess.
Useful links:
https://nifi.apache.org/docs/nifi-docs/
https://nifi.apache.org/docs/nifi-docs/html/getting-started.html
https://community.hortonworks.com/content/kbentry/16461/nifi-understanding-how-to-use-process-groups-and-r.html
https://community.hortonworks.com/articles/57304/supporting-custom-properties-for-expression-langua.html
https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.0/bk_user-guide/content/Using_Custom_Properties.html