Member since: 06-20-2016
Posts: 488
Kudos Received: 433
Solutions: 118
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3605 | 08-25-2017 03:09 PM |
| | 2517 | 08-22-2017 06:52 PM |
| | 4197 | 08-09-2017 01:10 PM |
| | 8977 | 08-04-2017 02:34 PM |
| | 8950 | 08-01-2017 11:35 AM |
11-09-2016 02:23 PM
This post shows you how to very quickly (in seconds) build a flow that logs NiFi at the processor and error levels (or however you wish to customize it). Very handy for development and production alike: https://community.hortonworks.com/articles/65027/nifi-easy-custom-logging-of-diverse-sources-in-mer.html
11-08-2016 03:52 PM
Thank you @jpercivall, I placed it in the list. Great reference for the next level of detail in the Apache NiFi docs!
11-08-2016 02:31 PM
Glad it helped 🙂 (If you want to get more reuse from NiFi, see the first link in the answer for reusing templates during development.)
11-08-2016 01:23 PM
1 Kudo
HDP 2.3+ packages Sqoop 1.4.6, which allows direct import to HDFS as Parquet files by using --as-parquetfile. See: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html. If you import directly to a Hive table (vs. HDFS), you may need to do this as a two-step process (https://community.hortonworks.com/questions/56847/parquet-files-sqoop-import.html).
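A minimal sketch of such an import, assuming a MySQL source -- the JDBC URL, username, table, and target directory are placeholders you would substitute:

```bash
# Sketch: import an RDBMS table straight to HDFS as Parquet.
# Connection string, user, table, and target dir are placeholders.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/sales/orders \
  --as-parquetfile
```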
11-08-2016 01:10 PM
4 Kudos
Here are useful links (starting from general to more technical):
http://hortonworks.com/apache/nifi
https://nifi.apache.org/docs/nifi-docs
http://hortonworks.com/hadoop-tutorial/learning-ropes-apache-nifi
https://nifi.apache.org/docs/nifi-docs/html/getting-started.html
https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
https://community.hortonworks.com/articles/7999/apache-nifi-part-1-introduction.html
https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.0/bk_overview/content/ch_overview.html
https://cwiki.apache.org/confluence/display/NIFI/NiFi+Architecture
https://community.hortonworks.com/questions/42174/how-nifi-works-internally.html
https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html
https://community.hortonworks.com/questions/59707/nifi-jvm-not-releasing-memory.html
11-08-2016 12:49 PM
4 Kudos
There are two ways to do it:

1) Custom properties file
- Put name=value pairs in a custom property file, e.g. parent_path=/sourcesystem/country/
- Put the custom property file in a location of your choosing on the NiFi cluster.
- In the existing nifi.properties file, set the property nifi.variable.registry.properties to the path of your custom property file (or a comma-separated list of custom property files).
- In your processor, use ${parent_path} as you indicate. (A sketch of this setup follows below.)

2) OS environment variable (best for sensitive values that should not be exposed in a file)
- Set the OS environment variable: export Auth_key=234234fsdaf234
- Reference ${Auth_key} in your processor.

The following links show more details:
https://community.hortonworks.com/articles/60868/enterprise-nifi-implementing-reusable-components-a.html
https://community.hortonworks.com/articles/57304/supporting-custom-properties-for-expression-langua.html
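As promised above, a minimal sketch of option 1 -- the file location here is illustrative, not a required path:

```bash
# Sketch: wire a custom property file into NiFi's variable registry.
# The file location and property value are illustrative only.
cat > /opt/nifi/conf/custom.properties <<'EOF'
parent_path=/sourcesystem/country/
EOF

# In conf/nifi.properties, point the variable registry at the file
# (it accepts a comma-separated list of files):
#   nifi.variable.registry.properties=/opt/nifi/conf/custom.properties

# Restart NiFi, then use ${parent_path} in any processor property
# that supports expression language.
```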
11-07-2016 02:26 PM
If you feel like you have everything you need, let me know by accepting the answer; otherwise, it is fine to wait for additional answers or to follow up with additional questions.
11-07-2016 01:53 PM
As mentioned in the previous comment -- you should only store files in the local file system of the edge node. You should never use the actual cluster (master and data nodes) for local file storage. The fuller answer gives the benefit of HDFS if you are worried about automatic backup of files. (I have seen edge nodes go down and everything lost; thus, either have automatic backup or put files you want backed up into HDFS.)
11-07-2016 01:51 PM
If you do anything with the Linux file system, it should be on the edge node only. See the fuller answer below.
11-07-2016 01:33 PM
1 Kudo
1. Never use master or data node local storage
Best practice is definitely not to touch the master nodes or data nodes for local filesystem storage or the command line interface (use the edge node CLI, or your local machine via Ambari Views or integration through the Knox gateway).

2. 3rd party tools
3rd party tools will specify where to locate their files/jars.

3. Edge node
If you need files (typically jars) for a client interface to the cluster, place them on the edge node and use the client there. If you simply want to archive files (e.g. POC work), you can do this on the edge node's local file system.

4. HDFS
If you are archiving files on the edge node and it does not have high availability or backup (e.g. auto-replication of mounts) and you want this, putting them into HDFS is a good idea since each file is replicated 3x. When putting files into HDFS, from a client perspective there is no specification of name node or data node -- you interact with the name node and it stores the file on the data nodes. The name node is your interface to the data nodes. In HDFS, you could define a path like /misc and store these files there. You can also manage read-write permissions on this folder. You can manage files (make dir, put file, get file) in HDFS through the command line (the edge node is a good host for this) or the Ambari Files view; a sketch of the basic commands follows below. See:
http://hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/
http://hortonworks.com/blog/best-practices-in-hdfs-authorization-with-apache-ranger/
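A quick sketch of those basic HDFS commands -- the /misc path, owner/group, permission bits, and file names are examples only:

```bash
# Sketch: create a /misc archive area in HDFS and manage permissions.
# Path, owner/group, and permission bits are examples only.
hdfs dfs -mkdir -p /misc
hdfs dfs -chown myuser:mygroup /misc
hdfs dfs -chmod 750 /misc

# Put a local file into HDFS (replicated 3x by default) and get it back.
hdfs dfs -put /home/myuser/poc-results.tar.gz /misc/
hdfs dfs -get /misc/poc-results.tar.gz /tmp/
```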