Member since: 06-20-2016
Posts: 488
Kudos Received: 433
Solutions: 118
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3605 | 08-25-2017 03:09 PM |
| | 2517 | 08-22-2017 06:52 PM |
| | 4197 | 08-09-2017 01:10 PM |
| | 8977 | 08-04-2017 02:34 PM |
| | 8950 | 08-01-2017 11:35 AM |
11-09-2016 02:23 PM
This post shows you how to very quickly (in seconds) build a flow that logs NiFi at the processor and error levels (or however you wish to customize it). Very handy for development and production alike: https://community.hortonworks.com/articles/65027/nifi-easy-custom-logging-of-diverse-sources-in-mer.html
11-08-2016 03:52 PM
Thank you @jpercivall, I placed it in the list. Great reference for the next level of detail in the Apache NiFi docs!
11-08-2016 02:31 PM
Glad it helped 🙂 (If you want to get more reuse from NiFi, see the first link in the answer for reusing templates during development.)
11-08-2016 01:23 PM
1 Kudo
HDP 2.3+ packages Sqoop 1.4.6, which allows direct import to HDFS as Parquet files by using --as-parquetfile. See: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html. If you import directly to a Hive table (vs. HDFS), you may need to do this as a two-step process (https://community.hortonworks.com/questions/56847/parquet-files-sqoop-import.html).
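A minimal sketch of such an import, assuming a MySQL source -- the JDBC URL, username, table, and target directory are placeholders you would substitute:

```bash
# Sketch: import an RDBMS table straight to HDFS as Parquet.
# Connection string, user, table, and target dir are placeholders.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/sales/orders \
  --as-parquetfile
```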
11-08-2016 01:10 PM
4 Kudos
Here are useful links (starting from general to more technical):
http://hortonworks.com/apache/nifi
https://nifi.apache.org/docs/nifi-docs
http://hortonworks.com/hadoop-tutorial/learning-ropes-apache-nifi
https://nifi.apache.org/docs/nifi-docs/html/getting-started.html
https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
https://community.hortonworks.com/articles/7999/apache-nifi-part-1-introduction.html
https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.0/bk_overview/content/ch_overview.html
https://cwiki.apache.org/confluence/display/NIFI/NiFi+Architecture
https://community.hortonworks.com/questions/42174/how-nifi-works-internally.html
https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html
https://community.hortonworks.com/questions/59707/nifi-jvm-not-releasing-memory.html
11-08-2016 12:49 PM
4 Kudos
There are two ways to do it:

1) Custom properties file
- Put name=value pairs in a custom property file, e.g. parent_path=/sourcesystem/country/
- Put the custom property file in a location of your choosing on the NiFi cluster.
- In the existing nifi.properties file, set the property nifi.variable.registry.properties to the path of your custom property file (or a comma-separated list of custom property files).
- In your processor, use ${parent_path} as you indicate. (A sketch of this setup follows below.)

2) OS environment variable (best for sensitive values that should not be exposed in a file)
- Set the OS environment variable: export Auth_key=234234fsdaf234
- Reference ${Auth_key} in your processor.

The following links show more details:
https://community.hortonworks.com/articles/60868/enterprise-nifi-implementing-reusable-components-a.html
https://community.hortonworks.com/articles/57304/supporting-custom-properties-for-expression-langua.html
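As promised above, a minimal sketch of option 1 -- the file location here is illustrative, not a required path:

```bash
# Sketch: wire a custom property file into NiFi's variable registry.
# The file location and property value are illustrative only.
cat > /opt/nifi/conf/custom.properties <<'EOF'
parent_path=/sourcesystem/country/
EOF

# In conf/nifi.properties, point the variable registry at the file
# (it accepts a comma-separated list of files):
#   nifi.variable.registry.properties=/opt/nifi/conf/custom.properties

# Restart NiFi, then use ${parent_path} in any processor property
# that supports expression language.
```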
11-07-2016 02:26 PM
If you feel like you have everything you need, let me know by accepting the answer; otherwise, it is fine to wait for additional answers or to follow up with additional questions.
11-07-2016 01:53 PM
As mentioned in the previous comment -- you should only store files in the local file system of the edge node. You should never use the actual cluster (master and data nodes) for local file storage. The fuller answer gives the benefit of HDFS if you are worried about automatic backup of files. (I have seen edge nodes go down and everything lost; thus, either have automatic backup or put files you want backed up into HDFS.)
11-07-2016 01:51 PM
If you do anything with the Linux file system, it should be on the edge node only. See the fuller answer below.
11-07-2016 01:33 PM
1 Kudo
1. Never use master or data node local storage
Best practice is definitely not to touch the master nodes or data nodes for local filesystem storage or the command line interface (use the edge node CLI, or your local machine via Ambari Views or integration through the Knox gateway).

2. 3rd party tools
3rd party tools will specify where to locate their files/jars.

3. Edge node
If you need files (typically jars) for a client interface to the cluster, place them on the edge node and use the client there. If you simply want to archive files (e.g. POC work), you can do this on the edge node's local file system.

4. HDFS
If you are archiving files on the edge node and it does not have high availability or backup (e.g. auto-replication of mounts) and you want this, putting them into HDFS is a good idea since each file is replicated 3x. When putting files into HDFS, from a client perspective there is no specification of name node or data node -- you interact with the name node and it stores the file on the data nodes. The name node is your interface to the data nodes. In HDFS, you could define a path like /misc and store these files there. You can also manage read-write permissions on this folder. You can manage files (make dir, put file, get file) in HDFS through the command line (the edge node is a good host for this) or the Ambari Files view; a sketch of the basic commands follows below. See:
http://hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/
http://hortonworks.com/blog/best-practices-in-hdfs-authorization-with-apache-ranger/
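A quick sketch of those basic HDFS commands -- the /misc path, owner/group, permission bits, and file names are examples only:

```bash
# Sketch: create a /misc archive area in HDFS and manage permissions.
# Path, owner/group, and permission bits are examples only.
hdfs dfs -mkdir -p /misc
hdfs dfs -chown myuser:mygroup /misc
hdfs dfs -chmod 750 /misc

# Put a local file into HDFS (replicated 3x by default) and get it back.
hdfs dfs -put /home/myuser/poc-results.tar.gz /misc/
hdfs dfs -get /misc/poc-results.tar.gz /tmp/
```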