Member since: 07-30-2019
Posts: 3387
Kudos Received: 1617
Solutions: 999
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 82 | 11-05-2025 11:01 AM |
| | 347 | 10-20-2025 06:29 AM |
| | 487 | 10-10-2025 08:03 AM |
| | 348 | 10-08-2025 10:52 AM |
| | 378 | 10-08-2025 10:36 AM |
09-12-2016
12:40 PM
3 Kudos
@spdvnz NiFi's Hadoop-based processors already bundle the Hadoop client libraries, so there is no need to install them outside of NiFi or to install NiFi on the same hardware where Hadoop is running. The various NiFi processors that communicate with Hadoop use the core-site.xml, hdfs-site.xml, and/or hbase-site.xml files as part of their configuration. These files need to be copied from your Hadoop system(s) to a local directory on each of your NiFi instances for use by these processors. Detailed processor documentation is available by clicking "help" in the upper right corner of the NiFi UI. You can also reach a processor's documentation by right-clicking the processor and selecting "usage" from the context menu. Thanks, Matt
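As an illustration, once the files are copied (the local directory below is hypothetical), a processor such as PutHDFS points its "Hadoop Configuration Resources" property at them as a comma-separated list:

```
Hadoop Configuration Resources: /etc/nifi/hadoop-conf/core-site.xml,/etc/nifi/hadoop-conf/hdfs-site.xml
```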
09-12-2016
12:15 PM
3 Kudos
@spdvnz Check out this other article:
https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html
There is no difference between how a node in a NiFi cluster and a standalone NiFi should be set up; both should follow the guidelines outlined in the above article. As of NiFi 1.x and HDF 2.x, a NiFi cluster no longer has a NiFi Cluster Manager (NCM), so all systems are set up the same. For NiFi 0.x and HDF 1.x versions, the NCM does not process any data and therefore does not need the content, FlowFile, or provenance repositories. The NCM also does not require the same CPU horsepower as the nodes, but it can have a significant memory requirement depending on the number of attached nodes and the number of processors added to the canvas. This is because all the processor and connection stats are reported to the NCM in heartbeats and stored in memory. Thanks, Matt
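For reference, the repository locations that article recommends placing on separate disks are set in nifi.properties; the paths below are illustrative examples, not required values:

```
nifi.flowfile.repository.directory=/disk1/flowfile_repository
nifi.content.repository.directory.default=/disk2/content_repository
nifi.provenance.repository.directory.default=/disk3/provenance_repository
```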
09-06-2016
04:45 PM
1 Kudo
@INDRANIL ROY The output from the SplitText and RouteText processors is a bunch of FlowFiles that all share the same filename (the filename of the original FlowFile they were derived from). NiFi differentiates these FlowFiles internally by assigning each a unique identifier (UUID). The problem is that when writing to HDFS, only the first FlowFile written with a particular filename succeeds; all others produce the error you are seeing.

The MergeContent processor you added reduces the impact but does not solve the problem. Remember that nodes do not talk to one another or share files with one another. So each node's MergeContent is working on its own set of files, all derived from the same original source file, and each node produces its own merged file with the same filename. The first node to successfully write its file to HDFS wins, and the other nodes throw the error you are seeing.

What is typically done here is to add an UpdateAttribute processor after each of your MergeContent processors to force a unique name on each FlowFile before writing to HDFS. The UUID that NiFi assigns to each FlowFile is often prepended or appended to the filename to solve this problem. If you do not want to merge the FlowFiles, you can simply add the UpdateAttribute processor in MergeContent's place; you will just end up with a larger number of files written to HDFS.
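As a sketch, the UpdateAttribute processor would carry a single added property that rewrites the filename attribute using NiFi Expression Language (property name and value shown as they would appear in the processor's properties):

```
filename = ${filename}-${uuid}
```

Every FlowFile then lands in HDFS under a name that is unique across all nodes.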
Thanks, Matt
09-06-2016
03:02 PM
@INDRANIL ROY Your approach above looks good, except you really want to split that large 50,000,000-line file into many more, smaller files. Your example shows you splitting it into only 10 files, which may not ensure good file distribution to the downstream NiFi cluster nodes. The RPG load-balances batches of files (up to 100 at a time) for speed and efficiency, so with so few files it is likely that every file will still end up on the same downstream node instead of being load-balanced. However, if you were to split the source file into ~5,000 files, you would achieve much better load balancing. Thanks, Matt
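Splitting 50,000,000 lines into ~5,000 files works out to about 10,000 lines per file, so the SplitText configuration would look something like this (a sketch; the exact property layout depends on your NiFi version):

```
SplitText
  Line Split Count: 10000
```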
09-06-2016
02:03 PM
1 Kudo
@INDRANIL ROY
You have a couple of things going on here that are affecting your performance. Based on previous HCC discussions, you have a single 50,000,000-line file that you are splitting into 10 files (5,000,000 lines each) and then distributing those splits to your NiFi cluster via an RPG (Site-to-Site). You are then using the RouteText processor to read every line of these 5,000,000-line files and route the lines based on two conditions.

1. Most NiFi processors (including RouteText) are multi-thread capable via additional concurrent tasks. A single concurrent task works on a single file or batch of files; multiple threads will not work on the same file. So by setting concurrent tasks to 10 on the RouteText, you may not actually be using 10. The NiFi controller also has a max-thread configuration that limits the number of threads available across all components; it can be found in the controller settings, reached from the upper right corner of the UI. Most components use timer-driven threads by default, so that is the number you will want to increase in most cases. Keep in mind that your hardware also limits how much "work" you can do concurrently. With only 4 cores, you are fairly limited. You may want to raise this value from the default 10 to perhaps 20, but you can otherwise just end up with a lot of threads in CPU wait, so avoid getting carried away with your thread allocations (both at the controller level and the processor level).

2. In order to get better multi-threaded throughput on your RouteText processor, try splitting your incoming file into many smaller files. Try splitting your 50,000,000-line file into files with no more than 10,000 lines each. The resulting 5,000 files will be better distributed across your NiFi cluster nodes and allow the multiple threads to be utilized. Thanks, Matt
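The effect of that split can be sketched at the command line (a scaled-down analogy using standard Unix tools, not NiFi itself; paths and sizes are illustrative):

```shell
# Stand-in for a large input file: 50,000 numbered lines
seq 1 50000 > /tmp/big.txt
# Break it into 10,000-line chunks, as SplitText would
split -l 10000 /tmp/big.txt /tmp/chunk_
ls /tmp/chunk_* | wc -l   # 5 chunks, each independently routable
```

At NiFi scale, the same ratio turns a 50,000,000-line file into ~5,000 FlowFiles that can be spread across nodes and worker threads.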
09-06-2016
12:29 PM
1 Kudo
@Bojan Kostic
It is not currently possible to add new jars/nars to a running NiFi; a restart is always required to load newly added items. Upon NiFi startup, all jars/nars are unpacked into the NiFi work directory. To maintain high availability, it is recommended that you use a NiFi cluster. This allows you to do rolling restarts so that your entire cluster is not down at the same time. If you are adding new components as part of a rolling update, you will not be able to use those new components until all nodes have been updated. Thanks, Matt
09-06-2016
12:18 PM
2 Kudos
@David DN Before Site-to-Site (S2S) can be used, the following properties must be set in the nifi.properties file on all the nodes in your NiFi cluster:

```
# Site to Site properties
nifi.remote.input.host=<FQDN of host>         # Set to an FQDN resolvable by all nodes
nifi.remote.input.secure=false                # Set to true if NiFi is running HTTPS
nifi.remote.input.socket.port=<port for S2S>  # Must be set to enable RAW S2S
nifi.remote.input.http.enabled=true           # Set if you want to support HTTP transport
nifi.remote.input.http.transaction.ttl=30 sec
```

A restart of your NiFi instances is necessary for this change to take effect.
Matt
09-02-2016
02:02 PM
@INDRANIL ROY Please share how you have your SplitText and RouteText processors configured. If I understand your end goal, you want to take this single file with 10,000,000 entries/lines and route only the lines meeting criteria 1 to one PutHDFS, while routing all other lines to another PutHDFS? Thanks, Matt
08-31-2016
08:48 PM
You can also save portions or all of your dataflow as NiFi templates that can be exported for use on other NiFi installations. To create a template, simply highlight all the components you want in the template (if you highlight a process group, all components within that process group are added to the template), then click the "create template" icon in the upper middle of the UI to create it. The Template Management UI can be used to export and import these templates; it is accessed via an icon in the upper right corner of the NiFi UI.
*** Note: NiFi templates are sanitized of any sensitive property values (a sensitive property value is any value that would be encrypted; in NiFi, that means any passwords).
Matt