About MattWho

MattWho · ‎08-01-2017

@Foivos A The banner is a NiFi core feature and is not tied in anyway to the dataflows you select or build on your canvas. You are correct that the best approach for identifying which dataflows on a single canvas are designated dev, test, or production is through the use of "labels". In a secure NiFi setup, you can use NiFi granular multi-tenancy user authorization to control what components a user can interact with an view. If you use labels, you should set a policy allowing all user to view that specific component, so even if they are not authorized to access the labeled components, they will be able to see why via the label text. Thanks, Matt

mark_hadoop · ‎08-02-2017

@Matt Clarke I will start a new question. Thanks

MattWho · ‎02-07-2018

@Felix Albani Thank you for your feedback... I have made the correction.

akash_sabarad · ‎07-16-2017

Thank you Matt, ListHDFS was a good hint. I was able to accomplish my task with you inputs.

MattWho · ‎07-07-2017

@Mark Heydenrych You may be able to use the ReplaceText processor to remove those blank lines from your input FlowFile's content before the SplitText processor. I did a little test that worked for me using the following configuration: This evaluates your FlowFile line by line and replace the line return (\n) on any line where the line starts with a line return with nothing. The effectively removes that blank line. After that my splitText reported teh correct fragment.count when I split the file. Thanks, Matt

MattWho · ‎06-26-2017

The NiFi S2S protocol is used by NiFi's Remote Process Group (RPG) components to distribute FlowFiles from one NiFi instance to another. When the target NiFi is a NiFi cluster, load-balancing of the FlowFie delivery is done across all nodes in the target NiFi cluster. The default way this works (and the only way it works in versions of NiFi previous to Apache NiFi 1.2.0 or HDF 3.0) is as follows: The RPG regularly communicates with the target NiFi cluster to get load status information about each node in the cluster. This information includes the number of currently connected nodes in the target cluster, each node's hostname, port information, and the number of total queued FlowFiles on each target NiFi node. The Source NiFi uses this information to determine a data distribution strategy for its source FlowFiles it has queued. - - - Let's assume a 4 node target NiFi cluster all reporting a zero queue count. - Each node will then be scheduled to receive 25% of the data. This means a distribution pattern of node1, node2, node3, and then, node4. - Now let's assume the same 4 node target cluster; however, node 1 and node 2 report having a queue of FlowFiles that results in the following: - Node 1 and Node 2 would get 16.67% of the data while node 3 and node 4 get 33.33% of the data. This results in a distribution pattern of node1, node2, node3, node4, node3, and then node 4. So Nodes 3 and 4 get twice the opportunity to receive data over nodes 1 and 2. Once the distribution pattern is determined, the RPG connects to the first node and starts transferring data from the incoming queue to that node for 500 milliseconds or until the queue is empty. The next run of RPG will start sending to the next node in pattern and so on. As you can see by this default distribution model, the data may not always be distributed as desired. The reason this transfer was implemented this way was for performance reasons. However, when working with very small FlowFiles, where FlowFiles come in a wide range of sizes from small to large, when a better network connection exists between one target node than another, or data comes in bursts instead of continuous flow, the load-balancing will be less than ideal. With the introduction of HDF 3.0 (Apache NiFi 1.2.0). additional configuration options were added to the RPG to control the number of FlowFiles (count), amount of data (size), and/or length of transaction time (duration) per RPG port connection. This gives users the ability to fine-tune their RPG connection to achieve better load-balancing results when dealing with lighter volume dataflows, network performance differences between nodes, etc. These new configuration options can be set as follows: Each input and output port configuration will need to be set individually. Of course, setting count to a value of 1 sounds like a good way to achieve really good load-balancing, but it will cost you in performance since only one FlowFile will be sent in each transaction. So, there will be extra overhead introduced due to the volume of new connections being opened and closed. So you may find yourself playing around with these settings to achieve your desired load-balancing to performance ratio. ---------------- How do I get better load-balancing in an older version of NiFi? The RPG will send data based on what is currently in the incoming queue per transaction. By limiting the size of that queue, you can control the max number of FlowFiles that will transfer per transaction. You can set the size of object back pressure thresholds on those incoming queues to limit the number of FlowFiles queued at any given time. This will cause FlowFiles to queue on the next upstream connection in the dataflow. If a source processor is feeding the RPG directly, try putting an updateAttribute processor between that source processor and the RPG so you have two connections. As each RPG execution runs and transfers what is on the queue, the queue will be refilled for the next transaction. Apache NiFi 1.13+ update: In newer releases of NiFi, the ability to redistribute FlowFiles within the cluster was made much more efficient and easier through the new load-balanced connection feature. This new feature (stable in Apache NIFi 1.13+ versions) is a simple configuration change that can be done on a connection. It supports numerous strategies for redistribution of FlowFiles, but for load-balanced distribution, it offers a true round-robin capability you can't get from an RPG.

tarun_kumar1 · ‎12-18-2017

Thank You Matt, I too was facing similar issue and your suggestion worked.

estefania_rabad · ‎06-14-2017

thanks for the quick reply matt, i'll use that solution!

MattWho · ‎06-13-2017

@forest lin NiFi at is core has no issues working with very large files. Often times, when you run into OOM it is because of what you are trying to do with those very large files after they are in NiFi. In the majority of the cases OOM can be avoided via dataflow design and tweaks to the heap size allocated to the NiFi JVM. The content of a FlowFile does not live in heap memory space, but the FlowFile attributes do (*** except when swapped out to disk in large queues). So avoid extracting large amounts of the content into FlowFile attributes, avoid trying to split very large files in to large numbers of small FlowFiles using a single processor, avoid trying to merge a very large number of FlowFiles in to a single FlowFile, etc... You can still do these types of things but may need to do it in two stages rather then one. For example Splitting large files by every 5000 lines first and then split 5000 line FlowFiles by every line (Huge difference in heap usage). If you found this answer addressed your question, please mark it as accepted to close out this thread. Thanks, Matt

MattWho · ‎05-31-2017

No problem... as soon as you added that you were running HDF 2.1.2, it helped.

Online	Offline
Last Visited	‎07-28-2026 01:05 AM

Member Since	‎07-30-2019 10:41 AM
Last Visited	‎07-28-2026 01:05 AM
Posts	3,472
Kudos received	1638

Cloudera Community

Re: ListenNetFlow processor does not decode Cisco ...

Re: Can we detect who did a particular operation i...

Re: How to invoke a url in nifi which is protected...

Re: Retry impacts scheduler

Re: 503 error while copying/versioning big process...

Re: Ways to distinguish NiFi UI canvas by environm...

Re: Extract text using Nifi

Re: NiFi Ranger based policy descriptions

Re: FetchHDFS Process to fetch Nested data in HDFS

Re: Incorrect fragment.count in nifi

How to achieve better load-balancing using NiFi's ...

Re: Files detected twice with ListFile processor

Re: Is there a way to avoid the RouteOnAttribute.R...

Re: ControlRate and BackPressure seems not work fo...

Re: NiFi clock is off by one hour (daylight saving...