Member since
07-30-2019
3467
Posts
1641
Kudos Received
1016
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 191 | 05-04-2026 05:20 AM |
| | 450 | 03-23-2026 05:44 AM |
| | 341 | 02-18-2026 09:59 AM |
| | 590 | 01-27-2026 12:46 PM |
| | 1025 | 01-20-2026 05:42 AM |
02-21-2019
10:39 PM
@Matt Clarke: Thank you for your detailed explanation!
02-19-2019
06:05 PM
Thanks @Matt Clarke. I went with option 2 and it worked. Thanks again for the quick reply.
02-06-2019
03:53 PM
Thanks for your response, Matt. It is working now with the UpdateAttribute processor, since it works at the attribute level. Thumbs up to you.
11-12-2018
06:08 PM
3 Kudos
NiFi restricted components are those processors, controller services, or reporting tasks that have the ability to run user-defined code or access/alter localhost filesystem data.

The NiFi User Guide explains this as follows:

-----------------------------------------
Restricted components will be marked with an icon next to their name. These are components that can be used to execute arbitrary unsanitized code provided by the operator through the NiFi REST API/UI or can be used to obtain or alter data on the NiFi host system using the NiFi OS credentials. These components could be used by an otherwise authorized NiFi user to go beyond the intended use of the application, escalate privilege, or could expose data about the internals of the NiFi process or the host system. All of these capabilities should be considered privileged, and admins should be aware of these capabilities and explicitly enable them for a subset of trusted users. Before a user is allowed to create and modify restricted components they must be granted access.
------------------------------------------

Users can only be restricted from adding such components if NiFi is secured. Users of an unsecured NiFi will always have access to all components.

Prior to HDF 3.2 or Apache NiFi 1.6, all restricted components were covered by a single authorization policy:

| Ranger Policy (Base policies) | NiFi Policies (Hamburger menu) | Ranger permissions description |
|---|---|---|
| /restricted-components | Access restricted components | Read/View - N/A. Write/Modify - Gives granted users the ability to add components to the canvas that are tagged as “restricted”. |

It was decided that lumping all components into one policy was not ideal, so NIFI-4885 was created so that a user's access to restricted components would be based on the level of restricted access they are being granted:

- read-filesystem
- read-distributed-filesystem
- write-filesystem
- write-distributed-filesystem
- execute-code
- access-keytab
- export-nifi-details

In order to avoid backward compatibility issues when users upgrade to HDF 3.2+ or Apache NiFi 1.6.0+, the “Access restricted components” base policy still exists and defaults to "regardless of restrictions". In the NiFi global “Access Policies” UI, this is the default policy. In Ranger, this is still associated with just the “/restricted-components” policy.

The new sub-policies are depicted as follows in the Ranger and NiFi UIs:

| Ranger Policy (Base policies) | NiFi Policies (Hamburger menu) | Ranger permissions description |
|---|---|---|
| /restricted-components/read-filesystem | Access restricted components. Sub policy: Requiring ‘read filesystem’ | Read/View - N/A. Write/Modify - Allows users to create/modify restricted components requiring read filesystem. |
| /restricted-components/read-distributed-filesystem | Access restricted components. Sub policy: Requiring ‘read distributed filesystem’ | Read/View - N/A. Write/Modify - Allows users to create/modify restricted components requiring read distributed filesystem. |
| /restricted-components/write-filesystem | Access restricted components. Sub policy: Requiring ‘write filesystem’ | Read/View - N/A. Write/Modify - Allows users to create/modify restricted components requiring write filesystem. |
| /restricted-components/write-distributed-filesystem | Access restricted components. Sub policy: Requiring ‘write distributed filesystem’ | Read/View - N/A. Write/Modify - Allows users to create/modify restricted components requiring write distributed filesystem. |
| /restricted-components/execute-code | Access restricted components. Sub policy: Requiring ‘execute code’ | Read/View - N/A. Write/Modify - Allows users to create/modify restricted components requiring execute code. |
| /restricted-components/access-keytab | Access restricted components. Sub policy: Requiring ‘access keytab’ | Read/View - N/A. Write/Modify - Allows users to create/modify restricted components requiring access keytab. |
| /restricted-components/export-nifi-details | Access restricted components. Sub policy: Requiring ‘export nifi details’ | Read/View - N/A. Write/Modify - Allows users to create/modify restricted components requiring export nifi details. |

Below is a list of restricted components for each of the above sub-policies (current as of CFM 2.1.1 and Apache NiFi 1.13):

Read-filesystem:

| NiFi component | Component type | Access provisions |
|---|---|---|
| FetchFile | Processor | Provides operator the ability to read from any file that NiFi has access to. |
| TailFile | Processor | Provides operator the ability to read from any file that NiFi has access to. |
| GetFile | Processor | Provides operator the ability to read from any file that NiFi has access to. |

Read-distributed-filesystem: (Added NiFi 1.13)

| NiFi component | Component type | Access provisions |
|---|---|---|
| FetchHDFS | Processor | Provides operator the ability to retrieve any file that NiFi has access to in HDFS or the local filesystem. |
| FetchParquet | Processor | Provides operator the ability to retrieve any file that NiFi has access to in HDFS or the local filesystem. |
| GetHDFS | Processor | Provides operator the ability to retrieve any file that NiFi has access to in HDFS or the local filesystem. |
| GetHDFSSequenceFile | Processor | Provides operator the ability to retrieve any file that NiFi has access to in HDFS or the local filesystem. |
| MoveHDFS | Processor | Provides operator the ability to retrieve any file that NiFi has access to in HDFS or the local filesystem. |

Write-filesystem:

| NiFi component | Component type | Access provisions |
|---|---|---|
| FetchFile | Processor | Provides operator the ability to delete any file that NiFi has access to. |
| GetFile | Processor | Provides operator the ability to delete any file that NiFi has access to. |
| PutFile | Processor | Provides operator the ability to write to any file that NiFi has access to. |

Write-distributed-filesystem: (Added NiFi 1.13)

| NiFi component | Component type | Access provisions |
|---|---|---|
| DeleteHDFS | Processor | Provides operator the ability to delete any file that NiFi has access to in HDFS or the local filesystem. |
| GetHDFS | Processor | Provides operator the ability to delete any file that NiFi has access to in HDFS or the local filesystem. |
| GetHDFSSequenceFile | Processor | Provides operator the ability to delete any file that NiFi has access to in HDFS or the local filesystem. |
| MoveHDFS | Processor | Provides operator the ability to delete any file that NiFi has access to in HDFS or the local filesystem. |
| PutHDFS | Processor | Provides operator the ability to delete any file that NiFi has access to in HDFS or the local filesystem. |
| PutParquet | Processor | Provides operator the ability to write any file that NiFi has access to in HDFS or the local filesystem. |

Execute-code:

| NiFi component | Component type | Access provisions |
|---|---|---|
| ScriptedReportingTask | Reporting Task | Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
| ScriptedLookupService | Controller Service | Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
| ScriptedReader | Controller Service | Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
| ScriptedRecordSetWriter | Controller Service | Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
| ExecuteFlumeSink | Processor | Provides operator the ability to execute arbitrary Flume configurations assuming all permissions that NiFi has. |
| ExecuteFlumeSource | Processor | Provides operator the ability to execute arbitrary Flume configurations assuming all permissions that NiFi has. |
| ExecuteGroovyScript | Processor | Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
| ExecuteProcess | Processor | Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
| ExecuteScript | Processor | Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
| ExecuteStreamCommand | Processor | Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |
| InvokeScriptedProcessor | Processor | Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. |

Access-keytab:

| NiFi component | Component type | Access provisions |
|---|---|---|
| KeytabCredentialsService | Controller Service | Allows user to define a Keytab and principal that can then be used by other components. |

Export-nifi-details:

| NiFi component | Component type | Access provisions |
|---|---|---|
| SiteToSiteBulletinReportingTask | Reporting Task | Provides operator the ability to send sensitive details contained in bulletin events to any external system. |
| SiteToSiteProvenanceReportingTask | Reporting Task | Provides operator the ability to send sensitive details contained in Provenance events to any external system. |

***Note: Some components may be found under multiple sub-policies above. In order for a user to utilize such a component, they must be granted access to every sub-policy required by that component.

Exceptions in HDF 3.2 and Apache NiFi 1.7 and 1.8: In order to use the following components, users must have full access to all restricted components policies:

| NiFi component | Component type | Access provisions |
|---|---|---|
| PutORC | Processor | This component requires access to restricted components regardless of restriction. Apache Jira: NIFI-5815 |

A full breakdown of all other NiFi policies can be found here: NiFi Ranger based policy descriptions - Cloudera Community
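The rule in the note above (a component listed under several sub-policies needs every one of them, while the base policy covers everything) can be sketched in a few lines of Python. This is only an illustration of the rule, not NiFi's or Ranger's authorizer code: the function name `can_add_component` and the small `REQUIRED_SUB_POLICIES` mapping are hypothetical, and the mapping copies just a handful of entries from the tables above.

```python
# Illustrative sketch only; not NiFi or Ranger code.
BASE_POLICY = "/restricted-components"

# Hypothetical lookup table, filled with a few rows from the tables above.
REQUIRED_SUB_POLICIES = {
    "TailFile": {"read-filesystem"},
    "GetFile": {"read-filesystem", "write-filesystem"},
    "FetchFile": {"read-filesystem", "write-filesystem"},
    "PutFile": {"write-filesystem"},
    "ExecuteScript": {"execute-code"},
}


def can_add_component(component, granted_policies):
    """Return True if the granted policies allow adding the component."""
    # The base policy grants access "regardless of restrictions".
    if BASE_POLICY in granted_policies:
        return True
    # Otherwise every sub-policy the component requires must be granted.
    required = REQUIRED_SUB_POLICIES[component]
    return all(f"{BASE_POLICY}/{name}" in granted_policies for name in required)


read_only = {"/restricted-components/read-filesystem"}
print(can_add_component("TailFile", read_only))  # only needs read-filesystem
print(can_add_component("GetFile", read_only))   # also needs write-filesystem
```

With only the read-filesystem sub-policy granted, TailFile is allowed but GetFile is not, because GetFile also appears under write-filesystem.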
09-12-2018
12:50 PM
4 Kudos
# Max Timer Driven Thread Count and Max Event Driven Thread Count:

Out of the box, NiFi sets the Max Timer Driven Thread Count relatively low to support operating on the simplest of hardware. This default setting can limit the performance of very large, high-volume dataflows that must perform a lot of concurrent processing. General guidance for setting this value is 2 - 4 times the number of cores available to the hardware on which the NiFi service is running. With a NiFi cluster where each server has different hardware (not recommended), this would be set to the highest value possible based on the server with the fewest cores.

NOTE: Remember that all configurations you apply within the NiFi UI are applied to every node in a NiFi cluster. None of the settings apply as a total to the cluster itself.

NOTE: The cluster UI can be used to see how the total active threads are being used per node.

Closely monitoring system CPU usage over time on each of your cluster nodes will help you identify regular or routine spikes in usage. This information will help you identify if you can increase the “Maximum Timer Driven Thread Count” setting even higher. Arbitrarily setting this value higher can lead to threads spending excessive time in CPU wait and not really doing any work. This can show up as long task times reported in processors.

*** The Event Driven scheduling strategy is considered experimental, and thus we do not recommend using it at all. Users should only configure their NiFi processors to use one of the Timer Driven scheduling strategies (Timer Driven or CRON Driven).

# Assigning Concurrent Tasks to processor components:

Concurrent task settings on processors should always start at the default of 1 and only be incremented slowly as needed. Assigning too many concurrent tasks to every processor can have an effect on other dataflows/processors.

Because of how the above works, it may appear to a user that they get better performance out of a processor by simply setting a high number of concurrent tasks. What they are really doing is simply stacking up more requests in that large queue so the processor gets more opportunity to grab one of the available threads from the resource pool. What often happens is that users with a processor running with only 1 concurrent task are affected (simply because of the size of the request queue), so those users increase their concurrent tasks too. Before you know it, the request queue is so large that no one is benefiting from assigning additional concurrent tasks.

In addition, you may have processors that inherently have long-running tasks. Assigning these processors lots of concurrent tasks can mean a substantial chunk of that thread pool is being used for an extended amount of time. This then limits the number of threads available from the pool to work through the remaining tasks in the queue.
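As a quick worked example of the 2 - 4 times cores guidance, here is a minimal Python sketch. The helper name `timer_driven_thread_range` and the sample core counts are made up for illustration; treat the result as a starting range to validate against CPU monitoring, not a value NiFi computes for you.

```python
def timer_driven_thread_range(cores_per_node):
    """Return (low, high) starting values for Max Timer Driven Thread Count.

    cores_per_node: core count of each node in the cluster. The setting is
    applied per node, so the node with the fewest cores drives the range.
    """
    fewest_cores = min(cores_per_node)
    return 2 * fewest_cores, 4 * fewest_cores


# Hypothetical 3-node cluster with mismatched hardware (not recommended).
low, high = timer_driven_thread_range([8, 16, 16])
print(f"Start between {low} and {high} threads per node")
```

For that cluster the 8-core node is the constraint, giving a range of 16 to 32.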
09-12-2018
12:30 PM
7 Kudos
# Processor Run Duration:

Some processors support configuring a run duration. This setting tells a processor to keep using the same task to work on as many FlowFiles (or batches of FlowFiles) from an incoming queue as it can. This is ideal for processors where the individual tasks themselves complete very fast and the volume of FlowFiles is large.

In the above example, the exact same feed of FlowFiles was passed to both of these processors, which are configured to perform the same attribute update. Both processed the same number of FlowFiles in the past 5 minutes; however, the processor configured with a run duration consumed less overall CPU time to do so.

Not all processors support setting a run duration. The nature of the processor function, the methods being used, and/or the client library used may not support this capability. You will not be able to set a run duration on such processors.

How this works:

- The processor has a thread assigned to its task.
- The processor grabs the highest priority FlowFile (or batch of FlowFiles) from the “active queue” of the incoming connection.
- If processing of the FlowFile(s) does not exceed the configured run duration, another FlowFile (or FlowFile batch) is pulled from the active queue.
- This process continues, all under that same thread, until the run duration has been reached or the “active queue” is empty.
- At that time the session is completed and all outbound FlowFiles are committed at once to the appropriate relationship.

Since no FlowFiles are committed until the entire run completes, some latency is introduced on the FlowFiles. Your configured run duration dictates how much latency will occur at a minimum. If the execution of the processor against a FlowFile takes longer than the configured "run duration", there is no added benefit from adjusting this configuration.

What this means for heap usage:

Since the processor is only working on incoming FlowFiles in the “active queue”, there is no added heap pressure here (FlowFiles in the “active queue” are already in heap space). The FlowFiles being generated (if any, depending on processor function) are all held in heap until the final commit. This may introduce some additional heap pressure versus not using a run duration, since all of those new FlowFiles will be held in heap until they are committed to an output relationship at the end of the run duration.
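The loop described under "How this works" can be sketched as a small Python simulation. This is not NiFi code: `run_task`, its parameters, and the fixed per-FlowFile cost are assumptions made so the example is deterministic. It only shows that one task keeps pulling from the active queue until the run duration is reached or the queue is empty, and that everything is committed together at the end.

```python
from collections import deque


def run_task(active_queue, run_duration_ms, cost_ms_per_flowfile):
    """Simulate one task; return (committed FlowFiles, elapsed ms).

    A simulated clock is used: every FlowFile is assumed to take
    cost_ms_per_flowfile to process (a simplification for illustration).
    """
    elapsed_ms = 0
    session = []  # results stay here (in heap) until the single commit
    while active_queue:
        session.append(active_queue.popleft())
        elapsed_ms += cost_ms_per_flowfile
        if elapsed_ms >= run_duration_ms:
            break  # run duration reached: stop pulling more FlowFiles
    return session, elapsed_ms  # everything is committed at once


queue = deque(range(100))
committed, elapsed = run_task(queue, run_duration_ms=25, cost_ms_per_flowfile=2)
print(len(committed), elapsed, len(queue))
```

With a run duration of 0 ms the same function handles exactly one FlowFile per task, which corresponds to not using a run duration at all. The `elapsed` value is also the minimum latency added to the first FlowFile of the run, since nothing is committed until the loop ends.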
11-21-2018
03:59 PM
Thanks for your answer. I wanted to have only "one" queue where all FlowFiles would be waiting. I know now that it was a bad idea => I reduced the size of the queue and now use backpressure. It corrected the priority problem. Thanks again!
04-11-2018
05:46 AM
Thanks for the solution, but since I am not familiar with the REST API, the solution by Matt looks easier to me. I will surely try yours too.
02-07-2018
02:52 PM
@Felix Albani Thank you for your feedback... I have made the correction.
06-26-2017
04:43 PM
7 Kudos
The NiFi S2S protocol is used by NiFi's Remote Process Group (RPG) components to distribute FlowFiles from one NiFi instance to another. When the target NiFi is a NiFi cluster, load-balancing of the FlowFile delivery is done across all nodes in the target NiFi cluster.
The default way this works (and the only way it works in versions of NiFi prior to Apache NiFi 1.2.0 or HDF 3.0) is as follows:
The RPG regularly communicates with the target NiFi cluster to get load status information about each node in the cluster. This information includes the number of currently connected nodes in the target cluster, each node's hostname, port information, and the number of total queued FlowFiles on each target NiFi node.
The source NiFi uses this information to determine a data distribution strategy for the source FlowFiles it has queued.
- Let's assume a 4 node target NiFi cluster with all nodes reporting a zero queue count.
- Each node will then be scheduled to receive 25% of the data. This means a distribution pattern of node1, node2, node3, and then node4.
- Now let's assume the same 4 node target cluster; however, node 1 and node 2 report having a queue of FlowFiles that results in the following:
- Node 1 and node 2 would get 16.67% of the data, while node 3 and node 4 would get 33.33% of the data. This results in a distribution pattern of node1, node2, node3, node4, node3, and then node4. So nodes 3 and 4 get twice the opportunity to receive data compared to nodes 1 and 2.
Once the distribution pattern is determined, the RPG connects to the first node and starts transferring data from the incoming queue to that node for 500 milliseconds or until the queue is empty. The next run of RPG will start sending to the next node in pattern and so on.
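To make the arithmetic in the example above concrete, here is a minimal Python sketch that turns per-node weights into percentage shares and a send pattern. How NiFi actually derives weights from the reported queue counts is not covered here, so the integer weights (1 for the busier nodes, 2 for the idle ones) and the function names are assumptions chosen only to reproduce the numbers above.

```python
def shares(weights):
    """Percentage of the data each node is scheduled to receive."""
    total = sum(weights.values())
    return {node: round(100 * w / total, 2) for node, w in weights.items()}


def distribution_pattern(weights):
    """Order in which nodes are visited, one node per RPG run."""
    pattern = []
    # In each pass, visit every node that still has weight left.
    for pass_index in range(max(weights.values())):
        for node, weight in weights.items():
            if weight > pass_index:
                pattern.append(node)
    return pattern


# Assumed weights: node1 and node2 report queued FlowFiles, node3/node4 do not.
weights = {"node1": 1, "node2": 1, "node3": 2, "node4": 2}
print(shares(weights))
print(distribution_pattern(weights))
```

With equal weights for all four nodes, the same functions give 25% each and the simple node1, node2, node3, node4 pattern.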
As you can see from this default distribution model, the data may not always be distributed as desired. The transfer was implemented this way for performance reasons. However, when working with very small FlowFiles, when FlowFiles come in a wide range of sizes from small to large, when a better network connection exists to one target node than to another, or when data comes in bursts instead of a continuous flow, the load-balancing will be less than ideal.
With the introduction of HDF 3.0 (Apache NiFi 1.2.0), additional configuration options were added to the RPG to control the number of FlowFiles (count), amount of data (size), and/or length of transaction time (duration) per RPG port connection. This gives users the ability to fine-tune their RPG connection to achieve better load-balancing results when dealing with lighter volume dataflows, network performance differences between nodes, etc.
These new configuration options can be set as follows:
Each input and output port configuration will need to be set individually.
Of course, setting count to a value of 1 sounds like a good way to achieve really good load-balancing, but it will cost you in performance since only one FlowFile will be sent in each transaction. So, there will be extra overhead introduced due to the volume of new connections being opened and closed. So you may find yourself playing around with these settings to achieve your desired load-balancing to performance ratio.
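The trade-off can be illustrated with a rough Python sketch that counts how many transactions a queue needs under different count and size limits. Exactly where NiFi closes a batch is not stated above, so this sketch assumes a batch is closed as soon as any configured limit has been met; the function names are hypothetical and the duration limit is left out to keep the example deterministic.

```python
def build_transaction(flowfile_sizes, max_count=None, max_size_bytes=None):
    """Take FlowFiles from the front of the queue until a limit is met."""
    batch, total_bytes = [], 0
    for size in flowfile_sizes:
        batch.append(size)
        total_bytes += size
        if max_count is not None and len(batch) >= max_count:
            break
        if max_size_bytes is not None and total_bytes >= max_size_bytes:
            break
    return batch


def transactions_needed(flowfile_sizes, **limits):
    """Number of transactions required to drain the whole queue."""
    remaining, transactions = list(flowfile_sizes), 0
    while remaining:
        batch = build_transaction(remaining, **limits)
        remaining = remaining[len(batch):]
        transactions += 1
    return transactions


queue = [100] * 10  # ten FlowFiles of 100 bytes each
print(transactions_needed(queue, max_count=1))         # one FlowFile per transaction
print(transactions_needed(queue, max_count=5))
print(transactions_needed(queue, max_size_bytes=250))
```

A count of 1 spreads the ten FlowFiles over ten transactions (best distribution, most overhead), while larger limits need fewer transactions but hand each target node bigger chunks.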
----------------
How do I get better load-balancing in an older version of NiFi?
The RPG will send data based on what is currently in the incoming queue per transaction. By limiting the size of that queue, you can control the max number of FlowFiles that will transfer per transaction. You can set the object back pressure threshold on those incoming queues to limit the number of FlowFiles queued at any given time. This will cause FlowFiles to queue on the next upstream connection in the dataflow. If a source processor is feeding the RPG directly, try putting an UpdateAttribute processor between that source processor and the RPG so you have two connections. As each RPG execution runs and transfers what is on the queue, the queue will be refilled for the next transaction.

Apache NiFi 1.13+ update: In newer releases of NiFi, the ability to redistribute FlowFiles within the cluster was made much more efficient and easier through the new load-balanced connection feature. This new feature (stable in Apache NiFi 1.13+ versions) is a simple configuration change that can be done on a connection. It supports numerous strategies for redistribution of FlowFiles, but for load-balanced distribution, it offers a true round-robin capability you can't get from an RPG.