Member since
07-30-2019
3471
Posts
1642
Kudos Received
1020
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| 150 | 06-03-2026 06:06 PM | |
| 460 | 05-06-2026 09:16 AM | |
| 829 | 05-04-2026 05:20 AM | |
| 497 | 05-01-2026 10:15 AM | |
| 622 | 03-23-2026 05:44 AM |
11-13-2018
06:50 PM
@Félicien Catherin Your observations are correct. The prioritizer only works against the FlowFiles currently in the "Active" queue. Because this active queue resides in JVM heap, the reordering based on priority comes at very little expense to performance. Re-evaluating all swapped FlowFiles each time a new FlowFile enters a connection would have a hit to throughput performance. - The first question I would ask is why the queue is so large? - Increasing the configured swap threshold in nifi.properties file will allow more FlowFiles to be held in the active queue, but that comes at an expense of high heap usage by your NiFi. - Keep in mind that the best throughput will always be obtained when no prioritizers are defined on a connection.
... View more
11-13-2018
03:34 PM
@Poonam Mishra - When using Site-To-Site (S2S) between two secured NiFi instances a 2-way TLS handshake occurs. Based on the specific error you are getting, this handshake appears to have been successful. This means your issue purely falls on the Authorization side. - The NiFi instance running the SiteToSiteProvenanceReportingTask is acting as the client in this connection to your other NiFi instance acting as the Server side of the connection. So it is the Server side NiFi that is checking what authorizations have been granted for the client NiFi instance(s) (If source is a NiFi cluster, every node will need to be authorized). - The full DN from the PrivateKeyEntry found in your keystore on the client NIFi will be used. If your target/Server side NiFi has any configured Identity.mapping.patterns defined in its nifi.properties file, then that full DN will be evaluated against them to see fi any match. If a match is found, then resulting mapping value is passed to the authorizer; otherwise, full DN will be passed to authorizer. - You have not shared what authorizer you are using, but there must exist user entries for every client NiFi who is connecting. For S2S, these client strings passed to the authorizer (case sensitive) must be granted the following policies: 1. "retrieve site-to-site details" <-- This policy allows the client NiFi (one with reporting task) to retrieve details about the target which includes details on supported transport protocols, number of nodes, RAW port values, Node hostnames, node load, etc...). 2. "receive data via site-to-site" <-- This policy allows the authorized client to utilize this remote input port for sending FlowFiles. This would be authorization set on your "ProvenanceMonitoring" remote input port. - Check both your nifi-app.log and nifi-user.log for full stack traces and error related to this process. - Thank you, Matt - If you found this answer addressed your question, please take a moment to login in and click the "ACCEPT" link.
... View more
11-12-2018
06:08 PM
3 Kudos
NiFi Restricted components are those processors, controller services, or reporting tasks that have the ability to run user-defined code or access/alter localhost filesystem data. - The NiFi User guide explains this as follows: ----------------------------------------- Restricted components will be marked with a icon next to their name. These are components that can be used to execute arbitrary unsanitized code provided by the operator through the NiFi REST API/UI or can be used to obtain or alter data on the NiFi host system using the NiFi OS credentials. These components could be used by an otherwise authorized NiFi user to go beyond the intended use of the application, escalate privilege, or could expose data about the internals of the NiFi process or the host system. All of these capabilities should be considered privileged, and admins should be aware of these capabilities and explicitly enable them for a subset of trusted users. Before a user is allowed to create and modify restricted components they must be granted access. ------------------------------------------ Users can only be restricted from adding such components in NiFi if NiFi has to be secured. Users of an unsecured NiFi will always have access to all components. - Prior to HDF 3.2 or Apache NiFi 1.6, all restricted components were covered by a single authorization policy: Ranger Policy (Base policies): NiFi Policies (Hamburger menu) Ranger permissions description: /restricted-components Access restricted components Read/View - N/A Write/Modify - Gives granted users the ability to add components to the canvas that are tagged as “restricted” - It was decided that lumping all components into one policy was not ideal. So NIFI-4885 was created to address this so that users' access to restricted components would be based on the level of restricted access they are being granted. read-filesystem read-distributed-filesystem write-filesystem write-distributed-filesystem execute=code access-keytab export-nifi-details - In order to avoid backward compatibility issues when users upgrade to a HDF 3.2+ or Apache NiFi 1.6.0+, the “Access restricted components” base policy still exists and defaults to "regardless of restrictions". In the NiFi global “Access Policies” UI, this is the default policy and is depicted as follows: In Ranger, this is still associated with just the “/restricted-components” policy. The four new policies are depicted as follows in Ranger and NiFi UIs: - Ranger Policy (Base policies): NiFi Policies (Hamburger menu) Ranger permissions description: /restricted-components/read-filesystem Access restricted componentsSub policy:Requiring ‘read filesystem’ Read/View - N/A Write/Modify - Allows users to create/modify restricted components requiring read filesystem. /restricted-components/read-distributed-filesystem Access restricted componentsSub policy:Requiring ‘read distributed filesystem’ Read/View - N/A Write/Modify - Allows users to create/modify restricted components requiring read distributed filesystem. /restricted-components/write-filesystem Access restricted componentsSub policy:Requiring ‘write filesystem’ Read/View - N/A Write/Modify - Allows users to create/modify restricted components requiring write filesystem. /restricted-components/write-distributed-filesystem Access restricted componentsSub policy:Requiring ‘write distributed filesystem’ Read/View - N/A Write/Modify - Allows users to create/modify restricted components requiring write distributed filesystem. /restricted-components/execute-code Access restricted componentsSub policy:Requiring ‘execute code’ Read/View - N/A Write/Modify - Allows users to create/modify restricted components requiring read filesystem. /restricted-components/access-keytab Access restricted components Sub policy:Requiring ‘access keytab’ Read/View - N/A Write/Modify - Allows users to create/modify restricted components requiring read filesystem. /restricted-components/export-nifi-details Access restricted components Sub policy:Requiring ‘export nifi details’ Read/View - N/A Write/Modify - Allows users to create/modify restricted components requiring read filesystem. - Below is a list of restricted components for each of the above sub-policies (current as of CFM 2.1.1 and Apache NiFi 1.13): Read-filesystem: NiFi component: Component type: Access provisions: FetchFile Processor Provides operator the ability to read from any file that NiFi has access to. TailFile Processor Provides operator the ability to read from any file that NiFi has access to. GetFile Processor Provides operator the ability to read from any file that NiFi has access to. - Read-Distributed-Filesystem: (Added NiFi 1.13) NiFi component: Component type: Access provisions: FetchHDFS Processor Provides operator the ability to retrieve any file that NiFi has access to in HDFS or the local filesystem. FetchParquet Processor Provides operator the ability to retrieve any file that NiFi has access to in HDFS or the local filesystem. GetHDFS Processor Provides operator the ability to retrieve any file that NiFi has access to in HDFS or the local filesystem. GetHDFSSequenceFile Processor Provides operator the ability to retrieve any file that NiFi has access to in HDFS or the local filesystem. MoveHDFS Processor Provides operator the ability to retrieve any file that NiFi has access to in HDFS or the local filesystem. - Write-filesystem: NiFi component: Component type: Access provisions: FetchFile Processor Provides operator the ability to delete any file that NiFi has access to. GetFile Processor Provides operator the ability to delete any file that NiFi has access to. PutFile Processor Provides operator the ability to write to any file that NiFi has access to. - Write-Distributed-Filesystem: (Added NiFi 1.13) NiFi component: Component type: Access provisions: DeleteHDFS Processor Provides operator the ability to delete any file that NiFi has access to in HDFS or the local filesystem. GetHDFS Processor Provides operator the ability to delete any file that NiFi has access to in HDFS or the local filesystem. GetHDFSSequenceFile Processor Provides operator the ability to delete any file that NiFi has access to in HDFS or the local filesystem. MoveHDFS Processor Provides operator the ability to delete any file that NiFi has access to in HDFS or the local filesystem. PutHDFS Processor Provides operator the ability to delete any file that NiFi has access to in HDFS or the local filesystem. PutParquet Processor Provides operator the ability to write any file that NiFi has access to in HDFS or the local filesystem. - Execute-code: NiFi component: Component type: Access provisions: ScriptedReportingTask Reporting Task Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. ScriptedLookupService Controller Service Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. ScriptedReader Controller Service Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. ScriptedRecordSetWriter Controller Service Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. ExecuteFlumeSink Processor Provides operator the ability to execute arbitrary Flume configurations assuming all permissions that NiFi has. ExecuteFlumeSource Processor Provides operator the ability to execute arbitrary Flume configurations assuming all permissions that NiFi has. ExecuteGroovyScript Processor Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. ExecuteProcess Processor Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. ExecuteScript Processor Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. ExecuteStreamCommand Processor Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. invokeScriptedProcessor Processor Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has. - access-keytab: NiFi component: Component type: Access provisions: KeytabCredentialsService Controller Service Allows user to define a Keytab and principal that can then be used by other components. - Export-nifi-details: NiFi component: Component type: Access provisions: SiteToSiteBulletinReportingTask Reporting Task Provides operator the ability to send sensitive details contained in bulletin events to any external system. SiteToSiteProvenanceReportingTask Reporting Task Provides operator the ability to send sensitive details contained in Provenance events to any external system. - ***Note: Some components may be found under multiple sub-policies above. In order for a user to utilize that component, they must be granted access to every sub policy required by that component. - Exceptions in HDF 3.2 and Apache 1.7 and 1.8: In order to use the following components, users must have full access to all restricted components policies: NiFi component: Component type: Access provisions: PutORC Processor This component requires access to restricted components regardless of restriction. Apache Jira: NIFI-5815 - A full breakdown of all other NiFi Policies can be found here: NiFi Ranger based policy descriptions - Cloudera Community
... View more
Labels:
10-31-2018
01:07 PM
@sri chaturvedi That specific property has to do with how long NIFi will potentially hold on to content claims which have been successfully moved to archive. It works in conjunction with the "nifi.content.repository.archive.max.usage.percentage=50%" property. - Neither of these properties will result in the clean-up/removal of any content claims that have not been moved to archive. If the disk usage of the content repository disk has exceeded 50% the "archive" sub-directories in the content repo should all be empty. - This does not mean that the active content claims will not continue to increase in number until content repo disk is 10% full. - Nothing about how the content repository works will have any impact on NiFi UI performance. A full content repository will howvere impact the overall performance of your NiFi dataflows. - Hope this answers your question.
... View more
10-31-2018
12:55 PM
7 Kudos
NiFi's content repository will hold on to claims until there are no active FlowFiles still anywhere on the canvas referencing that claim. This can result in very old or very large claims being left in the content repo using up space. Something as small as a single 0 byte FlowFile that sitting in some queue may end up preventing a multi-gigabyte content claim from being removed. The intent of this article is help users understand how to find those active FlowFiles and clear them form your dataflow. - There is no simple UI feature or NiFi rest-api endpoint users can use to return information linking FlowFiles to exiting content claims; however, all is not lost. Users can add an additional indexed field to your provenance configuration to so that is starts indexing the "ContentClaimIdentifier" associated with each FlowFile event generated. This would give you the ability to use Provenance to identify all the FlowFiles associated to specific claim in the content repo. - To add this, simply add "ContentClaimIdentifier" to the list if existing indexed fields via the nifi.properties file. Here is the specific property you will be editing: - nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID, Relationship, ContentClaimIdentifier - NOTE: a restart of NiFi will need to occur before indexing of this new field will begin. NiFi will not go back and re-index existing events. It will only start adding this new indexed field to events created from this point forward. - You can then search your content repo for any content claim files that are of concern (for example searching for any very old or very large claims). Simply copy the claim number (filename of claim) and search for it via a provenance query. Once you have your list of FlowFile events all tied to same claim: - Above is example provenance query which returned below three FlowFiles: I would then need to look at lineage of each of those FlowFiles to see if any of them made it to a DROP event (A DROP event means it is no longer in your dataflows anywhere. Lineage can be displayed by clicking the "show lineage" icon to the right of any event from list . - Once you have found one or more with no DROP event, you will want to get details of last event by right clicking on it: From those details you can collect the Component ID of for the processor that produced that event. - - Go back to your main canvas and search on that component ID. Your FlowFile will be located in one of the outbound relationships to that component. - - As you can see my FlowFile was found to be in this "success" relationship for an UpdateAttribute processor. You will also notice that this connection contains many FlowFiles. If I only want to purge this one FlowFile I am going to need to insert a RouteOnAttribute processor to this flow that I configure to route only a FlowFile with a specific FlowFile UUID which I could also get from my provenance event details above. - The routeOnAttribute processor added property would as simple as: Property: purgeValue: ${uuid:equals('8297d10c-f6ca-4843-9593-320e5b265dd6')} - "purge" becomes the new relationship which you can then just auto-terminate. The "unmatched" relationship will get routed on in your flow and will contain every other FlowFile from this connection. - *** Please feel free to post any comments to this article of you have questions or see anything that needs to be added/clarified. - Thank you, Matt
... View more
Labels:
10-29-2018
06:36 PM
@Bobby Harsono - Some processor may be designed to utilize memory outside of the JVM. Some of the scripting processor like ExecuteProcess or ExecuteStreamCommand are a good examples. They are calling a process or script external to NiFi. Those externally executed commands will have a memory footprint of their own. - Listen type processors like ListenTCP or ListenUDP is another example. These have memory footprints both inside and outside the NiFi JVM heap space. These processors can be configured with socket buffer which is created outside of heap space.- - Thanks, Matt
... View more
10-17-2018
02:29 PM
@pavan srikar yes, that makes sense. When you start the DistirbutedMapCache server is starts a server on each NiFi node.The DIstributedMapCache Client should be configured to point at one specific node, so that every node pulls cache entries from same server. - A little back history: The DistibrutedMapCacheServer and DistirbutedMapCacheClient controller services date back to original NiFi releases versions. Back in those days there was no zero master clustering which we have now. There was a dedicated server that ran a NiFi Cluster Manager (NCM). At that time the DistributedMapCacheServer could only be setup on the NCM. - Once NiFi moved away from having a NCM, the functionality of these controller services was not changed to avoid breaking flows of user who moved to latest versions. The DistirbutedMapCacheServer does not offer HA (if node hosting server goes down, cache becomes unavailable). To provide HA here, new external HA caches options have been added as options. - thanks, Matt
... View more
10-17-2018
02:29 PM
yes, that makes sense. When you start the DistirbutedMapCache server is starts a server on each NiFi node. The DIstributedMapCache Client should be configured to point at one specific node, so that every node pulls cache entries from same server. - A little back history: The DistibrutedMapCacheServer and DistirbutedMapCacheClient controller services date back to original NiFi releases versions. Back in those days there was no zero master clustering which we have now. There was a dedicated server that ran a NiFi Cluster Manager (NCM). At that time the DistributedMapCacheServer could only be setup on the NCM. - Once NiFi moved away from having a NCM, the functionality of these controller services was not changed to avoid breaking flows of user who moved to latest versions. The DistirbutedMapCacheServer does not offer HA (if node hosting server goes down, cache becomes unavailable). To provide HA here, new external HA caches options have been added as options. - thanks, Matt
... View more
10-17-2018
02:28 PM
yes, that makes sense. When you start the DistirbutedMapCache server is starts a server on each NiFi node. The DIstributedMapCache Client should be configured to point at one specific node, so that every node pulls cache entries from same server. - A little back history: The DistibrutedMapCacheServer and DistirbutedMapCacheClient controller services date back to original NiFi releases versions. Back in those days there was no zero master clustering which we have now. There was a dedicated server that ran a NiFi Cluster Manager (NCM). At that time the DistributedMapCacheServer could only be setup on the NCM. - Once NiFi moved away from having a NCM, the functionality of these controller services was not changed to avoid breaking flows of user who moved to latest versions. The DistirbutedMapCacheServer does not offer HA (if node hosting server goes down, cache becomes unavailable). To provide HA here, new external HA caches options have been added as options. - thanks, Matt
... View more
10-17-2018
12:31 PM
@pavan srikar I should add that there is no processor that will specifically clone a FlowFile to every node in the NiFi cluster. - But there are other options if you do not want to standup an external map cache server. - Perhaps setting up a disk mount that is shared across all nodes. On Primary node only you run a flow that retrieves a new token every ~55 minutes writes it to this shared mounted directory set to overwrite previous written token each time. Then on all nodes you could create a flow that consumes this token without deleting it on schedule to perform your all node tasks. - Just a second option for you. - Thank you, Matt
... View more