Member since: 07-30-2019
Posts: 3391
Kudos Received: 1618
Solutions: 1001
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 284 | 11-05-2025 11:01 AM |
| | 169 | 11-05-2025 08:01 AM |
| | 145 | 11-04-2025 10:16 AM |
| | 502 | 10-20-2025 06:29 AM |
| | 642 | 10-10-2025 08:03 AM |
07-26-2017
12:39 PM
@AnjiReddy Anumolu Just to add a little more detail to the above response from @zblanco. When NiFi ingests data, that data is turned into NiFi FlowFiles. A NiFi FlowFile consists of attributes (metadata about the actual data) and the physical data itself. The FlowFile metadata is stored in the FlowFile repository and is also held in JVM heap memory for faster performance. The FlowFile attributes include things like filename, ingest time, lineage age, file size, which connection in the dataflow the FlowFile currently resides in, any user-defined metadata, any processor-added metadata, etc. The physical bytes that make up the actual data content are written to claims within the NiFi content repository. A claim can contain the bytes for one to many ingested data files. For more info on the content repository and how claims work, see the following link: https://community.hortonworks.com/articles/82308/understanding-how-nifis-content-repository-archivi.html Thanks, Matt
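To picture how one claim can back multiple FlowFiles, here is a minimal illustrative sketch. These are simplified stand-in classes, not NiFi's actual internals, and all field names are assumptions for illustration only: each FlowFile's content is just an offset/length slice into a shared claim file in the content repository.

```python
# Illustrative sketch only -- NOT NiFi's real classes or field names.
from dataclasses import dataclass

@dataclass
class ContentClaim:
    container: str   # content repository container, e.g. "default"
    section: str     # sub-directory within the container
    claim_id: str    # file on disk holding the appended content bytes
    length: int = 0  # total bytes currently written to this claim

@dataclass
class FlowFile:
    attributes: dict       # metadata: filename, ingest time, lineage, ...
    claim: ContentClaim    # shared claim holding this FlowFile's bytes
    claim_offset: int      # where this FlowFile's content starts
    content_length: int    # how many of the claim's bytes belong to it

# Two ingested files can share one claim; only offset/length differ.
claim = ContentClaim("default", "1", "1500467243-1")
ff1 = FlowFile({"filename": "a.log"}, claim, claim_offset=0, content_length=2048)
claim.length += 2048
ff2 = FlowFile({"filename": "b.log"}, claim, claim_offset=2048, content_length=512)
claim.length += 512
```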
07-25-2017
04:48 PM
NiFi 1.2.0 added some alignment tools, but nothing is available in NiFi 1.1.0. In NiFi 1.2.0 you can select multiple components, right-click to open the context menu, and select to align them vertically or horizontally. These tools can't be used to align an entire canvas at once, but they give you the ability to easily line up single rows or columns of processor components. In the future, please keep unrelated questions in separate Hortonworks Community Connection (HCC) posts; other HCC contributors are likely to miss questions asked within the context of an answer to another question. Thanks, Matt
07-25-2017
03:43 PM
@Anishkumar Valsalam If I am understanding correctly, you have a single NiFi installation that contains multiple different dataflows, each managed/built by different users/teams. You don't want users from these different teams to be able to see the dataflows built by other teams, correct? NiFi's granular access policies allow you to control what users can see and interact with. Any component for which a user has not been granted "view the component" or "modify the component" will appear only as a ghost component on the canvas. As you can see above, there are several processors and process groups my currently authenticated user does not have access to view or modify. This user cannot view the configuration, move the component, start it, stop it, view its data, delete data, or even see the component name or type. If you are asking to hide even these ghosted components from the canvas, that is not an option. There are important reasons why these ghost components are visible to all users:
1. You may have users who work on multiple teams. If users cannot see ghosted components, they are likely to build their dataflow on top of other dataflows on the canvas. This means a user who can see multiple dataflows would be presented with a mess of a canvas to work with, as components would be stacked upon each other.
2. Ultimately all the dataflows in a single NiFi share the same set of resources. While a user on team 1 may not be able to see the details of another team's dataflow, it is still important that users on team 1 can see when back pressure or very large queues exist in other teams' groups, since that can ultimately have an impact on their own dataflows. Team 1 users will not be able to see the actual data, just queue counts.
3. Sometimes there are multiple team dataflows that share the same source data. For example, a ConsumeKafka processor that feeds its "success" relationship multiple times into different teams' dataflows. If we hid components completely, how would we render these other success relationships? If we just hid them as well, the source processor would appear as having only one outbound connection. This would make it impossible for team 1 to troubleshoot if ConsumeKafka just stopped consuming because one of these non-visible connections was applying back pressure. Thanks, Matt
07-25-2017
03:01 PM
9 Kudos
The intent of this article is to show how NiFi policies in Ranger map to what you would see when using NiFi's default file-based authorizer via the NiFi UI. This article will cover what access each of the policies grants to the entities (users and servers) assigned to them. There are controller-level policies and component-level policies in NiFi. The controller-level policies are not tied to any specific component UUID; in Ranger those policies will show simply as /<some policy name>. These include the following:

| Ranger Policy (Base policies) | NiFi Policy (Hamburger menu) | Ranger permissions description |
|---|---|---|
| /resources *** Note: No policies will be available until this policy is manually added. | N/A | Allows Ranger to retrieve a listing of all available policies from NiFi. The server/user from the keystore being used by Ranger must be granted "read" privileges to this resource. |
| /flow * See note [3] below | View the user interface | Read/View: gives users the ability to view the NiFi UI. All users must be granted "read" privileges to this policy or they will not be able to open the NiFi UI. If you are running a NiFi cluster and/or accessing your NiFi via a proxy, you need to grant all nodes and any proxies read access to this policy as well. Write/Modify: N/A |
| /system | View system diagnostics | Read/View: gives granted users access to the system diagnostics. In a NiFi cluster, nodes will need access as well to display system diagnostic stats returned by other nodes. Write/Modify: N/A |
| /controller | Access the controller | Read/View: gives granted users and/or NiFi cluster nodes the ability to view the controller thread pool configuration, the cluster management page, controller-level reporting tasks, and controller-level controller services. Write/Modify: gives granted users and/or NiFi cluster nodes the ability to create/modify those same items. |
| /counters | Access counters | Read/View: gives granted users the ability to view counters. Write/Modify: gives granted users the ability to modify counters. |
| /provenance | Query provenance | Read/View: gives granted users the ability to run provenance queries or access provenance lineage graphs. Write/Modify: N/A |
| /restricted-components * See note [1] below | Access restricted components | Read/View: N/A. Write/Modify: gives granted users the ability to add components to the canvas that are tagged as "restricted". |
| /proxy * See note [2] below | Proxy user requests | Read/View: allows proxy servers to send requests on behalf of other users. Write/Modify: required. |
| /site-to-site | Retrieve site-to-site details | Read/View: allows other NiFi nodes to retrieve site-to-site details about this NiFi. |
| /policies *** This policy has no purpose when using Ranger and does not need to be used. | Access all policies | Read/View: gives granted users the ability to view existing policies. Write/Modify: gives granted users the ability to create new policies and modify existing policies. |
| /tenants *** This policy has no purpose when using Ranger and does not need to be used. | Access users/user groups | Read/View: gives granted users the ability to view currently authorized users and user groups. Write/Modify: gives granted users the ability to add, delete, and modify existing users and user groups. |
| /parameter-contexts | Access parameter contexts | Read/View: allows users to view and use ALL existing parameter contexts. Write/Modify: allows users to create, modify, and delete ALL parameter contexts. |
| /parameter-contexts/<uuid> | Access a specific existing parameter context | Read/View: allows users to view and use a specific existing parameter context. Write/Modify: allows users to modify or delete a specific parameter context. |

[1] New sub-policies were introduced for "/restricted-components" as of HDF 3.2 (Apache NiFi 1.12+). See the following article for details: https://community.cloudera.com/t5/Community-Articles/NiFi-Restricted-Components-Policy-Descriptions/ta-p/249157
[2] All nodes in your NiFi cluster must be assigned to the "/proxy" policy.
[3] All users must at a minimum be assigned to the "/flow" policy in order to view the NiFi UI.

The component-level granular policies are based on the component's assigned UUID. For connections, the policies are enforced based upon the processor component the connection originates from. This includes the following policies:

| Ranger component-based policy | Component | Equivalent NiFi file-based authorizer policy | Ranger permissions description |
|---|---|---|---|
| /data-transfer/input-ports/<uuid> | Each NiFi remote input port is assigned a unique <uuid> | Receive data via site-to-site | Both read and write are required and should be granted to the source NiFi servers sending data to this NiFi via this input port. |
| /data-transfer/output-ports/<uuid> | Each NiFi remote output port is assigned a unique <uuid> | Send data via site-to-site | Both read and write are required and should be granted to the source NiFi servers pulling data from this NiFi via this output port. |
| /process-groups/<uuid> | Each NiFi process group is assigned a unique <uuid> | View the component / Modify the component | Read: allows the user to view process group details only. Write: allows the user to start, stop, or delete the process group, add components inside the process group, and add controller services to the process group. |
| /data/process-groups/<uuid> | Each NiFi process group is assigned a unique <uuid> | View the data / Modify the data | Read: allows the user to view data that was processed by components in this process group and to list queues. Write: allows the user to empty queues/purge data from queues within the process group. |
| /policies/process-groups/<uuid> *** Not needed when using Ranger | Each NiFi process group is assigned a unique <uuid> | View the policies / Modify the policies | Read: N/A in Ranger. Write: N/A in Ranger. |
| /processors/<uuid> | Each NiFi processor is assigned a unique <uuid> | View the component / Modify the component | Read: allows the user to view processor configuration only. Write: allows the user to start, stop, configure, and delete the processor. |
| /data/processors/<uuid> | Each NiFi processor is assigned a unique <uuid> | View the data / Modify the data | Read: allows the user to view data processed by this processor and to list queues on this processor's outbound connections. Write: allows the user to empty queues/purge data from this processor's outbound connections. |
| /policies/processors/<uuid> *** Not needed when using Ranger | Each NiFi processor is assigned a unique <uuid> | View the policies / Modify the policies | Read: N/A in Ranger. Write: N/A in Ranger. |
| /controller-services/<uuid> | Each NiFi controller service is assigned a unique <uuid> | View the component / Modify the component | Read: allows the user to view controller service configuration only. Write: allows the user to enable, disable, configure, and delete controller services. |
| /provenance-data/<component-type>/<component-UUID> | Each NiFi component is assigned a unique <uuid> | View provenance | Read: allows users to view provenance events generated by this component. Write: N/A in Ranger. |
| /operation/<component-type>/<component-UUID> | Each NiFi component is assigned a unique <uuid> | Operate the component | Read: N/A in Ranger. Write: allows users to operate components by changing component run status (start/stop/enable/disable), remote port transmission status, or terminating processor threads. |

There will be a unique policy available for each and every component based on the specific component's assigned UUID. Component-level authorizations are inherited from the parent process group when no specific processor or sub-process-group component-level policy is set. Ranger supports the "*" wildcard when assigning policies.

In a NiFi cluster, all nodes must be granted the ability to view and modify component data in order for users to list or empty queues in processor component outbound connections. With Ranger this can be accomplished by using the wildcard to grant all the NiFi nodes read and write to the "/data/*" NiFi resource. *** Users should not be given global access to all data, but instead be restricted to the specific process groups they have been granted access to. *** Also note that at the time of writing, Ranger groups are not supported by NiFi for authorization. UPDATE: Ranger-based group support was added as a new feature/capability in HDF 3.1.x.
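To make that last point concrete, below is a minimal, hypothetical sketch of creating that "/data/*" node policy through Ranger's public REST API rather than the Ranger UI. The Ranger host, service name, node DNs, and credentials are all placeholders, and the "nifi-resource" key reflects the resource name commonly used by Ranger's NiFi service definition; verify both against your Ranger version before relying on this.

```python
# Hypothetical sketch: grant cluster node identities read+write on /data/*
# via Ranger's public policy REST API. Host, service name, users, and
# credentials are placeholders -- adjust for your environment.
import requests

RANGER_URL = "https://ranger.example.com:6182"  # assumption
SERVICE_NAME = "nifi_service"                   # assumption

policy = {
    "service": SERVICE_NAME,
    "name": "nifi-nodes-data-access",
    "description": "Let cluster nodes proxy list/empty queue requests",
    "resources": {
        # "nifi-resource" is the resource key in the NiFi service definition
        "nifi-resource": {"values": ["/data/*"], "isExcludes": False, "isRecursive": False}
    },
    "policyItems": [
        {
            # Node certificate DNs added as Ranger users (placeholders)
            "users": ["CN=node1.example.com, OU=NIFI", "CN=node2.example.com, OU=NIFI"],
            "accesses": [{"type": "READ", "isAllowed": True},
                         {"type": "WRITE", "isAllowed": True}],
        }
    ],
}

resp = requests.post(
    f"{RANGER_URL}/service/public/v2/api/policy",
    json=policy,
    auth=("admin", "admin-password"),  # placeholder credentials
    verify="/path/to/ca.pem",          # placeholder CA bundle
)
resp.raise_for_status()
print("Created policy id:", resp.json().get("id"))
```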
07-25-2017
02:03 PM
1 Kudo
@Sanaz Janbakhsh The policies you have identified above are: /flow (grants users the ability to view the UI); /proxy (allows NiFi nodes and proxy servers to proxy requests for users to other NiFi nodes); and the default "all-nifi-resources" policy, which assigns "*" and grants the users added to it access to every policy. The component-level granular policies are based on the component's assigned UUID. For connections, the policies are enforced based upon the processor component the connection originates from, for example: /remote-process-groups/<remote process group uuid>
/data/remote-process-groups/<remote process group uuid>
/process-groups/<process group uuid>
/data/process-groups/<process group uuid>
/processors/<processor uuid>
/data/processors/<processor uuid>

There will be a unique policy available for each and every component based on the specific component's assigned UUID. Component-level authorizations are inherited from the parent process group when no specific processor, remote-process-group, or sub-process-group component-level policy is set. So for a user to be able to view the FlowFiles in a connection (list queue), they must be granted "read" on the component (/data/processors/<processor uuid>) from which that connection originated. Access can instead be granted via inheritance by granting the user "read" to a parent process group (/data/process-groups/<process group uuid>) that contains the processor component. For a user to be able to empty a queue (empty queue), they must be granted "write" in the same manner as described above for "read". If your user was added to the default "all-nifi-resources" policy in Ranger, then they already have read and write to all NiFi policies; effectively they are a NiFi admin user. In addition to the user being granted the ability to "read" (list queue) and "write" (empty queue), the same must be granted to all nodes in your NiFi cluster. This is commonly done by adding a new policy in Ranger that uses the "/data/*" NiFi resource identifier. This policy would be assigned to all nodes and include both "read" and "write" permissions. Why is this needed? When you log in to a NiFi cluster, you are logging in to only one node. When you make a request to list a queue, you expect to see results from all nodes in your cluster, so the node you are logged in to makes a request to all the other nodes to return their queue listings. The originating node must therefore be granted the ability to view the other nodes' data. The same holds true when you make a request to empty a queue while logged in to one node of a cluster: that node must be able to request that the other nodes empty their queues as well. Thank you, Matt
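As an illustration of that fan-out, here is a minimal sketch of what "list queue" does over NiFi's REST API (the FlowFileQueues listing-request endpoints). The hostname, connection UUID, and credentials are placeholders, and TLS verification is disabled only to keep the sketch short; treat it as an approximation, not a reference client.

```python
# Sketch of NiFi's queue-listing flow: the node you call replicates the
# listing request to every node in the cluster, which is why the node
# identities themselves also need "read" on the /data/... resources.
import time
import requests

NIFI = "https://nifi-node1.example.com:9443/nifi-api"  # placeholder node
CONN = "c0a8012d-0158-1000-0000-0000deadbeef"          # hypothetical connection UUID

# Exchange username/password for a JWT (placeholder credentials).
token = requests.post(f"{NIFI}/access/token",
                      data={"username": "user", "password": "password"},
                      verify=False).text
hdrs = {"Authorization": f"Bearer {token}"}

# 1. Create a listing request on the node we are "logged in" to.
req = requests.post(f"{NIFI}/flowfile-queues/{CONN}/listing-requests",
                    headers=hdrs, verify=False).json()
req_id = req["listingRequest"]["id"]

# 2. Poll until results have been gathered from all nodes.
while True:
    status = requests.get(f"{NIFI}/flowfile-queues/{CONN}/listing-requests/{req_id}",
                          headers=hdrs, verify=False).json()
    if status["listingRequest"]["finished"]:
        break
    time.sleep(1)

for ff in status["listingRequest"].get("flowFileSummaries", []):
    print(ff["uuid"], ff["filename"], ff["size"])

# 3. Clean up the completed listing request.
requests.delete(f"{NIFI}/flowfile-queues/{CONN}/listing-requests/{req_id}",
                headers=hdrs, verify=False)
```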
07-25-2017
01:04 PM
@Sanaz Janbakhsh This question revolves around setting the correct file-based authorizer permissions for listing and emptying queues. Since you are using Ranger, I suggest starting a new question so as not to add confusion, as the process is different. Thanks, Matt
07-14-2017
01:39 PM
1 Kudo
@Hadoop User The "it'll be helpful if what processor to be used in between listenSyslog and putHDFS is suggested" question is a hard one for anyone to answer without understanding the end result you are looking for. There are The following processors:
- parseSyslog (extract bits from syslog content in to FlowFile attributes) You can then use those attributes if you like to make routing decisions (routeOnAttribute), define unique target HDFS directories based on attribute value in PutHDFS
- SplitText or SlitContent (Can be used to FlowFiles that contain more then one syslog message each). You get improved performance if listenSyslog ingests in batches. - UpdateAttribute (Used to add you own custom attributes or manipulate existing attributes on FlowFiles) Thanks, Matt
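As a hypothetical illustration of the ParseSyslog-to-PutHDFS pattern: ParseSyslog writes attributes such as `syslog.hostname` and `syslog.severity` onto each FlowFile, so the PutHDFS "Directory" property could be set to an expression like `/data/syslog/${syslog.hostname}/${now():format('yyyy/MM/dd')}` to fan incoming messages out into per-host, per-day directories. The attribute names shown are the ones ParseSyslog typically emits; confirm them in the processor's usage docs for your NiFi version.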
07-14-2017
01:21 PM
@Hadoop User The processor components all have tags associated with them, and the associated documentation for each processor component is also embedded in the application under "help" (found in the upper-right corner menu). If you drag the add "Processor" icon to your canvas, you will be presented with the Add Processor UI. In the upper-right corner is a filter box. Typing "syslog" or "hdfs" will reduce the list to those processors that share those tags. Clicking on a processor will display a brief description of the processor near the bottom of the same UI. Detailed documentation can be found in help or by right-clicking on a processor already added to the canvas and selecting "usage" from the context menu that appears. Which processors you want to use depends on your complete use case. First you need to determine how you are going to ingest this syslog data; the ListenSyslog processor is an option. As far as writing to HDFS, the PutHDFS processor is the likely choice. There are many processors available for manipulating NiFi FlowFile content between ingestion and writing out the data to a destination. Thanks, Matt
07-13-2017
03:17 PM
@siva karna Glad to help. If you found that I addressed the question, please mark the answer as accepted to close this thread. Thanks, Matt
07-13-2017
02:34 PM
@siva karna Anytime you add, remove, or modify any NAR in any one of NiFi's lib directories, a restart will be needed. At startup, NiFi extracts all those NARs into its work directory.
To understand whether you will lose data, you need to look at the methods/processors being used to ingest data into your NiFi. While NiFi is careful in handling data it already has in its possession, it has no control over data that is being sent to it. For example, any listen-type processors will not be running while NiFi is restarting, so they will not be able to receive data. Listen-type processors that use the TCP protocol should cause a service unreachable/unavailable failure on the sending side of the connection; the sender should queue this data and continue trying to resend until the service is available again. If you are using a listener that uses the UDP protocol, that is a different story. There is no handshake there, and you need to be willing to accept data loss when using that protocol for data transport. In order to truly answer the question, you need to closely look at how your dataflow is designed to ingest data. NiFi takes care of not losing data once it is in its control as FlowFiles. Thanks, Matt
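To illustrate the TCP-versus-UDP difference outside of NiFi, here is a small self-contained Python sketch (the port number is arbitrary and nothing is assumed to be listening on it): the TCP send fails loudly, so a sender can queue and retry, while the UDP send returns as if nothing were wrong.

```python
# Small, self-contained sketch (not NiFi code) of the TCP-vs-UDP point:
# with nothing listening on the port, the TCP connect fails loudly so a
# sender can queue and retry, while the UDP send "succeeds" silently.
import socket

HOST, PORT = "127.0.0.1", 5140  # arbitrary port with no listener

# TCP: connect() raises ConnectionRefusedError -- the sender knows.
try:
    with socket.create_connection((HOST, PORT), timeout=2) as s:
        s.sendall(b"<13>Jul 13 14:00:00 host app: hello\n")
except OSError as e:
    print(f"TCP send failed, sender can queue and retry: {e}")

# UDP: sendto() returns normally even though no one received the datagram.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    s.sendto(b"<13>Jul 13 14:00:00 host app: hello\n", (HOST, PORT))
    print("UDP send returned with no error; the datagram is simply lost")
```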