Member since
07-30-2019
3387
Posts
1617
Kudos Received
999
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 133 | 11-05-2025 11:01 AM |
| | 378 | 10-20-2025 06:29 AM |
| | 518 | 10-10-2025 08:03 AM |
| | 358 | 10-08-2025 10:52 AM |
| | 394 | 10-08-2025 10:36 AM |
07-25-2017
02:03 PM
1 Kudo
@Sanaz Janbakhsh The policies you have identified above are: /flow (grants users the ability to view the UI), /proxy (allows NiFi nodes and proxy servers to proxy requests on behalf of users to other NiFi nodes), and the default "all-nifi-resources", which assigns "*" and grants the user access to every policy.

The component-level granular policies are based on each component's assigned UUID. For connections, the policies are enforced based on the processor component the connection originates from. For example:

/remote-process-groups/&lt;remote process group uuid&gt;
/data/remote-process-groups/&lt;remote process group uuid&gt;
/process-groups/&lt;process group uuid&gt;
/data/process-groups/&lt;process group uuid&gt;
/processors/&lt;processor uuid&gt;
/data/processors/&lt;processor uuid&gt;

There will be a unique policy for each and every component, based on that component's assigned UUID. Component-level authorizations are inherited from the parent process group when no specific processor, remote process group, or sub process group component-level policy is set.

So for a user to be able to view the FlowFiles in a connection (list queue), they must be granted "read" on the component (/data/processors/&lt;processor uuid&gt;) from which that connection originated. Access can instead be granted via inheritance by granting the user "read" on a parent process group (/data/process-groups/&lt;process group uuid&gt;) that contains the processor component. For a user to be able to empty a queue (empty queue), they must be granted "write" in the same manner as "read" above. If your user was added to the default "all-nifi-resources" policy in Ranger, then they already have read and write on every NiFi policy; effectively they are a NiFi admin user.

In addition to the user being granted the ability to "read" (list queue) and "write" (empty queue), the same must be granted to all nodes in your NiFi cluster. This is commonly done by adding a new policy in Ranger that uses the following NiFi resource identifier: This policy would be assigned to all nodes and include both "read" and "write" permissions.

Why is this needed? When you log in to a NiFi cluster, you are logging in to only one node. When you make a request to list a queue, you expect to see results from all nodes in your cluster, so the node you are logged in to makes a request to all the other nodes to return their queue listings. The originating node must therefore be granted the ability to view the other nodes' data. The same holds true when you make a request to empty a queue while logged in to one node of a cluster: that node must be able to request that the other nodes empty their queues as well. Thank you, Matt
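The inheritance behavior described above (an explicit component policy wins; otherwise the lookup walks up to the nearest parent process group that has a policy) can be sketched as follows. This is an illustrative model, not NiFi code: the component IDs, users, and the policy table are made-up examples.

```python
# Hypothetical sketch of the component-level policy inheritance described above.
# All IDs, users, and policies here are invented for illustration.

policies = {
    # explicit component-level policy
    "/data/processors/proc-42": {"read": {"bob"}},
    # parent process-group policy (inherited by children with no explicit policy)
    "/data/process-groups/pg-root": {"read": {"alice"}, "write": {"alice"}},
}

# child component -> parent process group (None = top level)
parents = {"proc-42": "pg-root", "proc-99": "pg-root", "pg-root": None}

def is_authorized(user, component_id, action, kind="processors"):
    """True if `user` has `action` on the component, directly or via inheritance."""
    resource = f"/data/{kind}/{component_id}"
    if resource in policies:
        # an explicit component policy overrides inheritance entirely
        return user in policies[resource].get(action, set())
    parent = parents.get(component_id)
    while parent is not None:
        resource = f"/data/process-groups/{parent}"
        if resource in policies:
            return user in policies[resource].get(action, set())
        parent = parents.get(parent)
    return False

print(is_authorized("bob", "proc-42", "read"))    # True: explicit policy on proc-42
print(is_authorized("alice", "proc-42", "read"))  # False: explicit policy overrides the parent's
print(is_authorized("alice", "proc-99", "read"))  # True: inherited from pg-root
```

Note how proc-42's explicit policy means alice's parent-group "read" no longer applies to it, matching the "inherited only when no specific policy is set" rule above.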
07-25-2017
01:04 PM
@Sanaz Janbakhsh This question revolves around setting the correct file-based authorizer permissions for listing and emptying queues. Since you are using Ranger, I suggest starting a new question so as not to add confusion, as the process is different. Thanks, Matt
07-14-2017
01:39 PM
1 Kudo
@Hadoop User The "it'll be helpful if what processor to be used in between listenSyslog and putHDFS is suggested" question is a hard one for anyone to answer without understanding the end result you are looking for. The following processors may help:
- ParseSyslog (extracts bits from syslog content into FlowFile attributes). You can then use those attributes, if you like, to make routing decisions (RouteOnAttribute) or to define unique target HDFS directories based on attribute values in PutHDFS.
- SplitText or SplitContent (can be used to split FlowFiles that contain more than one syslog message each). You get improved performance if ListenSyslog ingests in batches.
- UpdateAttribute (used to add your own custom attributes or manipulate existing attributes on FlowFiles).
Thanks, Matt
07-14-2017
01:21 PM
@Hadoop User The processor components all have tags associated with them, and the documentation for each processor component is also embedded in the application under "help" (found in the upper right corner menu). If you drag the add "Processor" icon to your canvas, you will be presented with the Add Processor UI. In the upper right corner is a filter box; typing "syslog" or "hdfs" will reduce the list to those processors that share those tags. Clicking on a processor will display a brief description of it near the bottom of the same UI. Detailed documentation can be found in help, or by right-clicking on a processor already added to the canvas and selecting "usage" from the context menu that appears. As far as which processors you want to use, that depends on your complete use case. First you need to determine how you are going to ingest this syslog data; the ListenSyslog processor is an option. As far as writing to HDFS, the PutHDFS processor is the likely choice. There are many processors available for manipulating NiFi FlowFile content between ingestion and writing out the data to a destination. Thanks, Matt
... View more
07-13-2017
03:17 PM
@siva karna Glad to help. If you found that I addressed the question, please mark the answer as accepted to close this thread. Thanks, Matt
07-13-2017
02:34 PM
@siva karna Anytime you add, remove, or modify any nar in any one of NiFi's lib directories, a restart will be needed. At startup, NiFi extracts all of those nars into its work directory.
To understand whether you will lose data, you need to look at the methods/processors being used to ingest data in your NiFi. While NiFi is careful in handling data it already has in its possession, it has no control over data that is being sent to it. For example, any listen-type processors will not be running while NiFi is restarting, so they will not be able to receive data. Listen-type processors that use the TCP protocol should trigger a service unreachable/unavailable failure on the sending side of the connection; the sender should queue that data and continue to try to resend until the service is available again. If you are using a listener that uses the UDP protocol, that is a different story: there is no handshake there, so you need to be willing to accept data loss when using that protocol for data transport. In order to truly answer that question, you need to look closely at how your dataflow is designed to ingest data. NiFi takes care of not losing data once it is in its control as FlowFiles. Thanks, Matt
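The TCP sender behavior described above (the connection fails, the sender keeps the data queued and retries) can be sketched like this. This is an illustrative example, not NiFi code; the host, port, and retry parameters are invented.

```python
# Illustrative sketch (not NiFi code) of a TCP sender that keeps data queued
# and retries while the listener is down, e.g. during a NiFi restart.
# Host, port, and retry settings are made-up example values.
import socket
import time

def send_with_retry(payload: bytes, host="localhost", port=10514,
                    retries=3, delay=0.1):
    """Try to deliver payload over TCP; return False if every attempt fails,
    in which case the caller keeps the payload queued and retries later."""
    for _ in range(retries):
        try:
            with socket.create_connection((host, port), timeout=1) as sock:
                sock.sendall(payload)
                return True  # listener acknowledged the connection; data handed off
        except OSError:
            time.sleep(delay)  # listener unreachable (e.g. NiFi restarting); retry
    return False  # still down: data stays in the sender's queue
```

A UDP sender has no equivalent failure signal: `sendto` succeeds locally whether or not anything is listening, which is exactly why data loss must be accepted with that protocol.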
07-13-2017
02:10 PM
NiFi stores FlowFile attributes in the FlowFile repository and FlowFile content in the Content repository. NiFi knows which queue each FlowFile was in when it was shut down. This allows NiFi to reload those FlowFiles back into their queues and pick up where the dataflow left off after a restart.
07-13-2017
01:57 PM
2 Kudos
@siva karna I am not following the statement "so there is an abstraction for the first process group flow file it will stop so we will loss the data". Why would stopping a dataflow cause data loss? NiFi will only read in new nars/jars added to a NiFi lib directory on startup. There is no option to dynamically add classes during runtime. Thanks, Matt
07-13-2017
12:42 PM
1 Kudo
@Akash S The ListHDFS processor records state so that only new files are listed. The processor also has a configuration option for recursing subdirectories. You could set the directory to just /MajorData/Location/ and let it list all files from the subdirectories; as new subdirectories are created, the files within them will get listed. If that does not work for you, the NiFi Expression Language (EL) statement you are looking for would look something like this for the directory: /MajorData/Location/${now():format('yyyy/MM/dd')} The above would cause NiFi to look only in that day's target directory for files until the day changed. I am not sure of the rate at which files are written into these target directories, but be mindful that if a file is added between runs of the ListHDFS processor and the day changes between those runs, that file will not get listed using the above EL statement. Thanks, Matt
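For reference, what that EL statement evaluates to can be sketched in Python; the EL format tokens yyyy/MM/dd correspond to strftime's %Y/%m/%d:

```python
# Python sketch of what the NiFi Expression Language statement
# /MajorData/Location/${now():format('yyyy/MM/dd')} evaluates to.
# EL's yyyy/MM/dd maps to strftime's %Y/%m/%d.
from datetime import datetime

directory = "/MajorData/Location/" + datetime.now().strftime("%Y/%m/%d")
print(directory)  # e.g. /MajorData/Location/2017/07/13
```

Each time the processor runs, `now()` is re-evaluated, so at midnight the directory silently changes, which is the source of the edge case noted above.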
07-12-2017
07:17 PM
2 Kudos
@M R I find the following very useful when trying to build Java regular expressions: http://myregexp.com The Java regular expression ^(.*?)%%(.*?)%%(.*?)%%(.*?)%%(.*?),(.*?)%%(.*?)$ has 7 capture groups. When you add a new property to the ExtractText processor with a property name of "string" and use the above Java regex, each capture group's match is written to a FlowFile attribute (string.1 through string.7). Of course, if you are only looking for two capture groups, you could use the following regex instead: ^(.*?)%%.*?%%(.*?)%%.*?%%.*?,.*?%%.*?$ Thanks, Matt
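To show what the 7 capture groups extract, here is a quick demonstration of the regex against a made-up sample string (the %% and comma delimiters match the pattern; the field values themselves are invented):

```python
# Demonstration of the 7-capture-group regex above on an invented sample string.
import re

pattern = r'^(.*?)%%(.*?)%%(.*?)%%(.*?)%%(.*?),(.*?)%%(.*?)$'
sample = "host1%%2017-07-12%%INFO%%app%%user,login%%success"

m = re.match(pattern, sample)
print(m.groups())
# ('host1', '2017-07-12', 'INFO', 'app', 'user', 'login', 'success')
```

In ExtractText these seven matches would land in attributes string.1 through string.7; the two-group variant above would populate only string.1 and string.2.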