Member since: 07-30-2019 · Posts: 3406 · Kudos Received: 1622 · Solutions: 1008
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 185 | 12-17-2025 05:55 AM |
| | 246 | 12-15-2025 01:29 PM |
| | 183 | 12-15-2025 06:50 AM |
| | 277 | 12-05-2025 08:25 AM |
| | 464 | 12-03-2025 10:21 AM |
03-24-2021
06:09 AM
@dzbeda Have you tried specifying the index in the Query you have configured on the GetSplunk NiFi processor? NiFi cannot provide you a list of indexes from your Splunk instance to choose from; you need to know which indexes exist on your Splunk server. Hope this helps, Matt
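As a hedged illustration (the index name, sourcetype, and time window below are placeholders, not from the original post), the GetSplunk Query property takes an ordinary Splunk search string, for example:

```
search index=main sourcetype=syslog earliest=-15m
```

If you are unsure which indexes exist, you would run a search along the lines of `| eventcount summarize=false index=* | dedup index | fields index` in Splunk's own search UI, then use the index name in the NiFi processor.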
03-22-2021
05:40 AM
1 Kudo
@Garyy That depends on your definition of a "standalone" NiFi server. A one-node NiFi cluster and a standalone NiFi are two different things, and I have seen many users with a one-node NiFi cluster refer to it as standalone.

A true standalone NiFi does not use ZooKeeper and does not need to replicate requests to other nodes, so none of the "cluster" configuration properties are used. A standalone NiFi has no dependency on ZooKeeper at the core level. NiFi clusters use ZooKeeper for Cluster Coordinator and Primary Node election (these roles only exist in a cluster setup) and for cluster-wide state storage (a standalone NiFi has no need to share state with other nodes, so all state is simply stored locally).

You can tell whether your NiFi is a true standalone by checking the following property in the nifi.properties file:

nifi.cluster.is.node

If it is set to true, then this NiFi is configured to operate as a cluster even if there is only one node. If it is set to false, then the node is truly standalone.

Hope this helps, Matt
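To make the check above concrete, here is a minimal sketch (the helper name and sample file content are mine, not part of NiFi) that decides standalone vs. cluster from nifi.properties content:

```python
# Sketch: interpret nifi.cluster.is.node the way the post describes.
# The is_standalone() helper and the sample text are illustrative.
def is_standalone(properties_text: str) -> bool:
    for line in properties_text.splitlines():
        line = line.strip()
        if line.startswith("nifi.cluster.is.node="):
            value = line.split("=", 1)[1].strip().lower()
            return value != "true"
    return True  # property absent: treat as standalone

sample = """
nifi.web.http.port=8080
nifi.cluster.is.node=false
"""
print(is_standalone(sample))  # prints True: this NiFi is a true standalone
```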
03-17-2021
06:02 AM
@sambeth NiFi authorization lookups require an exact, case-sensitive match between the user string that results from authentication (or an associated group string; the user group providers configured in authorizers.xml are responsible for associating user strings with group strings within NiFi) and the user/group strings to which the policies are assigned. So if the user identity string after the authentication process is "CN=John, OU=Doe", then that exact case-sensitive string must be what the policies are authorized against.

NiFi does provide the ability to use Java regular expressions post-authentication to manipulate the authentication string before it is passed on for authorization. These identity-mapping pattern, value, and transform properties can be added to the nifi.properties file: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#identity-mapping-properties

For example:

nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?)$
nifi.security.identity.mapping.value.dn=$1
nifi.security.identity.mapping.transform.dn=LOWER

Now if authentication resulted in the string "CN=John, OU=Doe", the above regex would match and the resulting client user string would be "john" (capture group 1 is the value used, transformed to all lowercase).

You can create as many of these mapping-pattern sets of properties as you like, as long as each property name is unique in its last field:

nifi.security.identity.mapping.pattern.dn2=
nifi.security.identity.mapping.pattern.kerb=
nifi.security.identity.mapping.pattern.kerb2=
nifi.security.identity.mapping.pattern.username=
etc.

IMPORTANT note: these patterns are evaluated against every authenticated string (including mutual TLS authentication, such as that between NiFi nodes using the NiFi keystore) in alphanumeric order. The first Java regular expression to match has its value applied and transformed, so ordering your properties from the most specific regex to the most generic is very important.

Hope this helps you, Matt
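The mapping behavior described above can be simulated with a short sketch. This is not NiFi's actual implementation, just an illustration of pattern/value/transform semantics using the example from the post:

```python
import re

# Illustrative sketch of NiFi identity-mapping semantics:
# pattern = a regex, value = replacement using $1/$2 capture groups,
# transform = optional case normalization.
def map_identity(identity, pattern, value, transform="NONE"):
    m = re.match(pattern, identity)
    if not m:
        return identity  # no match: identity passes through unchanged
    # Substitute Java-style $N references with the captured groups.
    mapped = re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))), value)
    if transform == "LOWER":
        mapped = mapped.lower()
    elif transform == "UPPER":
        mapped = mapped.upper()
    return mapped

result = map_identity("CN=John, OU=Doe", r"^CN=(.*?), OU=(.*?)$", "$1", "LOWER")
print(result)  # prints: john
```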
03-15-2021
04:04 PM
2 Kudos
@sambeth The hash (#) character is reserved as a delimiter separating the URI of an object from a fragment identifier, and NiFi-Registry uses a number of different fragment identifiers. A fragment identifier represents a part of, a fragment of, or a sub-function within an object. It follows the "/#/" in the URL and can represent fragments in text documents by line and character range, in graphics by coordinates, or in structured documents using ladders; for example, the "grid-list" of flows displayed when you access the NiFi-Registry UI. No, you cannot remove the # from the URL. Are you encountering an issue? Hope this helps, Matt
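One detail worth seeing in code: the fragment after # is handled client-side and is not part of the path sent to the server, which is why the UI depends on it. The hostname and fragment below are placeholders:

```python
from urllib.parse import urlsplit

# The fragment identifier lives after '#'; the server only ever sees the path.
url = "https://nifi.example.com:18443/nifi-registry/#/explorer/grid-list"
parts = urlsplit(url)
print(parts.path)      # prints: /nifi-registry/
print(parts.fragment)  # prints: /explorer/grid-list
```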
03-15-2021
03:53 PM
@sambeth Your NiFi instance proxies the request for your NiFi user to NiFi-Registry when you try to start version control, change version, etc. on a process group on the NiFi canvas. All your NiFi hosts must be authorized in NiFi-Registry for the "Can proxy user requests" special privilege and for read access to buckets. You can authorize these special privileges from the NiFi-Registry UI by clicking the wrench icon in the upper right corner and then clicking the pencil (edit) icon to the right of your NiFi host in the list of users. This allows your NiFi hosts to read all the buckets, but your NiFi user, for whom the request is being proxied, must still be authorized for whichever specific buckets you want that user to access.

Typically an admin user is responsible for creating buckets in NiFi-Registry and then authorizing specific user access to those buckets. Again through the wrench icon, the admin would create the new bucket and then click the pencil (edit) icon next to it to manage its policies. As an example, for a bucket "user-bucket" created with the NiFi-Registry admin user, I authorized my users group and the nifisme1 user the ability to read (can import flows from the bucket and see flows in the bucket), write (can version-control a new PG to this bucket and commit new versions of an existing version-controlled PG), and delete (can delete version-controlled flows from the bucket via the NiFi-Registry UI).

If you found this solution helped resolve your issue, please take a moment to log in and accept it. Hope this helps, Matt
03-15-2021
03:35 PM
@alexwillmer NiFi does not support using wildcards in all scenarios. Access decisions include authorization against specific endpoints, and access decisions that do not work with wildcards may manifest as some buttons remaining greyed out in the UI. So if you find that a NiFi Resource Identifier is not giving you the expected result with a wildcard, try setting the policy explicitly and see whether the desired outcome is observed. The following article provides insight into the expected access provided by each NiFi Resource Identifier: https://community.cloudera.com/t5/Community-Articles/NiFi-Ranger-based-policy-descriptions/ta-p/246586

NiFi actually downloads the policy definitions from Ranger, and all authorizations are done against the last downloaded set of policies (NiFi runs a background thread to check Ranger for updated policy definitions). NiFi does not send a request to Ranger itself to verify authorization.

Hope this helps, Matt
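The "authorize against a locally cached policy set" behavior described above can be sketched as follows. The resource names, group names, and data structure are illustrative, not NiFi's or Ranger's actual model:

```python
# Sketch: authorization checks run against the last-downloaded policy cache,
# not against a live call to Ranger. Resource/group names are made up.
cached_policies = {
    "/flow": {"read": {"monitor-group"}},
    "/process-groups/1234": {"read": {"team-a"}, "write": {"team-a"}},
}

def is_authorized(policies, resource, action, user_groups):
    # Allowed if any of the user's groups appears in the cached policy entry.
    allowed = policies.get(resource, {}).get(action, set())
    return bool(allowed & set(user_groups))

print(is_authorized(cached_policies, "/flow", "read", ["monitor-group"]))  # True
print(is_authorized(cached_policies, "/flow", "write", ["monitor-group"]))  # False
```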
03-09-2021
10:01 AM
@nishantgupta101 There is no reason you could not write your own custom script that connects to an FTPS endpoint to retrieve a file, which can then be called via the ExecuteStreamCommand processor. There are also other script-based processors that you could use.
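As a hedged sketch of such a script (the function name, host, credentials, and paths are placeholders; this uses Python's standard-library ftplib, not anything NiFi-specific):

```python
from ftplib import FTP_TLS

# Illustrative FTPS fetch script that an ExecuteStreamCommand processor could
# invoke. All connection details would come from your environment.
def fetch_ftps(host, user, password, remote_path, local_path, port=21):
    ftps = FTP_TLS()
    ftps.connect(host, port)
    ftps.login(user, password)
    ftps.prot_p()  # switch the data channel to TLS
    with open(local_path, "wb") as out:
        ftps.retrbinary(f"RETR {remote_path}", out.write)
    ftps.quit()
```

The fetched file would then land wherever ExecuteStreamCommand expects to pick up the result (for example, written to stdout or a staging directory, depending on how you wire the processor).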
03-08-2021
12:48 PM
1 Kudo
@pacman In addition to what @ckumar already shared: NiFi purposely leaves components visible to everyone on the canvas, but unless a user is authorized to view those components, they display as "ghosted" implementations. Ghosted components do not show any component names or classes; they only show stats. Unauthorized users are unable to view or modify the configuration, and are also unable to list or view data in connections (they only see the number of FlowFiles queued on a connection).

The reason NiFi shows these ghosted components is to prevent multiple users from building their dataflows on top of one another. It is very common to have multiple teams building their own dataflows, but then also have monitoring teams that may be authorized as "operators" across all dataflows, or users who are members of multiple teams. Without ghosting, these users who can see more would potentially be left with components layered on top of one another, making management very difficult.

The stats are there so that even if a user cannot view or modify a component, they can see where FlowFile backlogs are happening. Since NiFi operates within a single JVM, and every dataflow, no matter which user/team built it, is executed as the NiFi service user, everything must share the same system resources (various repos, heap memory, disk I/O, CPU, etc.). These stats provide useful information that one team can use to communicate with another team should resource utilization become an issue.

NiFi's authorization model allows very granular access decisions for every component. Authorizations are inherited from the parent process group unless more granular policies are set up on a child component (processor, controller service, input/output port, sub-process group, etc.).

Hope this helps, Matt
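The policy-inheritance rule in the last paragraph can be sketched as a walk up the process-group hierarchy. The component IDs, group names, and dictionaries below are illustrative, not NiFi's internal model:

```python
# Sketch of inherited authorization: a component uses its own view policy if one
# exists, otherwise the nearest ancestor process group's policy applies.
parents = {"proc-1": "pg-child", "pg-child": "pg-root", "pg-root": None}
policies = {"pg-root": {"ops-team"}}  # a view policy set only at the root PG

def effective_viewers(component):
    node = component
    while node is not None:
        if node in policies:
            return policies[node]  # first policy found wins
        node = parents[node]
    return set()  # no policy anywhere in the ancestry

print(effective_viewers("proc-1"))  # inherits from pg-root: {'ops-team'}
```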
03-03-2021
06:28 AM
1 Kudo
@Pavitran This does present a challenge. Typically ListFile is used to list files from a local file system. That processor is designed to record state (by default based on the last-modified timestamp) so that only newer files are consumed, but the first run will list all files by default. Also, looking at your example, your latest directory does not correspond to the current day.

ListFile does not actually consume the content; it generates a 0-byte FlowFile for each file listed, along with attributes/metadata about the source file. The FetchFile processor is then used to fetch the actual content. With a large listing, this allows you to redistribute the 0-byte FlowFiles across all nodes in your cluster before consuming the content (provided the same local file system is mounted across all nodes; if each node has different files, do not load-balance between the processors).

So you could make a first run that lists everything and just delete those 0-byte FlowFiles. That would establish state. From that point on, ListFile would only list the newest files created.

Pros:
1. State allows this processor to be unaffected by outages; after an outage it will still consume all files not previously listed.

Cons:
1. The initial run could create a lot of 0-byte FlowFiles to get rid of in order to establish state.
2. With an extended outage, on restart the flow may consume more than just the latest files, since it will consume all files with timestamps newer than the timestamp last stored in state.

Other options:

A: ListFile has an optional "Maximum File Age" property which limits the listing to files no older than the configured amount of time.

Pros of setting this property:
1. Reduces or eliminates the massive listing on the first run.

Cons of setting this property:
1. Under an extended outage that exceeds the configured "Maximum File Age", a file you wanted listed may be skipped.

B: Since FetchFile uses attributes/metadata from the listing to fetch the actual content, you could craft a source FlowFile on your own and send it to the FetchFile processor. For example, use an ExecuteStreamCommand processor to execute a bash script on disk that gets the list of files from only the latest directory. Then use UpdateAttribute to add the other attributes FetchFile needs to get the actual content, and use SplitText to split that listing into individual FlowFiles before the FetchFile processor.

Pros:
1. You are in control of what is being listed.

Cons:
1. Depending on how often a new directory is created and how often you run ExecuteStreamCommand, you may list the same source files again, since ExecuteStreamCommand has no state option. You may be able to handle this with a DetectDuplicate processor in your flow design.
2. If the listed directory has a new file added to it after a previous listing, the next run will list all the previous files again along with the new ones. Again, DetectDuplicate may be able to handle this.

Hope this helps give you some ideas, Matt
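Option B's "list only the latest directory" script can be sketched as follows. The directory layout, names, and helper function are assumptions for illustration (the post suggested a bash script; the same logic in Python keeps this example self-contained):

```python
import os
import tempfile

# Sketch: list files only from the newest date-named directory, the way a
# script called by ExecuteStreamCommand might. Names below are made up.
def files_in_latest_dir(base):
    # Directories are date-named (YYYY-MM-DD), so the lexicographically
    # greatest name is the latest directory.
    dirs = sorted(e.name for e in os.scandir(base) if e.is_dir())
    latest = os.path.join(base, dirs[-1])
    return sorted(e.name for e in os.scandir(latest) if e.is_file())

# Self-contained demo using a throwaway directory layout.
base = tempfile.mkdtemp()
for day in ("2021-03-01", "2021-03-02"):
    os.mkdir(os.path.join(base, day))
open(os.path.join(base, "2021-03-02", "data.log"), "w").close()
print(files_in_latest_dir(base))  # prints: ['data.log']
```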
03-01-2021
05:53 AM
@IAMSID I think you are asking two different questions here. In order for the community to help, it would be useful if you gave more detail around each of your issues. Your example is not clear to me; not knowing anything about your source data, what characters are you not expecting?

Providing the following always helps:
1. A dataflow template showing what you have done.
2. A sample input file.
3. The desired output file based on the above sample.

For query 2:
1. How is data being ingested into NiFi?
2. What is the configuration of the processor components used to ingest the data (ConsumeKafka<version>, ConsumeKafkaRecord<version>, record writer, etc.)?
3. What other processors does the FlowFile pass through in this dataflow (flow template)?

Thanks, Matt