Member since 07-30-2019
3471 Posts
1642 Kudos Received
1020 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 173 | 06-03-2026 06:06 PM |
| | 466 | 05-06-2026 09:16 AM |
| | 860 | 05-04-2026 05:20 AM |
| | 508 | 05-01-2026 10:15 AM |
| | 635 | 03-23-2026 05:44 AM |
10-08-2021
11:06 AM
@Ankit13 How do you know no more files will be put after the NiFi flow processing starts? To me it sounds like the PutFile should execute at the default 0 secs (as fast as it can run), and you should instead control this dataflow at the beginning, where you consume the data. For example: in a 24-hour window, data is written to the source directory between 00:00:00 and 16:00:00. Then you want to write that data to the target directory starting at 17:00. So you instead set up a cron schedule on a ListFile processor to list the files at 17:00 and 17:01, and have FetchFile and PutFile running all the time so they immediately fetch the content of the listed files and write it to the target directory. Your ListFile then does not execute again until the same time the next day, or whatever your cron schedule is. This way the files are all listed at the same time, and PutFile can execute for as long as needed to write all those files to the target directory. Hope this helps, Matt
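As a rough sketch of the schedule described above, assuming ListFile is switched to the Cron Driven scheduling strategy: NiFi cron schedules use Quartz syntax (seconds, minutes, hours, day of month, month, day of week), so a once-a-day listing at 17:00 can be written as `0 0 17 * * ?`. The helpers below (`daily_quartz_cron` and `next_daily_run` are hypothetical names, not part of NiFi) only build that expression and compute when the next listing would fire.

```python
from datetime import datetime, timedelta


def daily_quartz_cron(hour: int, minute: int = 0) -> str:
    """Build a Quartz cron expression that fires once a day at hour:minute.

    Quartz field order: seconds minutes hours day-of-month month day-of-week.
    '?' means "no specific value" for day-of-week, since day-of-month is '*'.
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    return f"0 {minute} {hour} * * ?"


def next_daily_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Return the next daily fire time strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(daily_quartz_cron(17))  # 0 0 17 * * ?
    print(next_daily_run(datetime(2021, 10, 8, 16, 0), 17))  # 2021-10-08 17:00:00
```

FetchFile and PutFile would stay on their default Timer Driven schedule (0 sec) so they drain whatever ListFile emits at that time.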
10-07-2021
02:40 PM
@CodeLa @SAMSAL I want to point out that tracking timestamps will not always guarantee NiFi will consume all files from the input directory, depending on how they are being placed in that directory. The ListFile processor looks at the last modified timestamp on the file. It then lists all files newer than the last recorded timestamp stored in the NiFi state manager from the previous processor execution. On the first run there will be no state, so everything currently in the directory is listed. Now consider the scenarios below, which can prevent the above from listing all files: The mechanism writing the files to that input directory does not update the last modified timestamp on the file once it is done writing to it. Let's say file 1 starts being written at 12:00:01.000 and file 2 starts being written at 12:00:01.300. File 2 completes first and is consumed by ListFile, and the stored state is updated to reflect 12:00:01.300. Now file 1 completes, but it is never consumed by ListFile since its last modified timestamp is older than file 2's. If you are in such a scenario, ListFile offers a different "Listing Strategy" called "Tracking Entities", which also tracks filenames in a cache service, allowing it to still list files that may have an older timestamp. Another thing to consider is that ListFile may list the same file more than once. Consider this scenario: You tell ListFile to list files from the directory /nifi/myfiles/. The mechanism writing these files to that directory does update the last modified timestamp as the file is being written, but does not use a ".<filename>" (dot rename) approach to writing these files (meaning the file is initially hidden until the write completes, and is then renamed and made unhidden; the default ListFile configuration ignores hidden files). So when ListFile runs, it sees that file with a newer last modified timestamp and lists it. 
Then on the next execution it sees the same file again, because its last modified timestamp was updated while the file was still being written. If you are in such a scenario, you would want to make use of the "Minimum File Age" property. This property tells ListFile to ignore any file whose last modified timestamp, compared to the current time, is not at least the configured amount of time old (meaning the last modified timestamp has not changed for the configured amount of time). That configured time is arbitrary: use whatever length is needed for you to be confident the file write is complete. Something else you need to consider applies if both of the following are true: 1. You are using a multi-node NiFi cluster. 2. The configured directory you are listing from is mounted to every node. Since every node in a NiFi cluster executes the same dataflow, you want to avoid every node listing the same files. In this scenario you would change the "Execution" configuration on ListFile from "All nodes" to "Primary node" and change "Input Directory Location" from "Local" to "Remote". Then you will want to set the "Load Balance Strategy" to "Round robin" on the connection between ListFile and FetchFile. NOTE: Never set Execution to "Primary node" on any processor that has an inbound connection. Only processors with no inbound connection should be considered for this execution configuration. I know this is a lot to digest, but it is very important to be aware of to ensure success. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
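To make the two timestamp pitfalls above concrete, here is a deliberately simplified Python model of timestamp-based listing. It is an assumption-laden sketch, not NiFi's actual ListFile implementation (which tracks more state than a single timestamp): it lists files whose last modified time is newer than the stored state and, optionally, at least `min_age` seconds old, which plays the role of "Minimum File Age". `FileEntry` and `list_by_timestamp` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    name: str
    mtime: float  # last modified timestamp, in seconds


def list_by_timestamp(files, last_ts, now, min_age=0.0):
    """Simplified timestamp tracking: return (listed names, new state timestamp).

    A file is listed only if it is newer than the stored state AND its last
    modified time is at least `min_age` seconds in the past.
    """
    eligible = [f for f in files if f.mtime > last_ts and now - f.mtime >= min_age]
    new_state = max((f.mtime for f in eligible), default=last_ts)
    return sorted(f.name for f in eligible), new_state


if __name__ == "__main__":
    # Scenario 1: file2 (mtime 1.3) is complete first; file1 is not visible yet.
    listed, state = list_by_timestamp([FileEntry("file2", 1.3)], last_ts=0.0, now=2.0)
    print(listed, state)  # ['file2'] 1.3
    # file1 now appears, but with its older mtime of 1.0.
    listed, state = list_by_timestamp(
        [FileEntry("file1", 1.0), FileEntry("file2", 1.3)], last_ts=state, now=3.0)
    print(listed, state)  # [] 1.3  (file1 is never listed)

    # Scenario 2: a file still being written (mtime keeps moving) is held back.
    print(list_by_timestamp([FileEntry("big.csv", 9.5)], 0.0, now=10.0, min_age=5.0))
    # ([], 0.0)
```

In this model, raising `min_age` trades latency for safety, which mirrors the advice above to pick whatever Minimum File Age makes you confident a write has finished.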
10-07-2021
02:06 PM
@Ronman I don't know anything about the cetic/helm-nifi image. I assume you are talking about this: https://github.com/cetic/helm-nifi
I started parsing through what is in the above GitHub repository, and the authorizers.xml that is built looks poorly done. Can you share what is in the authorizers.xml file on your NiFi host? From what I can tell, the image creates numerous providers that are not actually used. It looks like it creates:
1. file-user-group-provider
2. ldap-user-group-provider (only if LDAP is enabled)
3. composite-configurable-user-group-provider (only if LDAP is enabled)
4. file-access-policy-provider (always points at file-user-group-provider, which means 2 and 3 would never get used even if they were created)
5. managed-authorizer (points at file-access-policy-provider)
6. file-provider (only if LDAP is enabled. This is a legacy provider, and I'm not sure why anyone would still use it. It can reference any of the above user-group providers)
So seeing what is actually written to that file might be helpful here. Also, on startup the authorizers configured in authorizers.xml are responsible for seeding some initial policies for the admin user in the users.xml and authorizations.xml files. This would include the initial set of policies for the root PG. This will not happen if, upon the first launch of NiFi, there was no flow.xml.gz yet, and thus no root PG UUID yet. So you may want to rename your existing authorizations.xml file and restart your NiFi so that a new one is generated, since you have a flow.xml.gz now, and see if that gives you the policies you need to start editing the canvas. But even if the above works, I still think you have an issue with your authorizers.xml file's configuration. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
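If you want to check which providers in an authorizers.xml are actually reachable, a small script can follow references from the authorizer that nifi.properties selects (nifi.security.user.authorizer) down through each provider's properties. This is a hedged sketch that assumes any `<property>` whose value equals another entry's `<identifier>` is a reference; `find_unused_providers` is a hypothetical helper, and the embedded XML is a trimmed-down example modeled on the layout described above (class names and most properties omitted).

```python
import xml.etree.ElementTree as ET

SAMPLE_AUTHORIZERS = """
<authorizers>
  <userGroupProvider>
    <identifier>file-user-group-provider</identifier>
    <property name="Users File">./conf/users.xml</property>
  </userGroupProvider>
  <userGroupProvider>
    <identifier>ldap-user-group-provider</identifier>
  </userGroupProvider>
  <userGroupProvider>
    <identifier>composite-configurable-user-group-provider</identifier>
    <property name="Configurable User Group Provider">file-user-group-provider</property>
    <property name="User Group Provider 1">ldap-user-group-provider</property>
  </userGroupProvider>
  <accessPolicyProvider>
    <identifier>file-access-policy-provider</identifier>
    <property name="User Group Provider">file-user-group-provider</property>
  </accessPolicyProvider>
  <authorizer>
    <identifier>managed-authorizer</identifier>
    <property name="Access Policy Provider">file-access-policy-provider</property>
  </authorizer>
</authorizers>
"""


def find_unused_providers(xml_text: str, authorizer_id: str) -> list:
    """Return the identifiers NOT reachable from `authorizer_id`, sorted."""
    root = ET.fromstring(xml_text.strip())
    # identifier -> property values of that provider/authorizer
    props = {}
    for elem in root:
        ident = (elem.findtext("identifier") or "").strip()
        if ident:
            props[ident] = [(p.text or "").strip() for p in elem.findall("property")]
    # Depth-first walk over property values that name another identifier.
    reachable, stack = set(), [authorizer_id]
    while stack:
        ident = stack.pop()
        if ident in reachable or ident not in props:
            continue
        reachable.add(ident)
        stack.extend(v for v in props[ident] if v in props)
    return sorted(set(props) - reachable)


if __name__ == "__main__":
    print(find_unused_providers(SAMPLE_AUTHORIZERS, "managed-authorizer"))
    # ['composite-configurable-user-group-provider', 'ldap-user-group-provider']
```

Run against a real file, the output would list providers that are defined but never consulted by the active authorizer, which matches the concern above about items 2 and 3.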
10-07-2021
05:54 AM
@Ronman Please share the version of Apache NiFi you have installed. Thanks, Matt
10-07-2021
05:47 AM
@Ankit13 Perhaps I don't understand your use case. Are you saying you have a NiFi dataflow that slowly ingests data, producing FlowFiles that work their way through your dataflow to this PutFile processor? And then you want these 1000s of FlowFiles to queue up so that they can all be put to the local file system directory at the same time? What @m_adeel is suggesting is to use NiPyAPI to automate the starting and stopping of the PutFile processor at a given time. You could also do the same through NiFi REST API calls. You would still have the challenge of when to stop it. Does the source data ever stop coming in? Would you be able to put all the FlowFiles from the inbound connection queue to disk before more source FlowFiles started flowing into the queue? Why the need to do this at a specific date and time? Thanks, Matt
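For the REST API route mentioned above, a minimal standard-library Python sketch might look like the following. It assumes an unsecured NiFi reachable at a hypothetical `base_url`, and a NiFi 1.x version that exposes `PUT /nifi-api/processors/{id}/run-status`; a secured NiFi would also need TLS plus a token or client certificate, which is omitted. The current revision is read first because NiFi uses component revisions for optimistic locking on updates.

```python
import json
import urllib.request

VALID_STATES = {"RUNNING", "STOPPED"}


def build_run_status_payload(revision_version: int, state: str) -> dict:
    """Request body for PUT /nifi-api/processors/{id}/run-status."""
    if state not in VALID_STATES:
        raise ValueError(f"unsupported state: {state!r}")
    return {"revision": {"version": revision_version}, "state": state}


def set_processor_state(base_url: str, processor_id: str, state: str) -> dict:
    """Start or stop one processor on an (assumed unsecured) NiFi instance."""
    processor_url = f"{base_url.rstrip('/')}/nifi-api/processors/{processor_id}"
    # Read the processor entity to get its current revision version.
    with urllib.request.urlopen(processor_url) as resp:
        version = json.load(resp)["revision"]["version"]
    body = json.dumps(build_run_status_payload(version, state)).encode("utf-8")
    request = urllib.request.Request(
        f"{processor_url}/run-status",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

An external scheduler could call `set_processor_state(url, putfile_id, "RUNNING")` at the chosen time and `"STOPPED"` later (`putfile_id` being a hypothetical processor UUID); the open question above of when it is safe to stop still applies.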
10-06-2021
10:48 AM
@Ronman It is interesting that you are seeing a 403 in response to trying to add a new policy on the root Process Group (PG). By granting your user string (CN=admin, OU=NIFI) the following global policies: you should have the ability to set any additional policies anywhere. From the root canvas you can either right-click and select "manage access policies" or click on the small key icon in the "Operate" panel (which shows the PG name and assigned PG UUID) to access the PG-level access policies. First, select the policy you want to add user or group strings to from the pull-down menu. If no users have been assigned to that policy yet, you'll need to click on "add policy" first. Then click on the small "add users/groups to this policy" icon to the far right. Note: Child PGs added to the canvas will by default inherit all their policies from the parent PG. You will be given the option to "override" this and set child-specific users/groups on that child PG. You are stating that you encounter a 403 when you click on the "add" button? What do you see in the nifi-user.log when you perform that action? If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
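The inheritance rule in the note above can be pictured as walking up the Process Group tree until a PG with its own (overridden) policy is found. This is a conceptual Python sketch under that assumption, not NiFi's code; `effective_policy`, the PG ids, and the dictionaries are hypothetical.

```python
def effective_policy(pg_id, action, parent_of, overrides):
    """Return (PG id that owns the policy, set of users/groups) or (None, None).

    parent_of: child PG id -> parent PG id (the root maps to None)
    overrides: (PG id, action) -> users/groups set explicitly on that PG
    """
    current = pg_id
    while current is not None:
        policy = overrides.get((current, action))
        if policy is not None:
            return current, policy
        current = parent_of.get(current)
    return None, None


if __name__ == "__main__":
    parents = {"root": None, "child": "root", "grandchild": "child"}
    policies = {("root", "write"): {"CN=admin, OU=NIFI"}}
    print(effective_policy("grandchild", "write", parents, policies))
    # ('root', {'CN=admin, OU=NIFI'})
    policies[("child", "write")] = {"team-a"}  # override on the child PG
    print(effective_policy("grandchild", "write", parents, policies))
    # ('child', {'team-a'})
```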
10-06-2021
06:00 AM
@CodeLa Everything you do via the NiFi canvas when building out your dataflows is preserved in the flow.xml.gz file written to disk. On NiFi service start, the flow.xml.gz is loaded into memory, and then FlowFiles from the flowfile_repository are loaded back into the dataflow connections they were in before NiFi was stopped. If your canvas is blank on NiFi service startup, that flow.xml.gz is missing. By default, NiFi creates an archived copy of the flow.xml.gz each time a change is made. The following properties from the nifi.properties file control where the flow.xml.gz is written and if/how archiving is set up: Recovering the flow from the archive is as simple as copying the latest "<timestamp>_flow.xml.gz" from the configured archive directory to the configured NiFi configuration file location and renaming it to "flow.xml.gz". Make sure proper ownership and permissions are set after you copy and rename (it must be owned by and accessible to the NiFi service user). As a future suggestion: another way to protect against a total loss of your flow is running a NiFi cluster instead of a single NiFi instance. In a NiFi cluster, every node preserves its own identical copy of the flow.xml.gz file. So should catastrophe strike one of the nodes, you can simply copy the flow.xml.gz from any one of the other nodes to recover the bad node. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
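The archive recovery step above can be scripted. This is a hedged Python sketch: `restore_latest_flow` is a hypothetical helper, the archive directory and target path are whatever your nifi.properties configures, and "latest" is judged by file modification time rather than by parsing the timestamp in the file name. Setting ownership for the NiFi service user afterwards is left to you.

```python
import shutil
from pathlib import Path


def latest_flow_archive(archive_dir: Path) -> Path:
    """Return the most recently modified *_flow.xml.gz in archive_dir."""
    candidates = sorted(archive_dir.glob("*_flow.xml.gz"), key=lambda p: p.stat().st_mtime)
    if not candidates:
        raise FileNotFoundError(f"no *_flow.xml.gz files in {archive_dir}")
    return candidates[-1]


def restore_latest_flow(archive_dir: Path, flow_file: Path) -> Path:
    """Copy the newest archive to flow_file (e.g. conf/flow.xml.gz); return the source."""
    source = latest_flow_archive(archive_dir)
    shutil.copy2(source, flow_file)  # copy2 also preserves the modification time
    # Ownership/permissions still need to be set for the NiFi service user.
    return source
```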
10-06-2021
05:46 AM
@Ankit13 NiFi processors support "Timer Driven" and "Cron Driven" scheduling strategies. The Timer Driven strategy allows you to specify a time interval for scheduling (for example: "30 secs", which means the processor will execute every 30 seconds). The Cron Driven strategy supports a Quartz cron expression [1] to specify when the processor should execute. There is a third option on some processors, Event Driven, that should not be used. It was created long ago and considered experimental. It has since been deprecated due to improvements made in the Timer Driven strategy, and it only remains in NiFi to avoid breaking the flows of those who use it when they upgrade. Important things to understand about your ask: Let's assume you configure your PutFile to execute using the Cron Driven scheduling strategy, and the inbound connection to the PutFile processor has multiple FlowFiles queued. With default settings, when the processor executes it will process only 1 of those FlowFiles from that inbound connection queue. The next queued FlowFile would not get processed until the next scheduled cron execution. While there is no way to make sure that every queued FlowFile is processed in a single cron execution, you can change the configured Run Duration: The Run Duration tells the processor to continue to use the same execution thread to execute against as many queued FlowFiles as possible within the configured run duration time. Let's say you set a Run Duration of 2s, but it takes more than 2 secs to write the very first FlowFile to the target directory. In that case, only one FlowFile would be processed, so there would be no perceived difference between a run duration of 0ms and 2s. In a NiFi cluster, each node executes the dataflow against the FlowFiles queued on that same node. So if FlowFiles were queued on the inbound connection to PutFile on all nodes, each node would execute once at each cron interval, processing FlowFile(s) on that node as described above. 
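A back-of-the-envelope model of the Run Duration behavior described above, as a Python sketch. It assumes every FlowFile takes the same `write_secs` to write and that the thread keeps pulling FlowFiles until the run duration has elapsed or the queue is empty; real timing will vary, and `flowfiles_per_execution` is a hypothetical name.

```python
def flowfiles_per_execution(queued: int, write_secs: float, run_duration_secs: float) -> int:
    """Estimate how many FlowFiles one scheduled execution processes.

    At least one FlowFile is processed if any are queued; more are processed
    only while the elapsed time is still below the configured run duration.
    """
    processed, elapsed = 0, 0.0
    while processed < queued:
        processed += 1
        elapsed += write_secs
        if elapsed >= run_duration_secs:
            break
    return processed


if __name__ == "__main__":
    print(flowfiles_per_execution(10, 3.0, 0.0))  # 1  (default 0 ms run duration)
    print(flowfiles_per_execution(10, 3.0, 2.0))  # 1  (first write already exceeds 2 s)
    print(flowfiles_per_execution(10, 0.5, 2.0))  # 4
```

With a 0.5 s write time and a 2 s run duration, this model processes 4 FlowFiles per cron firing, so 10 queued FlowFiles would need three firings on that node.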
[1] https://community.cloudera.com/t5/forums/replypage/board-id/Questions/message-id/229905
If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
10-05-2021
01:37 PM
@_fe_20 How exciting that you are diving into NiFi. One thing that those new to NiFi need to understand is that dataflows execute independently of the authenticated user who built the dataflow. This means that every component (processor, controller service, reporting task, RPG, etc.) added by whatever user is actually executed by the NiFi service user, not by the user who happens to be logged in to the NiFi UI. So let's say your NiFi process is owned by a local "nifi" user. This means the ListSFTP and FetchSFTP processors are executed as the nifi user, even though you have configured a different username in the processor's configuration. Just like from a terminal window, this would look like: "sftp -oIdentityFile=/path/to/private/keyfile <username>@<sftpserver>" being executed by the nifi service user. So the private key configured in the processor must be owned by the nifi service user. Now on to issue two. You are using a PuTTY Private Key (PPK). NiFi does not use PuTTY, so you would need to extract your private key from the ppk file, for example: puttygen <yourppk>.ppk -O private-openssh -o <your>.pem Place this private pem key in a directory owned by and accessible to the NiFi service user. Make sure permissions are not open to group or other, since this is a private key. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
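To double-check the two key-file requirements above (owned by the NiFi service user, no group/other access), a small Python check could look like this. It is a sketch for Unix-like hosts; `key_file_problems` is a hypothetical helper, and the lookup `pwd.getpwnam("nifi").pw_uid` assumes the service user is named "nifi", as in the example.

```python
import os
import stat


def key_file_problems(path: str, service_uid: int) -> list:
    """Return human-readable problems with a private key file (empty list = OK)."""
    st = os.stat(path)
    problems = []
    if st.st_uid != service_uid:
        problems.append(f"owned by uid {st.st_uid}, expected uid {service_uid}")
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"mode {stat.S_IMODE(st.st_mode):o} grants group/other access")
    return problems


if __name__ == "__main__":
    import pwd
    import sys

    nifi_uid = pwd.getpwnam("nifi").pw_uid  # assumes the service user is "nifi"
    for problem in key_file_problems(sys.argv[1], nifi_uid):
        print(problem)
```

An empty result means the key is owned by that uid and has no group/other permission bits (for example mode 600).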
10-05-2021
01:07 PM
Hello @edoS Welcome to the community! NiFi provides so many options for user authentication and authorization that setting up exactly what you need can be overwhelming at times. This is certainly something Cloudera support could walk you through if you have a support contract with us that covers the NiFi service. At a high level, here is what you need to understand about the authentication and authorization process in NiFi. Authentication happens first and must be successful before any authorization is verified. NiFi supports numerous ways to authenticate users/clients (TLS, Kerberos, LDAP, OpenID, etc.). No matter which method is used, the end result of any authentication is a user string that identifies the successfully authenticated user/client. That user string is then evaluated against the identity mappings [1] you may have configured in the nifi.properties file. These identity mappings are used to normalize the user strings, for example: trimming the CN from the full DN in a user/client certificate; trimming the user name from a Kerberos principal; converting the user string to all uppercase or lowercase. The resulting user/client string is then passed to the authorizer to verify that the user/client is authorized for the NiFi Resource Identifier being requested. The NiFi authorizers.xml is where this configuration is set up. This file is easiest to read from the bottom up. At the bottom of the authorizers.xml you will find your authorizer, which you have set up as the "Ranger-Provider". It is important to understand how this authorizer works. NiFi runs a background thread that checks in with Ranger to see if there is a new policy definition for the NiFi service. If so, the new definition is downloaded by NiFi. What Ranger provides to NiFi in this downloaded policy definition are all the policies set up in Ranger. For each there will be the "NiFi Resource Identifier(s)" along with the user strings and group strings that have been assigned "Read" and/or "Write" permissions. 
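The identity-mapping step described above is regex-based. This is an illustrative Python sketch, not NiFi's implementation: it mimics a pattern/value/transform triple such as nifi.security.identity.mapping.pattern.dn, nifi.security.identity.mapping.value.dn and nifi.security.identity.mapping.transform.dn in nifi.properties, applying the first mapping whose pattern matches the whole identity. NiFi (Java) writes backreferences as `$1`, so the sketch converts them to Python's `\1`. `apply_identity_mapping` and the sample patterns are hypothetical.

```python
import re


def apply_identity_mapping(identity: str, mappings: list) -> str:
    """mappings: ordered list of (pattern, value, transform) tuples.

    The first pattern that matches the whole identity wins; if none match,
    the identity is returned unchanged.
    """
    for pattern, value, transform in mappings:
        match = re.fullmatch(pattern, identity)
        if match:
            # Convert Java-style $1 backreferences to Python's \1 form.
            mapped = match.expand(re.sub(r"\$(\d+)", r"\\\1", value))
            if transform == "UPPER":
                return mapped.upper()
            if transform == "LOWER":
                return mapped.lower()
            return mapped  # "NONE"
    return identity


MAPPINGS = [
    (r"^CN=(.*?), OU=(.*?)$", "$1", "NONE"),  # keep only the CN of a DN
    (r"^(.*?)@(.*?)$", "$1", "LOWER"),        # drop the Kerberos realm, lowercase
]

if __name__ == "__main__":
    print(apply_identity_mapping("CN=admin, OU=NIFI", MAPPINGS))  # admin
    print(apply_identity_mapping("JDoe@EXAMPLE.COM", MAPPINGS))   # jdoe
    print(apply_identity_mapping("18330301", MAPPINGS))           # 18330301
```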
Now remember, up to this point all NiFi knows about the authenticated user is the user string. NiFi has no idea yet what groups that user string may belong to. Within the Ranger-Provider, you will find a property named "User Group Provider". The value set here tells the authorizer where to check whether the user string passed from authentication has any known user-to-group associations. Search your authorizers.xml for the configured User Group Provider [2]. There are numerous options that can be configured for determining user-to-group associations. Some of the available providers allow you to configure multiple providers. While the "ranger-provider" authorizer can only point at 1, it may point at a "composite-configurable-user-group-provider" [3], for example, which can be set up to reference multiple user-group-providers. The key here is making sure you have added 1 or more user group providers that will return all the user-to-group associations you need. Based on the log output you shared from the nifi-user.log, we know that none of the user group providers you may have set up returned any group strings associated with your user string (identity[18330301],groups[] ). This is why "groups [ ]" is empty. The "file-user-group-provider" [4] allows you to create user-string-to-group-string associations manually via the NiFi UI directly. The commonly used "ldap-user-group-provider" [5] determines user and group associations via user and/or group syncs with LDAP/AD. Now that NiFi knows what groups the authenticated user string is associated with, the user and the groups can be checked against the downloaded policies to see if the user is authorized for the action being performed or the endpoint being accessed. 
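The flow described above (user string in, groups looked up from the user group provider, then both compared against the downloaded policies) can be modeled in a few lines of Python. This is a conceptual sketch, not the Ranger plugin's logic; `Policy`, `is_authorized`, and the sample resource and group names are hypothetical. It shows why an empty `groups[]` leads to a denial when access was granted only to a group.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)


def is_authorized(identity, resource, action, policies, user_groups):
    """policies: (resource, action) -> Policy; user_groups: identity -> set of groups."""
    policy = policies.get((resource, action))
    if policy is None:
        return False  # in this simplified model, no policy means no access
    groups = user_groups.get(identity, set())
    return identity in policy.users or bool(groups & policy.groups)


if __name__ == "__main__":
    policies = {("/flow", "read"): Policy(groups={"nifi-admins"})}
    # No user group provider returned groups: identity[18330301],groups[]
    print(is_authorized("18330301", "/flow", "read", policies, {}))  # False
    # Once a provider associates the user with the group, access is granted.
    print(is_authorized("18330301", "/flow", "read", policies,
                        {"18330301": {"nifi-admins"}}))  # True
```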
[1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#identity-mapping-properties
[2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#authorizers-setup
[3] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#composite-implementations
[4] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#fileusergroupprovider
[5] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#ldapusergroupprovider
If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt