Member since 07-30-2019
3406 Posts, 1622 Kudos Received, 1008 Solutions
My Accepted Solutions
| Views | Posted |
|---|---|
| 311 | 12-17-2025 05:55 AM |
| 372 | 12-15-2025 01:29 PM |
| 349 | 12-15-2025 06:50 AM |
| 339 | 12-05-2025 08:25 AM |
| 588 | 12-03-2025 10:21 AM |
10-07-2021
05:47 AM
@Ankit13 Perhaps I don't understand your use case. Are you saying you have a NiFi dataflow that slowly ingests data, producing FlowFiles that work their way through your dataflow to this PutFile processor? And you want these 1000s of FlowFiles to queue up so that they can all be written to the local file system directory at the same time? What @m_adeel is suggesting is to use NiPyAPI to automate starting and stopping the PutFile processor at a given time. You could also do the same through NiFi REST API calls. You would still have the challenge of deciding when to stop it. Does the source of data ever stop coming in? Would you be able to write all the FlowFiles from the inbound connection queue to disk before more source FlowFiles started flowing into the queue? Why do you need to do this at a specific date and time? Thanks, Matt
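As a rough sketch of the REST API route, assuming a NiFi 1.x instance that exposes `GET /processors/{id}` and `PUT /processors/{id}/run-status`, the request below starts or stops one processor. The base URL, processor id and optional bearer token are placeholders; a secured NiFi also needs a trusted TLS setup, which this sketch leaves to urllib's defaults.

```python
import json
import urllib.request

# Hypothetical base URL; replace with your own NiFi host and port.
NIFI_API = "https://nifi.example.com:8443/nifi-api"


def build_run_status_request(base_url, processor_id, revision_version, state):
    """Build the URL and JSON body for PUT /processors/{id}/run-status.

    revision_version must match the processor's current revision, which
    NiFi uses for optimistic locking (read it first with GET /processors/{id}).
    """
    if state not in ("RUNNING", "STOPPED"):
        raise ValueError("state must be RUNNING or STOPPED")
    url = f"{base_url}/processors/{processor_id}/run-status"
    body = {"revision": {"version": revision_version}, "state": state}
    return url, json.dumps(body).encode("utf-8")


def set_run_status(base_url, processor_id, state, token=None):
    """Read the processor's current revision, then start or stop it."""
    headers = {"Content-Type": "application/json"}
    if token:  # assumed: a bearer token obtained separately when NiFi is secured
        headers["Authorization"] = f"Bearer {token}"
    get_req = urllib.request.Request(f"{base_url}/processors/{processor_id}", headers=headers)
    with urllib.request.urlopen(get_req) as resp:
        version = json.load(resp)["revision"]["version"]
    url, data = build_run_status_request(base_url, processor_id, version, state)
    put_req = urllib.request.Request(url, data=data, headers=headers, method="PUT")
    with urllib.request.urlopen(put_req) as resp:
        return json.load(resp)
```

An external scheduler (for example a cron job) could call `set_run_status(NIFI_API, "<processor-id>", "RUNNING")` at the target time, but it still has to decide when to send `"STOPPED"`, which is the open question above.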
10-06-2021
11:34 AM
@sundaram_idfc Additional information may help here: What versions of NiFi are being used by each NiFi instance you have installed? What Java version is being used by each of these NiFi instances? What is your configured "Max Timer Driven Thread Count" setting under global menu --> Controller Settings? How many cores does the server running NiFi have? Any chance you are occasionally sending a larger FlowFile? When in this state, did you collect a series of NiFi thread dumps (nifi.sh dump <dump filename>) to see whether the thread is progressing, or what it is waiting on or blocked by? Are there any WARN or ERROR entries in the nifi-app.log related to this port, OutOfMemory, or file handles? Thanks, Matt
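A single thread dump only shows one moment in time; comparing several taken a short interval apart shows whether a thread is progressing or stuck in the same stack. A minimal sketch that wraps the `nifi.sh dump <dump filename>` command mentioned above, assuming the script lives at `<NIFI_HOME>/bin/nifi.sh`; the install path, output directory and file names are hypothetical.

```python
import subprocess
import time
from pathlib import Path


def collect_thread_dumps(nifi_home, out_dir, count=3, interval_secs=30, runner=subprocess.run):
    """Run `nifi.sh dump <file>` `count` times, pausing between runs.

    `runner` defaults to subprocess.run and is injectable for testing.
    """
    nifi_sh = Path(nifi_home) / "bin" / "nifi.sh"
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    dump_files = []
    for i in range(count):
        dump_file = out / f"thread-dump-{i + 1}.txt"
        # check=True raises CalledProcessError if nifi.sh exits non-zero
        runner([str(nifi_sh), "dump", str(dump_file)], check=True)
        dump_files.append(dump_file)
        if i < count - 1:
            time.sleep(interval_secs)  # give threads time to move on
    return dump_files
```

For example, `collect_thread_dumps("/opt/nifi", "/tmp/nifi-dumps")` (hypothetical paths), run as the NiFi service user; a thread showing the same stack in every file is a good candidate for what is blocked.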
10-06-2021
10:48 AM
@Ronman It is interesting that you are seeing a 403 in response to trying to add a new policy on the root Process Group (PG). By adding your user string (CN=admin, OU=NIFI) to the following global policies: you should have the ability to set any additional policies anywhere. From the root canvas you can either right click and select "Manage access policies" or click on the small key icon in the "Operate" panel (which shows the PG name and the PG's assigned UUID) to access the PG-level access policies. First, select the policy you want to add user or group strings to from the pull-down menu. If the policy has not yet had any users assigned to it, you'll need to click on "add policy" first. Then click on the small "add users/groups to this policy" icon on the far right. Note: Child PGs added to the canvas will by default inherit all their policies from the parent PG. You will be given the option to "override" this and set child-specific users/groups on that child PG. Are you saying that you encounter a 403 when you click on the "add" button? What do you see in the nifi-user.log when you perform that action? If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
10-06-2021
06:02 AM
1 Kudo
@CodeLa It is difficult for me to help determine the issue in your dataflows without seeing your complete dataflow. My guess is that somewhere in the process of splitting your xml files, the structure has changed in such a way that the Java regex I provided no longer matches.
10-06-2021
06:00 AM
@CodeLa Everything you do via the NiFi canvas when building out your dataflows is preserved in the flow.xml.gz file written to disk. On NiFi service start, the flow.xml.gz is loaded into memory and then FlowFiles from the flowfile_repository are loaded back into the dataflow connections they were in before NiFi was stopped. For your canvas to be blank on NiFi service startup, the flow.xml.gz must be missing. By default NiFi creates an archived copy of the flow.xml.gz each time a change is made. The following properties from the nifi.properties file control where the flow.xml.gz is written and if/how archiving is set up: Recovering the flow from the archive is as simple as copying the latest "<timestamp>_flow.xml.gz" from the configured archive directory to the configured NiFi configuration file location and renaming it to "flow.xml.gz". Make sure proper ownership and permissions are set after you copy and rename it (it must be owned by and accessible to the NiFi service user). As a future suggestion: another way to protect against a total loss of your flow is running a NiFi cluster instead of a single NiFi instance. In a NiFi cluster, every NiFi node preserves its own identical copy of the flow.xml.gz file. So should catastrophe strike on one of the nodes, you can simply copy the flow.xml.gz from any one of the other nodes to recover the bad node. Thank you, Matt
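The copy-and-rename recovery step can be sketched as below, assuming NiFi is stopped and the archive files follow the "<timestamp>_flow.xml.gz" naming. To avoid assuming the exact timestamp format, this sketch picks the newest archive by file modification time. The directory and file paths are hypothetical; use the values configured in your nifi.properties.

```python
import shutil
from pathlib import Path


def restore_latest_flow(archive_dir, flow_file):
    """Copy the newest *_flow.xml.gz in archive_dir to flow_file.

    Run this while NiFi is stopped, ideally as the NiFi service user, so the
    restored file is owned by that user; otherwise fix ownership afterwards.
    """
    archives = sorted(
        Path(archive_dir).glob("*_flow.xml.gz"),
        key=lambda p: p.stat().st_mtime,  # newest archive sorts last
    )
    if not archives:
        raise FileNotFoundError(f"no *_flow.xml.gz archives in {archive_dir}")
    latest = archives[-1]
    # copy2 also preserves the archive's timestamps on the restored file
    shutil.copy2(latest, Path(flow_file))
    return latest
```

For example, `restore_latest_flow("/opt/nifi/conf/archive", "/opt/nifi/conf/flow.xml.gz")` with hypothetical paths, then start NiFi again.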
10-06-2021
05:46 AM
@Ankit13 NiFi processors support "Timer Driven" and "Cron Driven" scheduling strategies. The Timer Driven strategy allows you to specify a time interval for scheduling (for example, "30 secs" means the processor will execute every 30 seconds). The Cron Driven strategy supports a Quartz Cron expression [1] being used to specify when the processor should execute. There is a third option on some processors, Event Driven, that should not be used. It was created long ago and considered experimental. It has since been deprecated due to improvements made in the Timer Driven strategy, and it only remains in NiFi to avoid breaking the flows of those who use it when they upgrade. Important things to understand about your ask: Let's assume you configure your PutFile to execute using the Cron Driven scheduling strategy and the inbound connection to the PutFile processor has multiple FlowFiles queued. When the processor executes, it will process only 1 of those FlowFiles from the inbound connection queue with default settings. The next queued FlowFile would not get processed until the next scheduled cron execution. While there is no way to make sure that every queued FlowFile is processed in a single cron execution, you can change the configured Run Duration: The Run Duration tells the processor to continue to use the same execution thread to execute against as many queued FlowFiles as possible within the configured run duration time. Let's say it takes more than 2 secs to write the very first FlowFile to the target directory. In that case, only one FlowFile would be processed, so there would be no perceived difference between a run duration of 0ms and 2s. In a NiFi cluster, each node in the cluster executes the dataflow against the FlowFiles queued on that same node. So if FlowFiles were queued on the inbound connection to PutFile on all nodes, each node would execute once at each cron interval, processing FlowFile(s) on that node as described above.
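The Run Duration arithmetic above can be illustrated with a simplified model of one scheduled execution: the thread always handles the first queued FlowFile, then keeps pulling the next one only while the elapsed time is still below the run duration. This is an approximation for reasoning about the numbers, not NiFi's actual scheduler code, and the per-FlowFile write times are made-up inputs.

```python
def flowfiles_per_trigger(write_secs, run_duration_secs):
    """Estimate how many queued FlowFiles one cron-triggered execution handles.

    write_secs: assumed time to write each queued FlowFile, in queue order.
    run_duration_secs: the processor's configured Run Duration (0 = default).
    """
    processed = 0
    elapsed = 0.0
    for secs in write_secs:
        # After the first FlowFile, stop once the run duration is used up.
        if processed > 0 and elapsed >= run_duration_secs:
            break
        elapsed += secs
        processed += 1
    return processed
```

With ten FlowFiles that each take 0.5 secs, a run duration of 0 handles 1 per cron firing while 2 secs handles 4; if the first FlowFile alone takes 3 secs, both settings handle just 1, matching the example above. For reference, a Quartz expression such as `0 0 2 * * ?` fires once a day at 02:00.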
[1] https://community.cloudera.com/t5/forums/replypage/board-id/Questions/message-id/229905
Thank you, Matt
10-05-2021
01:37 PM
@_fe_20 How exciting that you are diving into NiFi. One thing that those new to NiFi need to understand is that dataflows execute independently of the authenticated user who built the dataflow. This means that every component (processor, controller service, reporting task, RPG, etc.) added by any user is actually executed by the NiFi service user and not by the user who happens to be logged in to the NiFi UI. So let's say your NiFi process is owned by a local "nifi" user. This means the ListSFTP and FetchSFTP processors are executed as the nifi user even though you have configured a different user in the processor's configuration. So, just like from a terminal window, this would look like "sftp -oIdentityFile=/path/to/private/keyfile <username>@<sftpserver>" being executed by the nifi service user. So the private key file configured in the processor must be owned by the nifi service user. Now on to issue two. You are using a PuTTY Private Key (PPK). NiFi does not use PuTTY, so you would need to extract your private key from the ppk file, for example: puttygen <yourppk>.ppk -O private-openssh -o <your>.pem Place this private pem key in a directory owned by and accessible to the NiFi service user. Make sure permissions are not open to group or other, since this is a private key. Thank you, Matt
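To confirm the converted key meets those requirements, a small check like the one below can be run as the NiFi service user (for example via `sudo -u nifi python3 check_key.py`, where the script name is hypothetical). It is a Unix-only sketch: it reports whether the file is owned by the user running the check, readable by that user, and closed to group and other.

```python
import os
import stat


def private_key_problems(path):
    """Return reasons why `path` is unsuitable as the current user's private key."""
    problems = []
    st = os.stat(path)
    if st.st_uid != os.getuid():
        problems.append("not owned by the user running this check")
    # Any group or other permission bit (read, write, execute) is too open.
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("permissions are open to group or other")
    if not os.access(path, os.R_OK):
        problems.append("not readable")
    return problems
```

An empty list means the key looks usable by that user; `chmod 600 <your>.pem` fixes the group/other case.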
10-05-2021
01:07 PM
Hello @edoS Welcome to the community! NiFi provides so many options for user authentication and authorization that setting up exactly what you need can be overwhelming at times. This is certainly something Cloudera support could walk you through if you have a support contract with us that covers the NiFi service. At a high level, here is what you need to understand about the authentication and authorization process in NiFi. Authentication happens first and must be successful before any authorization is verified. NiFi supports numerous ways to authenticate users/clients (TLS, Kerberos, LDAP, OpenID, etc.). No matter which method is used, the end result of any authentication is a user string that identifies the successfully authenticated user/client. That user string is then evaluated against the identity mappings [1] you may have configured in the nifi.properties file. These identity mappings are used to normalize the user strings, for example to: trim the CN from the full DN in a user/client certificate; trim the user name from a Kerberos principal; convert the user string to all uppercase or lowercase. The resulting user/client string is then passed to the authorizer to verify that the user/client is authorized for the NiFi Resource Identifier being requested. The NiFi authorizers.xml is where this configuration is set up. This file is easiest to read from the bottom up. At the bottom of the authorizers.xml you will find your authorizer, which you have set up as the "Ranger-Provider". It is important to understand how this authorizer works. NiFi runs a background thread that checks in with Ranger to see if there is a new policy definition for the NiFi service. If so, the new definition is downloaded by NiFi. What Ranger provides to NiFi in this downloaded policy definition is all the policies set up in Ranger. For each, there will be the "NiFi Resource Identifier(s)" along with the user strings and group strings that have been assigned "Read" and/or "Write" permissions.
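The identity-mapping step described above can be approximated in Python. Each mapping is a (pattern, value, transform) triple modeled on the nifi.properties `nifi.security.identity.mapping.pattern.<name>`, `.value.<name>` and `.transform.<name>` properties. This is a sketch: it uses Python's `re` instead of Java regex, and it simply tries mappings in list order with the first match winning.

```python
import re


def map_identity(identity, mappings):
    """Normalize an authenticated user string with the first matching mapping.

    mappings: list of (pattern, value, transform) tuples; value uses
    Java-style $1, $2 group references and transform is NONE, UPPER or LOWER.
    """
    for pattern, value, transform in mappings:
        match = re.match(pattern, identity)
        if match is None:
            continue
        # Replace each $N in value with capture group N from the pattern.
        mapped = re.sub(r"\$(\d+)", lambda g: match.group(int(g.group(1))), value)
        if transform == "UPPER":
            return mapped.upper()
        if transform == "LOWER":
            return mapped.lower()
        return mapped
    return identity  # no pattern matched: the user string passes through unchanged
```

With the mapping `("^CN=(.*?), OU=(.*?)$", "$1", "NONE")`, the certificate DN "CN=admin, OU=NIFI" becomes "admin", and that shortened string is what the authorizer sees.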
Now remember, up to this point all NiFi knows about the authenticated user is the user string. NiFi has no idea yet what groups that user string may belong to. Within the Ranger-Provider, you will find a property named "User Group Provider". The value set here tells the authorizer where to check whether the user string passed from authentication has any known user-to-group associations. Search your authorizers.xml for the configured User Group Provider [2]. There are numerous options that can be configured for determining user-to-group associations. Some of the available providers allow you to configure multiple providers. While the "ranger-provider" authorizer can only point at one, it may point at a "composite-configurable-user-group-provider" [3], for example, that can be set up to reference multiple user-group-providers. The key here is making sure you have added one or more user group providers that will return all the user-to-group associations you need. Based on the log output you shared from the nifi-user.log, we know that none of the user group providers you may have set up returned any group strings associated with your user string (identity[18330301],groups[] ). This is why "groups [ ]" is empty. The "file-user-group-provider" [4] allows you to create user string to group string associations manually via the NiFi UI directly. The commonly used "ldap-user-group-provider" [5] determines user and group associations via user and/or group syncs with LDAP/AD. Once NiFi knows what groups the authenticated user string is associated with, the user and the groups can be checked against the downloaded policies to see if the user is authorized for the action being performed or the endpoint being accessed.
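The effect of an empty "groups[]" can be shown with a deliberately simplified model of the final check. Each user group provider is represented as a plain dict from user string to groups, a composite provider is treated as the union of its members, and policies map a (resource identifier, action) pair to allowed users and groups. This is an illustration of the logic described above, not the Ranger plugin's actual implementation; the group name "nifi-admins" is hypothetical.

```python
def groups_for(user, providers):
    """Union of the groups every provider returns (composite-style lookup)."""
    groups = set()
    for provider in providers:
        groups |= provider.get(user, set())
    return groups


def is_authorized(user, resource, action, policies, providers):
    """policies maps (resource, action) -> {"users": set(), "groups": set()}."""
    policy = policies.get((resource, action))
    if policy is None:
        return False  # no policy for this resource/action: deny
    if user in policy["users"]:
        return True
    # Group-based grants only work if some provider returned groups for the user.
    return bool(groups_for(user, providers) & policy["groups"])
```

If the policy for "/flow" grants "read" only to a group, the user is denied while no provider returns that group for "18330301", and allowed once, for example, an LDAP-backed provider does.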
[1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#identity-mapping-properties
[2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#authorizers-setup
[3] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#composite-implementations
[4] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#fileusergroupprovider
[5] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#ldapusergroupprovider
Thank you, Matt
10-05-2021
12:16 PM
@CodeLa I set up a dataflow using the exact example you shared: After the ReplaceText, I see the content is now: Can you share your sample xml file that is not working? How is your split being done? Thanks, Matt
10-05-2021
12:06 PM
@Phanikondeti Does the host or IP you configured in this property match what is assigned to the host? The following command will show you your NiFi host's hostname:
hostname
The following commands will show you the IP addresses associated with the network interfaces on the host:
ifconfig
ip address show
The following command will allow you to see if some process is already bound to the port configured in the nifi.web.http(s).port= property in the nifi.properties file:
netstat -anop|grep 8075|grep LISTEN
If you get a return from the above, it will include a Process Id (pid) that you can look up using:
ps -ef|grep <pid>
The latest exception you shared is different from the first: "NiFi has started, but the UI is not available on any host". NiFi logs this WARN line when the NiFi code returns no URLs after starting the NiFi JettyServer. In this setup, I would guess that you have nifi.web.http(s).host= set to either blank or 0.0.0.0. So NiFi passes the earlier checks and starts the JettyServer, but when it then tries to bind to all the network interfaces, it finds none and logs the above WARN. This points at a setup issue on your Amazon EC2 instance and not an issue with NiFi. I'd use the above commands to verify that your EC2 instance shows properly set up interfaces. If you find issues with your EC2 Ubuntu interface setup, you may need to reach out to Amazon for help there. Thank you, Matt
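The same two checks can also be scripted. In the sketch below, `address_is_local` tries to bind to the configured host on an OS-chosen port, which only succeeds if that address belongs to one of this machine's interfaces, and `port_in_use` reports whether something already holds the configured port. It assumes IPv4; the host value is whatever you set in nifi.web.http(s).host=.

```python
import errno
import socket


def address_is_local(host):
    """True if `host` resolves to an IPv4 address this machine can bind to."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind((host, 0))  # port 0: let the OS pick any free port
        return True
    except OSError:  # includes resolution failures and EADDRNOTAVAIL
        return False


def port_in_use(host, port):
    """True if another socket already holds host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. EACCES for ports below 1024 without root
        return False
```

Typical use is `print(socket.gethostname(), address_is_local("<configured host>"), port_in_use("0.0.0.0", 8075))`, run on the EC2 instance itself while NiFi is stopped.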