Member since: 07-30-2019
Posts: 3392
Kudos Received: 1618
Solutions: 1001
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 418 | 11-05-2025 11:01 AM |
| | 310 | 11-05-2025 08:01 AM |
| | 449 | 11-04-2025 10:16 AM |
| | 666 | 10-20-2025 06:29 AM |
| | 806 | 10-10-2025 08:03 AM |
05-20-2025
10:07 AM
@AndreyDE NiFi connection backpressure cannot trigger configuration changes in an upstream processor. Often the client library dictates what happens when the client is executed, and NiFi may have no control over the number of results returned. The NiFi scheduler, which is responsible for giving a processor component a thread to execute, looks at the downstream connection(s) coming off a processor; if any of them are applying backpressure, it will not give that processor a thread. Backpressure thresholds are therefore soft limits. When ListSFTP does get a thread, it executes the SFTP client, which returns all files matching the filtering configuration. Options 1 and 2 are not possible with ListSFTP due to the limitations described above. For option 3 you have a couple of choices:

A) Place a ControlRate processor between ListSFTP and FetchSFTP to control the rate at which FlowFiles move between those processors. You can then throttle the rate at which FlowFiles move from the connection fed by ListSFTP into your downstream dataflow, allowing time for each batch to process before the next batch is passed downstream.

B) Have your ListSFTP feed a child process group whose "FlowFile Concurrency" setting is configured as "Single FlowFile Per Node". You can then place your downstream processing dataflow inside this child process group, which will not allow another FlowFile to enter the group until the FlowFile already inside it is either auto-terminated or exits the group. Because "Single FlowFile Per Node" rules out processing multiple FlowFiles concurrently, this is the less desirable option.

C) A combination of both A and B: ListSFTP feeds a ControlRate processor (configured to allow 500 FlowFiles per minute) that lets batches of 500 FlowFiles move from the ListSFTP connection queue to the connection feeding a child process group.
Configure the backpressure threshold on the connection feeding the child process group to 500 as well, so backpressure is applied as soon as ControlRate lets a batch through. This backpressure prevents ControlRate from being scheduled again until the queued batch is consumed by the child process group. On the child process group, also configure "FlowFile Concurrency", except this time set it to "Single Batch Per Node", which allows the process group to consume all FlowFiles from the inbound connection at once. It will then not consume from the inbound connection again until the child process group is empty of FlowFiles. This design method controls the size of the batches processed at one time in the child process group while still allowing processor components within the group to execute against multiple FlowFiles concurrently.

The Option C dataflow would look something like this: no more than 500 FlowFiles are processed in the child process group at one time. The backpressure threshold of 500 on the connection feeding the child process group prevents ControlRate from adding another 500 FlowFiles to that connection until it is emptied, and the child process group only accepts the next batch when the FlowFile count inside it hits 0. Inside the process group is where you build your dataflow to FetchSFTP and process the batches of FlowFiles however you need.

Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
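Not NiFi code, but the Option C gating behavior described above can be sketched in plain Python. Everything here (the `run` helper, the queues) is a hypothetical illustration of the semantics, not an actual NiFi API:

```python
from collections import deque

BATCH = 500  # backpressure object threshold on the connection feeding the child PG

def run(listing_count, batch=BATCH):
    """Return the batch sizes the child process group consumes, in order."""
    pending = deque(range(listing_count))  # FlowFiles queued after ListSFTP
    connection = []                        # connection between ControlRate and child PG
    consumed_batches = []
    while pending or connection:
        # ControlRate is only scheduled while the downstream connection is
        # below its backpressure threshold.
        while pending and len(connection) < batch:
            connection.append(pending.popleft())
        # "Single Batch Per Node": the child PG ingests the entire queued
        # batch at once, but only when no FlowFiles remain inside it. Here we
        # assume each batch finishes processing before the next cycle.
        if connection:
            consumed_batches.append(len(connection))
            connection.clear()
    return consumed_batches
```

For example, a listing of 1200 files would flow downstream as batches of 500, 500, and 200, which is exactly the behavior the backpressure threshold plus "Single Batch Per Node" combination provides.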
05-19-2025
11:33 AM
@blackboks Yes, that is correct, unless you can sync user identity to group identity associations via one of the available user-group-providers in NiFi/NiFi Registry. See the NiFi System Administrator Guide: FileUserGroupProvider, LdapUserGroupProvider, AzureGraphUserGroupProvider.

Thank you, Matt
05-19-2025
10:22 AM
@BobKing I have not been able to reproduce this. I downloaded WinZip, created a simple text file, and then zipped it. I then consumed that .zip file using CFM 2.1.7.1001 and was able to successfully unpack the text file with UnpackContent. If you have a support contract with Cloudera, I recommend creating a support case where you can share more detail about the problematic zip files, and an example file if possible. I suspect the issue is specific to the zip files you are working with. Note that UnpackContent does not support multi-part zip files (NIFI-10654). Thank you, Matt
05-19-2025
05:22 AM
@BobKing Welcome to the Cloudera Community. It is going to be difficult to determine what is going on here without a sample failing zip file to reproduce with. What can you tell me about these WinZip files? How are they generated? Do they contain any files, or only directories? (NiFi only creates FlowFiles for actual content, so a zip file containing no files and only a bunch of empty directories would fail to unpack.) Are these multi-part zip files? Thank you, Matt
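One quick way to pre-check a suspect archive is Python's standard `zipfile` module; the helper name `inspect_zip` is mine, not a NiFi API, but the directory-only condition it flags is the case that yields nothing for UnpackContent to emit:

```python
import zipfile

def inspect_zip(path):
    """Summarize a zip the way it matters to UnpackContent: NiFi only creates
    FlowFiles for actual file entries, so an archive holding nothing but
    directory entries produces no FlowFiles when unpacked."""
    with zipfile.ZipFile(path) as zf:
        entries = zf.infolist()
        files = [e for e in entries if not e.is_dir()]
        return {
            "total_entries": len(entries),
            "file_entries": len(files),
            "only_directories": bool(entries) and not files,
        }
```

If `only_directories` comes back True for the failing archives, that would explain the unpack behavior without any bug in the processor.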
05-16-2025
01:04 PM
@blackboks Authentication and authorization happen in two separate steps in NiFi and NiFi Registry. Group association with users is part of the authorization step, handled by the configuration in the authorizers.xml file. Authentication is step one, which you already have working. At the end of authentication, all that is available and passed on for authorization is the user identity. In your case, "nifi-admin-2@blackboks.ru" is what is being passed to the configured authorizer.

You are most likely using the managed-authorizer, which utilizes the file-access-policy-provider, which in turn has a dependency on one or more configurable user-group-providers (file-user-group-provider, ldap-user-group-provider, composite-user-group-provider, composite-configurable-user-group-provider). These user-group-providers are responsible for establishing which groups a user identity belongs to.

What we can tell from the log output you shared is that your authorizer is unaware of any groups that the user identity "nifi-admin-2@blackboks.ru" belongs to. If the authorizer were aware of any groups associated with this user identity, those groups would have appeared in that log output instead of being blank: identity[nifi-admin-2@blackboks.ru], groups[]

So you'll need to verify the setup in your authorizers.xml and determine which user-group-provider you will use to establish these user-to-group identity mappings. The file-user-group-provider would require you to do this manually from within the NiFi UI.

Hopefully this helps clarify why you are seeing what you are seeing.

Thank you, Matt
05-14-2025
12:43 PM
@asand3r With archiving disabled, NiFi is no longer tracking the files left in the archive sub-directories. You can remove those files while NiFi is running; just make sure you don't touch the active content_repository claims. Matt
05-14-2025
11:54 AM
@alan18080 The single-user provider for authentication was never intended for production use. It is a very basic username-and-password authenticator that supports only a single user identity. When you access the UI of a NiFi node, you are authenticating with only that node. The provider generates a client token, which your browser holds, and a corresponding server-side token/key held only by the node you authenticated with. This is why you need sticky sessions (session affinity) in your load balancer, so that all subsequent requests go to the same NiFi server. There is no option in NiFi that would allow that client JWT to be accepted by all nodes in a NiFi cluster, because the generated JWT is unique to the specific node that issued it. Related: NIFI-7246.

Thank you, Matt
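NiFi's actual JWT handling is more involved than this, but the core reason sticky sessions are required can be shown with a toy HMAC sketch: a token minted against one node's local secret simply does not verify against a peer's secret. All names below are illustrative, not NiFi internals:

```python
import hashlib
import hmac
import os

def sign(payload: bytes, node_key: bytes) -> bytes:
    """Sign a token payload with a node-local secret (toy model)."""
    return hmac.new(node_key, payload, hashlib.sha256).digest()

def verify(payload: bytes, signature: bytes, node_key: bytes) -> bool:
    """A node can only verify tokens signed with its own secret."""
    return hmac.compare_digest(sign(payload, node_key), signature)

# Each node generates its own signing material at startup (toy model).
node_a_key = os.urandom(32)
node_b_key = os.urandom(32)

# The browser's token, issued by node A, verifies on node A but not node B.
token_sig = sign(b"user=single-user", node_a_key)
```

Session affinity at the load balancer sidesteps this by pinning the browser to the node that issued the token.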
05-14-2025
10:29 AM
1 Kudo
@asand3r Setting the following property to false turns off archiving: nifi.content.repository.archive.enabled. NiFi does not clean up files left in these directories once archiving is disabled; since archiving is disabled, the archive code that would scan these directories and remove old archive data is no longer executing. You'll need to manually purge the archived content claims from the archive sub-directories after disabling content_repository archiving. So your two nodes that still have archive data simply had that data present at shutdown, while the other nodes did not.

Thank you, Matt
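A manual purge of the archive sub-directories could be scripted along these lines. This is a sketch under assumptions: it presumes the default content_repository layout (section directories each containing an `archive` sub-directory), so verify the paths against your own nifi.properties, and only run it with archiving already disabled:

```python
from pathlib import Path

def purge_archive(content_repo: Path, dry_run: bool = True):
    """Delete files under the archive sub-directories of a NiFi
    content_repository (content_repository/<section>/archive/*) while
    leaving active content claims untouched. Returns the paths removed
    (or that would be removed when dry_run is True)."""
    removed = []
    for archive_dir in content_repo.glob("*/archive"):
        for claim in archive_dir.iterdir():
            if claim.is_file():
                removed.append(claim)
                if not dry_run:
                    claim.unlink()
    return removed
```

Running it first with `dry_run=True` lists what would be deleted, which is a cheap safety check before touching anything on a live node.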
05-14-2025
05:52 AM
@asand3r I need some more detail to provide a good answer here:

- What version of Apache NiFi or Cloudera Flow Management are you using?
- After changing "nifi.content.repository.archive.enabled" to false in the nifi.properties file, did you restart NiFi?
- If you manually inspect the archive sub-directories, do any of them still hold files, or are all of the archive sub-directories within the content_repository empty? If they are empty, then archive clean-up is complete.
- You mention "I've saw messages, that archived data is never cleanup"; can you share the message you are seeing, which I assume is from the nifi-app.log?

Keep in mind that disabling archiving will not prevent the content_repository from filling its disk to 100%. Content claims associated with FlowFiles actively queued within your dataflows on the NiFi canvas will still exist in the content_repository.

Thank you, Matt
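For the inspection step, a quick count of files remaining per archive sub-directory answers the "is clean-up complete?" question. As above, this assumes the default content_repository layout (section directories each containing an `archive` sub-directory); adjust the glob if your layout differs:

```python
from pathlib import Path

def archive_status(content_repo: Path):
    """Report how many files remain in each archive sub-directory of a NiFi
    content_repository. All zero counts means archive clean-up is complete."""
    return {
        str(d): sum(1 for f in d.iterdir() if f.is_file())
        for d in sorted(content_repo.glob("*/archive"))
    }
```

Any non-zero count points at the directories that still need manual purging once archiving has been disabled.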
05-14-2025
05:24 AM
2 Kudos
@s198 There is no other processor that provides this same functionality. This Apache community processor is not tested or maintained by Cloudera and thus is not included in the list of Cloudera-supported NiFi processor components. That does not mean the processor has any known issues, but it does mean Cloudera would not be obligated to provide bug fixes if issues did arise, nor to provide support for the component. If this processor is important to you and you have a Cloudera Flow Management license, I would encourage you to raise this with your Cloudera account owner and request that Cloudera add the component to the list of supported components. Making the formal request does not guarantee it will be added, but it gets visibility around the processor for consideration. Thank you, Matt