Member since: 07-30-2019
Posts: 3421
Kudos Received: 1628
Solutions: 1010
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 117 | 01-13-2026 11:14 AM |
| | 234 | 01-09-2026 06:58 AM |
| | 535 | 12-17-2025 05:55 AM |
| | 596 | 12-15-2025 01:29 PM |
| | 565 | 12-15-2025 06:50 AM |
01-31-2025
09:38 AM
1 Kudo
@mslnrd You are absolutely on the correct track with setting up users and groups in NiFi. Since all synced users and groups are loaded into NiFi's heap memory, it is best practice to limit what is synced to just those users and groups who need to be authorized to access your NiFi. The easiest way to do this is by syncing only the specific groups that contain the users requiring authorized access to your NiFi. For syncing users and groups from AD/LDAP, your authorizers.xml would be configured with the ldap-user-group-provider. Since you can already successfully connect and sync users and groups from your AD/LDAP, I'll just focus on the properties used to control which users and groups are synced.

Default settings:

<property name="User Search Base"></property>
<property name="User Object Class">person</property>
<property name="User Search Scope">ONE_LEVEL</property>
<property name="User Search Filter"></property>
<property name="User Identity Attribute"></property>
<property name="User Group Name Attribute"></property>
<property name="User Group Name Attribute - Referenced Group Attribute"></property>
<property name="Group Search Base"></property>
<property name="Group Object Class">group</property>
<property name="Group Search Scope">ONE_LEVEL</property>
<property name="Group Search Filter"></property>
<property name="Group Name Attribute"></property>
<property name="Group Member Attribute"></property>
<property name="Group Member Attribute - Referenced User Attribute"></property> First note that it is NOT necessary to configure both the user and group search properties in order to sync both user and group identities. (NOTE: do not unset "class" or "scope" as NiFi will not start) I would recommend the following setup: <property name="User Search Base"></property>
<property name="User Object Class">person</property>
<property name="User Search Scope">SUBTREE</property>
<property name="User Search Filter"></property>
<property name="User Identity Attribute">sAMAccountName</property>
<property name="User Group Name Attribute"></property>
<property name="User Group Name Attribute - Referenced Group Attribute"></property>
<property name="Group Search Base">OU=Groups,DC=my,DC=network,DC=com</property>
<property name="Group Object Class">group</property>
<property name="Group Search Scope">SUBTREE</property>
<property name="Group Search Filter">(|(sAMAccountName=group1)(sAMAccountName=group2)(sAMAccountName=group3))</property>
<property name="Group Name Attribute">sAMAccountName</property>
<property name="Group Member Attribute">member</property>
<property name="Group Member Attribute - Referenced User Attribute"></property> With above configuration the following will happen: A sync of all user will not happen since the user Search Base is not configured. A group sync will happen which will sync only <group1>, <group2>, and <group3> based on he configured Group Search Filter. For each synced group this provider will return all the <member> lines/attributes. For each of those returned members (typically full user DNs), the user will be looked up to obtain the users sAMAccountName identity string (this happens because "sAMAccountName" is configured in the "User Identity Attribute" property. These returned sAMAccountName user identities will be synced in NiFi to the appropriate <group1> or <group2> or <group3> "sAMAccoutName" group identity. Now you can setup Authorizations for either <group1> or <group2> or <group3> for the various NiFi policies. Creating groups in AD/LDAP for the various teams/roles in NIFi allows you to more granular control accesses in NiFi. Once you have authorized your groups any users that are later added to any one of these groups will automatically gain authorized access when the next sync happens in NiFi Default every 30 mins). Please help our community grow and thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
01-31-2025
09:12 AM
@doora Welcome to the Cloudera Community!

I am not completely clear on the use case described in your query. You have a directory (a local directory on the NiFi host or a remote directory) where files are dropped daily. NiFi ingests these files, then parses the content of each file, then deletes the source file, and finally terminates the NiFi FlowFile. Correct? Are you using ListFile and FetchFile to pull these files into NiFi?

You want to somehow monitor if a file does not appear in the directory. To do so implies some static naming of these daily files? You could accomplish this through a creative dataflow design if you know the names of the files you are expecting each day. Perhaps use a GenerateFlowFile processor to create a 0 byte FlowFile with the filename of the expected file to be fetched within 24 hours and an attribute that captures the current time. Configure this processor to run on a cron once a day. This processor would then connect to a FetchFile processor that attempts to fetch that filename from the configured directory. FetchFile has a "not.found" relationship, which you could connect to a RouteOnAttribute processor configured with two dynamic relationships: one that checks whether the current time minus the time recorded by GenerateFlowFile is less than 24 hours, and another that checks whether it is greater than 24 hours. The less-than-24-hours relationship would get routed back to FetchFile to check again. The greater-than-24-hours relationship could be routed to, for example, a PutEmail processor to send out an email notifying that filename <xyz> was not found in the past 24 hours, at which time you terminate this FlowFile, since GenerateFlowFile via its cron would create a new FlowFile to start looking for this file in the next 24-hour window.

I would recommend adjusting the run schedule on the RouteOnAttribute to run less often (maybe every 10 minutes), because leaving it at 0 secs will have your FlowFile rapidly looping between FetchFile and RouteOnAttribute until it expires (older than 24 hours) or the file is found. This would lead to excessive unnecessary resource usage.

Please help our community grow and thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
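To make the timing check concrete, here is a minimal sketch of the processor properties involved; the attribute name "generated.timestamp" and the file name "daily_report.csv" are illustrative assumptions, not part of the original suggestion:

```
GenerateFlowFile (CRON driven, e.g. once per day)
  filename            = daily_report.csv          # dynamic property; sets the filename attribute on the generated FlowFile
  generated.timestamp = ${now():toNumber()}       # dynamic property; records creation time in epoch milliseconds

RouteOnAttribute (two dynamic properties = two relationships)
  under24h = ${now():toNumber():minus(${generated.timestamp}):lt(86400000)}   # loop back to FetchFile
  over24h  = ${now():toNumber():minus(${generated.timestamp}):ge(86400000)}   # route to PutEmail
```

86400000 is 24 hours expressed in milliseconds.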
01-31-2025
08:27 AM
1 Kudo
@mslnrd While kind of related, to prevent confusing other community members, it would be better to start a new community question for this. This question involves proper authorizers.xml configuration and the new question is specific to user-group-provider configuration. Feel free to ping me in that new question so I get notified when it is created. Thank you, Matt
01-30-2025
06:29 AM
@pbn Welcome to the Cloudera Community.

The Apache NiFi community made the decision to remove nifi-toolkit-admin in NIFI-11316, which included:
- File Manager
- Flow Analyzer
- Node Manager
- Notify

There are no plans to bring these back in NiFi 2.x.

NiFi deployments should protect the NiFi repositories from data loss through the use of RAID. Taking backups of the NiFi repositories (mainly flowfile, content, and provenance) makes little sense, as the contents of these repositories continuously change. Backing up the config files in the NiFi conf directory and maintaining a copy of the flow.json.gz would allow you to recover a node. Keep in mind that the flow.json.gz is the same on all nodes in a NiFi cluster, so it can be restored from any node to a new node.

Please help our community grow and thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
01-29-2025
06:14 AM
@mslnrd Authentication and authorization are two separate configurations. A user must successfully authenticate before any authorization is checked. So from your description, you are getting the NiFi login window and successfully authenticating using your AD sAMAccountName and password. This means that the case-sensitive username you entered at login is being passed on for authorization, which is handled by the configured authorizer in the authorizers.xml.

The authorizers.xml is easiest to read from the bottom up, starting with the authorizer. Looking at what you shared, we see the "managed-authorizer" being used, which has a dependency on the "file-access-policy-provider" (which persists all the configured authorizations in the authorizations.xml file). Now looking at the "file-access-policy-provider", we see it has a dependency on the "file-user-group-provider" for understanding what groups an authenticated user belongs to. If we then look at the "file-user-group-provider", it simply allows you to manually define new user identities and associate them with manually defined group identities, which from your query sounds like what you have been doing thus far.

We can also see that you have added the "ldap-user-group-provider" to the authorizers.xml; however, reading the file as I described above, we can see no path of reference from the authorizer to this ldap-user-group-provider. That means the authorizer is not using any users and groups this provider may be returning.

Fixing this configuration issue has two possible paths:
1. You can reconfigure the "file-access-policy-provider" to use the "ldap-user-group-provider".
2. You can configure the "file-access-policy-provider" to use a "composite-configurable-user-group-provider" (which can be configured to get user and group info from multiple user-group-providers).

Note: You'll need to use the "composite-configurable-user-group-provider" if using the configurable file-user-group-provider as one of the providers. The file-user-group-provider can NOT be configured in the "composite-user-group-provider".

Option 2 allows more flexibility because you can authorize server/client certificates, which are not typically in AD/LDAP, such as authorizing NiFi nodes to talk to one another in a cluster or authorizing one NiFi to connect to another NiFi via NiFi's Site-To-Site capability.

With Option 2, you need to be aware that multiple user group providers can NOT return the same user or group identity string. Since you have already added your users and groups manually via the file-user-group-provider, NiFi will error on startup complaining that multiple providers have returned the same identity. So you will need to rename/remove the existing users.xml file and unset the "Initial User Identity 1" field in the file-user-group-provider only. On startup, NiFi will pull in users and groups via your ldap-user-group-provider configuration, and you will still have the option to manually define additional non-AD/LDAP user and group identities if needed via the NiFi UI.

An example authorizers.xml setup of what is described above is found here in the NiFi Admin Guide: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#composite-file-and-ldap-based-usersgroups

Please help our community grow and thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
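As a rough illustration of option 2 (the Admin Guide link above has the authoritative example), the providers chain together roughly like this; the identifiers and file paths below are common defaults and placeholders, so adjust them to your installation:

```xml
<userGroupProvider>
    <identifier>composite-configurable-user-group-provider</identifier>
    <class>org.apache.nifi.authorization.CompositeConfigurableUserGroupProvider</class>
    <property name="Configurable User Group Provider">file-user-group-provider</property>
    <property name="User Group Provider 1">ldap-user-group-provider</property>
</userGroupProvider>

<accessPolicyProvider>
    <identifier>file-access-policy-provider</identifier>
    <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
    <property name="User Group Provider">composite-configurable-user-group-provider</property>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <property name="Initial Admin Identity">your-admin-identity</property>
</accessPolicyProvider>

<authorizer>
    <identifier>managed-authorizer</identifier>
    <class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
    <property name="Access Policy Provider">file-access-policy-provider</property>
</authorizer>
```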
01-28-2025
06:05 AM
1 Kudo
@Shampy Apache NiFi 2.x does NOT support the older flow.xml.gz format. It can only load the newer flow.json.gz format. Your Apache NiFi 1.27 should be producing both the flow.xml.gz and flow.json.gz flow storage formats. You'll need to use the flow.json.gz format in your NiFi 2.x installation.

The newer flow.json.gz flow storage format was introduced in Apache NiFi 1.16. Apache NiFi 1.16 and newer will generate the flow.json.gz format and still maintain the older flow.xml.gz format. This positions you for upgrading to Apache NiFi 2.x, since you'll have the flow.json.gz needed to load in your 2.x version.

The proper path to Apache NiFi 2.x is to first upgrade to the latest Apache NiFi 1.x release. Before upgrading to an Apache NiFi 2.x version, you should review the release notes between your current version and the version you plan to upgrade to. This allows you to see if you are using any components that have been removed, or if any breaking changes impact your dataflows: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes/#ReleaseNotes-Version2.1.0

NOTE: In Apache NiFi 1.x versions that support both the flow.xml.gz and the newer flow.json.gz format, the flow.xml.gz file will be ignored on startup if a flow.json.gz exists.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
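As a pointer for locating these files: in NiFi 1.16+ both flow storage formats are referenced from nifi.properties. A minimal sketch with the usual default values is below; the property names are from the 1.x defaults and worth double-checking against your own nifi.properties:

```
nifi.flow.configuration.file=./conf/flow.xml.gz
nifi.flow.configuration.json.file=./conf/flow.json.gz
```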
01-27-2025
11:06 AM
@vystar Considering the breaking changes that are part of Apache NiFi 2.0/2.1, there is considerably more work in preparing for an upgrade to that new major release. So I would recommend upgrading to the latest offering in the Apache NiFi 1.x branch.

You'll want to review all the release notes from 1.13 to the latest release: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes/#ReleaseNotes-Version1.12.0

Pay close attention to any mentions of components being moved to optional build profiles. This means that those nars and the components they contain are no longer included with the Apache NiFi download and, if needed, must be downloaded from Maven Central and manually added to NiFi. Deprecated components still exist in the download, but will not exist in NiFi 2.x releases.

Make sure to maintain a copy of your flow.xml.gz/flow.json.gz (newer releases). The newer Apache NiFi 1.x releases load a flow.json.gz instead of the older flow.xml.gz on startup. However, in the absence of a flow.json.gz and the presence of a flow.xml.gz, NiFi 1.x will load from the flow.xml.gz and produce the new flow.json.gz.

After the upgrade, you'll still need to review your dataflows. There are some bad practices that are now blocked by Apache NiFi that may leave some components invalid until manual action is taken to resolve the bad configuration (such as using "primary node" execution on any processor that has an inbound connection).

As far as FlowFile distribution, do it as early in your dataflows as possible. Utilize list/fetch style processors instead of get style processors (example: use ListSFTP and FetchSFTP in place of GetSFTP; this allows you to load-balance the 0 byte listed FlowFiles before the content is fetched for the files). Other options like Remote Process Groups can be used (they come with some overhead, but do perform some load-based distribution across the target NiFi cluster when dealing with large volumes of FlowFiles; not so great for low volumes).

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
01-27-2025
05:30 AM
@ose_gold Looping back on number 6. When you see the active thread indicated in the upper right corner, do you see tasks completing, or does the thread just remain active all the time? Do you see any data being written to your Redis distributed map cache server?

Thanks, Matt
01-27-2025
05:23 AM
@vystar Welcome to the community.

The first observation is your NiFi version, 1.12.1, which was released back in 2020. There have been a lot of bug fixes and improvements made to load-balanced connections since then. I strongly encourage you to upgrade to a much newer version of Apache NiFi.

Once a NiFi connection has load balanced the FlowFiles in the connection, it will not redistribute them again. So if your other two nodes receive their round-robin distribution and have the capacity to process them faster, the connection will not round-robin the remaining FlowFiles left on the one node again. Doing so would be very expensive, as each node would be trying to redistribute already-distributed FlowFiles over and over again.

Maximizing throughput in NiFi often requires looking at all your dataflows, configurations, designs, and memory and CPU usage data. Is the ExecuteStreamCommand processor the only slow point in your dataflow? What is it executing? How is it configured?

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
01-24-2025
11:39 AM
1 Kudo
@ose_gold Welcome to the community. Can you share more details?

- Try using the latest Apache NiFi 2.1.0 release instead of the initial 2.0.0 release.
- Does the processor show as valid?
- From the server running NiFi, can you resolve and connect to the FTP server via the command line?
- Do any of the files on the FTP server have a last modified timestamp newer than 3 hours old?
- Try changing "Entity Tracking Initial Listing Target" from "Tracking Time Window" to "All Available". Does it produce FlowFiles?
- When you start the ListFTP processor, does it indicate an active running thread in the upper right corner?
- What do you see in the nifi-app.log when the processor runs? Any exceptions?
- Try putting this processor class (org.apache.nifi.processors.standard.ListFTP) in DEBUG in the NiFi logback.xml file to get additional logging output.
- With the "Tracking Timestamps" listing strategy instead, does it produce FlowFiles?

Hopefully some of the above checks can help narrow the focus of where the issue exists.

Thank you, Matt
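For the DEBUG logging step, this is a minimal sketch of the logger entry you would add inside the <configuration> element of conf/logback.xml (the class name assumes the processor in question is ListFTP):

```xml
<!-- Enable DEBUG output for the ListFTP processor class -->
<logger name="org.apache.nifi.processors.standard.ListFTP" level="DEBUG"/>
```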