About MattWho

MattWho · ‎07-30-2025

@GKHN_ Welcome to the Cloudera Community. You shared numerous configurations and I see numerous configuration issues. Lets start with Authentication and Authorization basics before diving in to the configuration issues. Authentication and authorization are two separate processes. First you need to successfully authenticate your user. At the end of a successful authentication you will have a user identity string (case sensitive) that NiFi uses to identify your authenticated user and it is the user string that is passed to the NiFi authorizer to determine what policies have been granted to that specific user identity string. You appear to be using the ldap-provider (I assume your nifi.properties has been properly configured to use it and you are being presented with the NiFi login screen). I see you have it configured to take your sAMAccountName value as your username at the login window; however; I also see that you have it configured to use the full DN (USE_DN) for your user that is returned by yoru ldap as your user identity string upon successful authentication. I don't think that is what you want here, so I recommend changing from "USE_DN" to "USE_USERNAME" which will pass your username entered in the login window to the authorizer upon successful authentication. Now when we look at the authorizer you shared, you'll want to read it from the bottom up stating with the authorizer (managed-authorizer in your configuration). Within the "managed-authorizer" authorizer, you are configured to used the "file-access-policy-provider", so you should scroll up until you find the "file-access-policy-provider". <accessPolicyProvider> <identifier>file-access-policy-provider</identifier> <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class> <property name="User Group Provider">file-user-group-provider</property> <property name="Authorizations File">./conf/authorizations.xml</property> <property name="Initial Admin Identity">CN=NAME SURNAME,OU=CompanyUsers,OU=Company,DC=company,DC=entp</property> <property name="Node Identity 1"></property> <property name="Node Group"></property> </accessPolicyProvider> I see you have configured what i assume is your user's full DN (hopefully matching case sensitive with what you see in the upper right corner of the NiFi UI and as seen in the nifi-user.log). This provider will generate the authorizations.xml file ONLY if it does not already exist. So if you make any changes to this provider those changes will not be made to an existing authorizations.xml file. So you'll need to remove this file until you have your setup working for your initial admin to gain access to NiFi. This provider's job is to seed initial policies for your admin user and the nifi nodes in NiFi cluster setup. But in order to seed those policies NiFi needs to know about the configured user identity (DN you set currently). To do that the file-access-policy-provider is configured with a "user-group-provider" which we can see you have set to the "file-access-policy-provider". <userGroupProvider> <identifier>file-user-group-provider</identifier> <class>org.apache.nifi.authorization.FileUserGroupProvider</class> <property name="Users File">./conf/users.xml</property> </userGroupProvider> We can see in this provider, you have not configured any initial user identities, So NiFi is not going to be able to find the user identity based on the DN set in the file-access-policy-provider in order to seed those initial admin needed policies. The file-user-group-provider will ONLY generate a users.xml file if one does not already exist. So modification to this provider will not make changes to an existing users.xml file. I also see that you have added and configured the "ldap-user-group-provider" in your authorizers, but as you can see from above their is no configured dependency from the authorizer to this user group provider, so it is not being used even though it is configured in the authorizers.xml file. In order for it to be used it must be called by another provider. In your case this would mean adding maybe the "composite-configurable-user-group-provider". This provider allows you reference multiple provider (1 configurable provider like the file-user-group-provider and 1 or more non configurable providers like the ldap-user-group-provider". ( A configurable provider is one that allows you to manually define additional user or group identities directly from within the NiFi UI). Even though your "ldap-user-group-provider" is not being used by yoru authorizer currently, it has several configuration issues. <userGroupProvider> <identifier>ldap-user-group-provider</identifier> <class>org.apache.nifi.ldap.tenants.LdapUserGroupProvider</class> <property name="Authentication Strategy">SIMPLE</property> <property name="Manager DN">LDAP_USER</property> <property name="Manager Password">Password1</property> <property name="TLS - Keystore">/home/nifi/nifi/nifi-2.4.0/conf/srt.pfx</property> <property name="TLS - Keystore Password">Password</property> <property name="TLS - Keystore Type">JKS</property> <property name="TLS - Truststore">/home/nifi/nifi/nifi-2.4.0/conf/gbkeystore.jks</property> <property name="TLS - Truststore Password">Password</property> <property name="TLS - Truststore Type">JKS</property> <property name="TLS - Client Auth"></property> <property name="TLS - Protocol">TLSv1.2</property> <property name="TLS - Shutdown Gracefully"></property> <property name="Referral Strategy">FOLLOW</property> <property name="Connect Timeout">10 secs</property> <property name="Read Timeout">10 secs</property> <property name="Url">ldap://ldap.entp:389</property> <property name="Page Size"></property> <property name="Sync Interval">30 mins</property> <property name="Group Membership - Enforce Case Sensitivity">false</property> <property name="User Search Base">OU=CompanyUsers,OU=Company,DC=company,DC=entp</property> <property name="User Object Class">person</property> <property name="User Search Scope">ONE_LEVEL</property> <property name="User Search Filter">(sAMAccountName={0})</property> <property name="User Identity Attribute"></property> <property name="User Group Name Attribute"></property> <property name="User Group Name Attribute - Referenced Group Attribute"></property> <property name="Identity Strategy">USE_USERNAME</property> <property name="Group Search Base"></property> <property name="Group Object Class">group</property> <property name="Group Search Scope">ONE_LEVEL</property> <property name="Group Search Filter"></property> <property name="Group Name Attribute">cn</property> <property name="Group Member Attribute">member</property> <property name="Group Member Attribute - Referenced User Attribute"></property> </userGroupProvider> Lets start with the face that the following is not even a property in this provider (It only exists in the ldap-provider found in the login-identity-providers.xml configuration file: <property name="Identity Strategy">USE_USERNAME</property> While the following property exist in both the ldap-user-group-provider aand teh ldap-provider, its configuration in the ldap-user-group-provider is incorrect: <property name="User Search Filter">(sAMAccountName={0})</property> The "{0}" when used in the ldap-provider with the login-identity-providers.xml will substitute in the username provided at login. The ldap-user-group-provider is syncing users without any external input so this would be treated as a literal a result in no ldap returns. Typically you would use filters here just like you would with ldapsearch to limited the number of user returned (for example filter only user that are members of specific ldap groups). I also see you have group search partial configured, but have no Group Search Base configured. You also have no "user identity Attribute" configured which tells NiFi which ldap field contains the user identity NiFi will then use. This might be where you put "sAMAccountName". I recommend going back to the NiFi admin guide and looking at the example configuration found below the StandardManagedAuthorizer section. The fact that you stated you added user and set policies via the NiFi UI, tells me at some point in time you had a different configuration then shared above that resulted in your initial admin gaining access. Always remember that NiFi is case sensitive and the users identity (whether it is the username entered in login window or the user's full DN) must match exactly with the user identity you are authorizing against the various policies. Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

MattWho · ‎07-30-2025

@justloseit NiFi Process groups are just logical containers for processors. A Process group does not run/execute. Selecting "Start" on a process group triggers starting of all the components within that process group. In your case it sounds like you have have setup cron scheduling on your ingest/starting processor(s) within the process group. All downstream processors to that source should be set run all the time and not cron based scheduling. So what you are really looking for is how long it took the processors within that process group to process all produced FlowFiles to point of termination? Besides looking at the lineage data for each FlowFile that traverses all the processor in a process group, I can't think of how else you would get that data. Take a look at the SiteToSiteProvenanceReportingTask available in Apache NiFi. It allows you send the provenance data (produces a lot of data depending on size of yoru dataflows and amount of FlowFiles being processed) via NiFi's Site-To-Site protocol to another NiFi instance (would recommend a separate dedicated NiFi to receive this data). You can then build a dataflow to process that data how you want to retain what information you need, or send it to an external storage/processing system. Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

MattWho · ‎07-29-2025

@asand3r A node must be "disconnected" before it can be offloaded. Only a running node can be offloaded. So as long as a node is running. it's UI will be accessible. A couple options: https://<nifi-node>:<nifi-port>/nifi-api/controller/cluster. <-- this end point when hit will return the following type responses: If node is connected, it will return json that includes the connection status of all nodes that are part of cluster. So this response could be parsed to see if any nodes are disconnected. So you could parse status of all nodes from a single connected node's response If node is not connected, it will return "Only a node connected to a cluster can process the request" which tells you that node is disconnected. https://<nifinode>:<nifi-port>/nifi-api/flow/cluster/summary If node is connected, it will return a response like this: {"clusterSummary":{"connectedNodes":"3 / 3","connectedNodeCount":3,"totalNodeCount":3,"connectedToCluster":true,"clustered":true}} If node is disconnected, it will return a response like this: {"clusterSummary":{"connectedNodeCount":0,"totalNodeCount":0,"connectedToCluster":false,"clustered":true}} You'll need to parse the responses for connectedToCluster. If true use that node's UI, if false, exclude that node's UI. Of course if node is completely down, above will not help unless you mark as unavailable if no response. Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

MattWho · ‎07-22-2025

@asand3r I have seen many times where user connect "Failure" relationship via connection to another component that is not running allowing failures to queue up in addressed. These "forever" queued FlowFiles can keep these content claims from being moved to archive and thus result in high content_repository usage. Another common observation is building test flows that have queued FlowFiles in them. A The many FlowFiles that can be part of one content claim file can come form anywhere in any dataflow on the canvas. If you found any of my responses helped you please take a moment to click "Accept as Solution" on them to help our community members. Thanks, Matt

MattWho · ‎07-22-2025

@Siva227 The error you shared is when the node is trying to reconnect to the cluster following a disconnection. So first we need to identify why the node disconnected originally. I suspect you are disconnecting due to lack of heartbeat or your node failed to process a change request from the cluster coordinator node. Cluster Size: 6 nodes Each Node: 32 vCPUs, 256 GB RAM JVM Heap Memory: 192 GB (configured per node) Max Timer Driven Thread Count: 192 Processor Count: Over 10,000 processors across the flows Any specific reason why you configured your NiFi to use so much heap memory. Large heaps like this result in long stop-the-world Garbage Collections (GC). These long garbage collection stop-the -world events can lead to disconnections as a result of lack of heartbeat from that node. A common mistake is setting heap very large simply becuase you have a lot of memory on the node. You want to use the smallest heap possible as needed by yoru dataflows. GC does not kick in until heap usage reaches ~80%. The below property controls heartbeat interval and lack of heartbeat disconnection: nifi.cluster.protocol.heartbeat.interval=5 sec The cluster coordinator will disconnect a node due lack of heartbeat if a heartbeat has not been received for 8 times this configured value (40 seconds in this case). It is very possible you encounter GC that last longer then this. I recommend changing your heartbeat interval to 30 sec which will allow up 4 mins of missed heartbeats before the cluster coordinator will disconnect a node. The following error shared, while not initial cause of node disconnection, is preventing node from reconnecting: Node disconnected due to Proposed Authorizer is not inheritable by the Flow Controller because NiFi has already started the dataflow and Authorizer has differences: Proposed Authorizations do not match current Authorizations: Proposed fingerprint is not inheritable because the current access policies is not empty. This implies that there are differences in the authorizations.xml file on this node versus what the cluster has in its authorizations.xml. You also state this is the error seen ver often after a node disconnection? Are you often modifying or setting up new authorization access policy when you have a node disconnect? I'd start with identifying the initial cause of node disconnection which I suspect is either lack of heartbeat or failed to replicate request to node resulting in node being disconnected. Both of which can happen with long GC pauses. Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

MattWho · ‎07-22-2025

@asand3r It is important to understand what offloading does and does not do. Offloading is the moving of queued NiFi FlowFiles from the offloading node to other nodes in the NiFi cluster. A NiFi FlowFile consists of two parts: FlowFile attributes/metadata (stored in the flowfile_repository) and FlowFile content (stored with claims inside the content_repository). A content claim can contain the content for 1 too many individual FlowFiles. Offloading does not move content claims (what you see in the content_repository) from one node to the other. It moves FlowFiles, which means it moves the FlowFile attributes and the content for that FlowFile over into FlowFiles on the node it is being transferred to. I suspect that the 21GB of content equated to the amount of content currently queued FlowFiles on node 2. The cumulative size of the active FlowFiles queued on a NiFi node does not equal the cumulative size of active content claims in the content_repository. Every content claim has a claimant count. This claimant count keeps track of how many FlowFiles have a claim against some bytes of data in the content claim. As a FlowFile reaches a point of termination in a NiFi dataflow, the claimant count on a content claim is decremented. Only once the claimant count reaches 0 on a content claim does that content claim become eligible to be moved to an archive sub-directory and considered for removal under the archive configurations set in the nifi.properties file. So it is possible for a 1 byte FlowFile still queued on the NiFi canvas to prevent a much larger content_claim from being archived. So I do not suspect you lost any data during offload. 1. Are you working with many very small FlowFiles? 2. Are you leaving FlowFiles queued in connections that never get processed through to termination? You are also correct that your other nodes would have begun processing the transferred FlowFiles as they were being placed in the queues on those nodes. The large difference in FlowFile count of disk usage between nodes is often the direct result of the dataflow design choices. I suspect the node 2 was your elected primary node at the time prior to disconnect and offload operations. The elected primary node is the only node that will schedule "primary node" only configured processors. So unless you utilize load balanced configurations on the connection following the "primary node" only scheduled ingest processor, all the FlowFiles will be processed only on that primary node in the downstream processors in the dataflow. Also keep in mind that you are using a fairly old version of Apache NiFi (~4 years old) The latest and final release of Apache NiFi 1.x line is 1.28. So I recommend upgrading to this latest 1.28 version and start planning for an upgrade to the NiFi 2.x release versions. Apache NiFI has also release the Apache NiFi 2.x new major release version. There is a fair amount of work that must be done before you can migrate from Apache NiFi 1.28 to Apache NiFi 2.x version. Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

MattWho · ‎07-22-2025

@jfs912 NiFi 2.x is secured in the same way as NiFi 1.x was secured. The Apache NIFi community made a decision to move away from providing a toolkit for generating TLS certificates since it did not make sense to have its own tool to manage certificate creation when there is nothing special the toolkit does (NIFI-12200) . You can create your own self signed certificates (simple web search will show many resources on how to do this), you can use a certificate service to create yoru certificates and sign them for you, you can setup your own certificate authority for signing your certificates, etc. It really made no sense for Apache NiFi to maintain this code. The important parts to remember when creating your NiFi certificates are: Keystore for NiFi: Must contain 1 and only 1 PrivateKey entry PrivateKey entry DN must NOT contain wildcards. The DN is used as the client identity and use of wildcards in client identities is not a proper security practice. PrivateKey entry must have ExtendedKeyUsage with both ClientAuth and ServerAuth. NiFi uses this certificate for both client and server authorizations PrivateKey entry must include at least one SAN entry that matches the hostname of the NiFi server on which the certificate will be used. During a TLS exchange hostname verification is performed to make sure the hostname accessed via the URL used matches a SAN entry in the Server certificate the target server responds with. It is not uncommon for a certificate to contain more then one SAN entry. Recommend that PrivateKey password and Keystore password are the same. Truststore for NiFi: The NiFi truststore contains one too many trusted cert entries. Some choose to use the Java default cacerts (truststore) file and just add the NiFi additional trusted certs entries to it. This same truststore is then used on every node in a NiFi cluster. Must contain the complete trusts chain for the NiFi certificates. A NiFi certificate may be self signed meaning the issuer and signer are the same DistinquishedName (DN). I this case the public cert for each of yoru NiFi node's certificates needs to be added to the truststore. A NiFi certificate might be signed by an Intermediate Certificate Authority (CA). An intermediate CA would be the signer DN for the NIFi certificate. An Intermediate CA will have a different Issuer and Signer DN. The truststore must contain the public cert for this Intermediate CA. There may be multiple levels of intermediate CAs before reaching the signer that is the root CA. The root CA public certificate can be identified because it will have the same DN for issuer and signer. The truststore must contain the public certificate for the root CA. Having all the public certificates for every signer for the NiFi certificate to the Root CA makes up the complete trust chain required for trust of the signed NiFi certificate. Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

MattWho · ‎07-15-2025

Preferred to have new question for unique questions, but.... Keep in mind these things when it comes to user identities and authorizations: 1. user identities are case sensitive in NiFi. So the user identity from oidc must match exactly the initial admin user identity you passed to the file-user-group-provider and the file-access-policy-provider. 2. The file-user-group-provider will only generate the users.xml file if it does NOT already exist. So if you edit the configuration of the file-user-group-provide after the users.xml already exists, those change will not be made to that existing users.xml file. (remove it and restart so it gets created again) 3. The file-access-policy-provider will only generate the authorizations.xml file if it does not exist. It also will not modify an already existing authorizations.xml file if you make modifications to the provider. 4. Additional user identities and authorization post accessing your NiFi with your initial admin are done directly with the NiFi UI. Those additions made from the UI will be persisted in the users.xml and authorizations.xml files If above does not help, please start a new community question with the details of your issue. Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

MattWho · ‎07-15-2025

@HeniNechi Try configuring an absolute path for the location of both the users.xml (generated by file-user-group-provider) and authorizations.xml (generated by the file-access-policy-provider) files instead of using "../data/". Make sure that the absolute path is accessible by yoru NiFi service user and that That service provide has permission to allow it to create the users.xml and authorizations.xml files in that directory path. NiFi will fail to start if the authorizers.xml fails to successfully execute which is what you are encountering. I suspect a permission issue or similar that is preventing NiFi from being able to create the necessary files in the defined directory path. Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

MattWho · ‎07-14-2025

@HeniNechi I am not following the relationship between what you have added and the original question in this thread. Authentication and Authorization are two separate processes. Only after successful authentication is the resulting user/client identity passed to the NiFi authorizer that will authorize that user/client identity based on the defined NiFi authorization policies. You are correct that the default authorizers.xml setup uses the managed authorizer that maps user/client identities from the users.xml file with authorization stored in the authorizations.xml file. The file-user-group-provider should be generating the users.xml file if it does not already exist during NiFi startup. If you are having an issue, I recommend starting a new community question with the details of your issue. Thank you, Matt Matt

Online	Offline
Last Visited	‎11-05-2025 03:19 AM

Member Since	‎07-30-2019 10:41 AM
Last Visited	‎11-05-2025 03:19 AM
Posts	3,385
Kudos received	1612

Cloudera Community

Re: using nifi as a kafka streaming- real-time str...

Re: using nifi as a kafka streaming- real-time str...

Re: Nifi Registry and LDAP

Re: NiFi logs not rolling over on Windows

Re: Nifi Registry and LDAP

Re: NiFi Authentication

Re: Can we capture the run details of processor an...

Re: NiFi UI load balance with HAProxy to CONNECTED...

Re: NiFi node offloading and content storage

Re: Frequent Node Disconnects and Flow Synchroniza...

Re: NiFi node offloading and content storage

Re: Securing a NiFi 2.0 cluster

Re: Getting [java.io.FileNotFoundException: ../da...

Re: Getting [java.io.FileNotFoundException: ../da...

Re: [Secure NIFI] Provide custom-users.xml and cus...