Member since: 07-30-2019
Posts: 3406
Kudos Received: 1623
Solutions: 1008
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 312 | 12-17-2025 05:55 AM |
| | 373 | 12-15-2025 01:29 PM |
| | 355 | 12-15-2025 06:50 AM |
| | 341 | 12-05-2025 08:25 AM |
| | 594 | 12-03-2025 10:21 AM |
06-29-2022
06:20 AM
@Tryfan You mention this file comes in daily, and that it arrives through a load-balancer, so you don't know which node will receive it. This means you can't configure your source processor for "primary node" only execution as you have done with the ListFile in your shared sample flow. With "primary node" only execution, the elected primary node is the only node that executes that processor, so if the source file lands on any other node, it would never get listed. You could handle this flow in the following manner:

GetFile ---> (7 success relationships) ---> PostHTTP or InvokeHTTP (7 of these, one configured for each node in your cluster)
ListenHTTP --> UpdateAttribute --> PutFile

In this flow, no matter which node receives your source file, the GetFile will consume it. The FlowFile then gets cloned 6 times (so 7 copies exist), with one copy routed to each of the 7 unique PostHTTP processors. Each of those targets the ListenHTTP processor listening on one node in your cluster, so every node receives a copy of the original source file. Then use the UpdateAttribute to set your username and location info before the PutFile, which places each copy in the desired location on its node (a rough sketch of the fan-out appears at the end of this post). If you add or remove nodes from your cluster, you would need to modify this flow accordingly, which is a major downside to such a design. Thus the best solution is still one where the source file is placed somewhere all nodes can retrieve it from, so it scales automatically. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
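P.S. For reference, the fan-out step looks roughly like this outside of NiFi. This is a minimal Python sketch only: the node hostnames, port, and CA file are placeholders, and the URL path assumes ListenHTTP's default "Base Path" of contentListener.

```python
# Sketch: replicate one received file to a ListenHTTP endpoint on every node.
# All hostnames, the port, and the CA bundle below are hypothetical.
import requests  # third-party: pip install requests

NODES = [f"nifi-node{i}.example.com" for i in range(1, 8)]  # 7-node cluster
LISTEN_URL = "https://{node}:9999/contentListener"          # ListenHTTP default base path

def replicate(file_path: str) -> None:
    with open(file_path, "rb") as f:
        payload = f.read()
    for node in NODES:
        # One POST per node, mirroring the one-PostHTTP-per-node flow design.
        resp = requests.post(LISTEN_URL.format(node=node), data=payload, verify="ca.pem")
        resp.raise_for_status()

replicate("/data/incoming/daily.csv")
```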
06-28-2022
11:12 AM
@VinceSailor Check your nifi.properties file for an identity mapping pattern containing a Java regex that matches on your DNs. If one does match, the corresponding mapping value is returned and passed to the authorizer. So it is possible your authorizer is only getting:

nif1-adm.mydomain.com

instead of:

CN=nif1-adm.mydomain.com, OU=NIFI

thus resulting in your untrusted proxy exception. The untrusted proxy error should include the exact identity string the authorizer was passed (a sketch of how this mapping behaves appears at the end of this post). If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
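P.S. As a quick illustration, here is what such a mapping does, sketched in Python (the two property names are real nifi.properties keys, but this particular pattern/value pair is just a hypothetical example that would match the DN above):

```python
# Python sketch of the Java regex mapping applied to a DN, e.g. configured as:
#   nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?)$
#   nifi.security.identity.mapping.value.dn=$1
import re

dn = "CN=nif1-adm.mydomain.com, OU=NIFI"
match = re.match(r"^CN=(.*?), OU=(.*?)$", dn)
if match:
    # Java's $1 is Python's group(1): only the CN portion reaches the authorizer.
    print(match.group(1))  # -> nif1-adm.mydomain.com
```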
06-21-2022
01:39 PM
1 Kudo
@Drozu That was the exact exception I wanted you to find. NiFi writes the content for each of your FlowFiles to the content repository in content claims. A single content claim can contain the content for 1 to many FlowFiles. Once a content claim has zero FlowFiles referencing it anymore, it gets moved to an "archive" sub-directory. A NiFi background thread then deletes those claims based on the archive retention settings in the nifi.properties file:

https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#content-repository

To help prevent old archived claims from filling the disk to 100%, NiFi will stop all processors from being able to write to the content repository. So when you see this log message, NiFi is applying backpressure and blocking processors that write new content (like MergeContent) until archive cleanup frees space or no archived claims exist anymore. As far as stopping and starting the processor goes, I think that action simply gave archive cleanup enough time to get below the threshold; new content written by this processor, along with others, then pushed usage above the backpressure limit again. The property below controls this upper limit:

nifi.content.repository.archive.backpressure.percentage

If not set, it defaults to 2% higher than what is set in this property:

nifi.content.repository.archive.max.usage.percentage

The latter property defaults to "50%", which you stated is your current disk usage. So I think you are sitting right at that threshold and keep blocking and unblocking (see the sketch at the end of this post). Things you should do:

1. Make sure you do not leave FlowFiles lingering in your dataflow queues. Since a content claim contains the content for 1 to many FlowFiles, all it takes is one small queued FlowFile to hold up the archival and removal of a much larger claim. <-- most important thing to do here.
2. Change the setting for "nifi.content.repository.archive.backpressure.percentage" to something other than the default (for example: 75%). But keep in mind that if you are not properly handling the FlowFiles in your dataflow and are allowing them to accumulate in dead-end flows or connections to stopped processors, you are just pushing your issue down the road. You will hit 75% and potentially have the same issue.
3. If you don't care about archiving of content, turn it off: nifi.content.repository.archive.enabled=false

There were also some bugs in 1.13 related to keeping a proper archive claimant count that were addressed in later releases:

https://issues.apache.org/jira/browse/NIFI-9993
https://issues.apache.org/jira/browse/NIFI-10023

So I recommend upgrading to 1.16.2 or newer as well. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
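P.S. To see where you sit relative to those two thresholds, something like this Python sketch works (the repository path is a hypothetical example, and the percentages assume the defaults discussed above):

```python
# Sketch: compare content repository disk usage to the archive thresholds.
import shutil

CONTENT_REPO = "/opt/nifi/content_repository"  # hypothetical location; check nifi.properties
max_usage_pct = 50.0                    # archive.max.usage.percentage default
backpressure_pct = max_usage_pct + 2.0  # backpressure default: 2% above max usage

usage = shutil.disk_usage(CONTENT_REPO)
used_pct = 100.0 * usage.used / usage.total

if used_pct >= backpressure_pct:
    print(f"{used_pct:.1f}% used: content-writing processors would be blocked")
elif used_pct >= max_usage_pct:
    print(f"{used_pct:.1f}% used: archive cleanup is working to free space")
else:
    print(f"{used_pct:.1f}% used: below both thresholds")
```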
06-21-2022
08:37 AM
@arshahin NiFi is a pure Java application. What version of Java is your NiFi using? (NiFi does not include a Java JDK, so one must be installed on the host. NiFi will also not start if Java is not installed, so your host has Java installed somewhere.) NiFi supports Java JDK 8, Java JDK 11 (in newer releases), and Java JDK 17 (in the latest 1.16 releases). So perhaps your issue is related to the Java version you are using in conjunction with the NiFi version (a quick check is sketched at the end of this post)? Looking closely at your stack trace, you can see the snappy library issue is related to the PutHiveStreaming processor:

at org.apache.nifi.processors.hive.PutHiveStreaming

These known issues are relevant:

https://issues.apache.org/jira/browse/NIFI-9282
https://issues.apache.org/jira/browse/NIFI-9527

Try upgrading to the latest Apache NiFi 1.16 release. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
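P.S. A quick way to confirm which Java the host resolves by default (a sketch only; NiFi may be launched with a different JVM than the one on your PATH, so treat this as a starting point):

```python
# Sketch: print the default `java -version` banner and whether its major
# version is one of those listed above. `java -version` writes to stderr.
import re
import subprocess

result = subprocess.run(["java", "-version"], capture_output=True, text=True)
banner = result.stderr.splitlines()[0]
match = re.search(r'version "([^"]+)"', banner)
major = match.group(1).split(".")[0] if match else "?"
major = "8" if major == "1" else major  # old style "1.8.0_x" -> 8
print(banner)
print("listed as supported above:", major in {"8", "11", "17"})
```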
06-21-2022
08:23 AM
@Drozu
- How does the disk usage for your NiFi repositories look?
- Are there any ERROR or WARN log entries related to writing content?
- What do you have set for "Max Timer Driven Thread Count" in the controller settings (and how many cores does the host where your NiFi is running have)?
- Is this a standalone NiFi or a multi-node NiFi cluster (and what is the distribution of the FlowFiles on the inbound connections)?
- How has the MergeContent processor been configured?

Thanks, Matt
06-21-2022
08:04 AM
@ThongPham It sounds like you may be making a lot of unnecessary rest-api calls that could impact your NiFi's overall performance. Have you looked at using the SiteToSiteBulletinReportingTask? This reporting task sends a FlowFile to a remote input port whenever its execution finds that bulletins have been produced. That remote input port could then feed a dataflow that sends notifications via PutEmail. So instead of constantly calling the rest-api to see if something happened in the last 5 minutes, the flow simply sends something out only when it happens. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
06-17-2022
05:56 AM
1 Kudo
@ThongPham There is no such thing as a permanent bearer token. How long a bearer token stays valid is set in the provider that issues it; in your case, the ldap-provider. Also keep in mind that a bearer token is issued by a specific node in the NiFi cluster and cannot be used to authenticate with every node in the cluster, since a secured NiFi will always attempt mutual TLS authentication first. I suggest you instead generate and use a client certificate to interact with the NiFi API. Mutual TLS based authentication does not use bearer tokens, and authentication will keep succeeding until that client certificate expires, which is configurable when generating the certificate; generally speaking, certificates are often valid for 12 or more months. Since there is no bearer token, a client certificate can be used with any node in the cluster (a rough example appears at the end of this post). Your other option is to build a flow within your NiFi that gets a new bearer token automatically and stores that token in, for example, a DistributedMapCache. Then in your other flow you fetch that bearer token before calling the rest-api endpoint. A failure should loop back to the FetchDistributedMapCache just in case you hit a scenario where the bearer token expires between the fetch and the call. Out of curiosity, what rest-api endpoint are you calling every 20 seconds and why? If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
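P.S. Invoking the rest-api with a client certificate from Python looks roughly like this (a sketch only: the host, file paths, and endpoint are illustrative, and the certificate/key must be PEM-encoded):

```python
# Sketch: authenticate to the NiFi REST API via mutual TLS, no bearer token.
import requests  # third-party: pip install requests

resp = requests.get(
    "https://nifi-node1.example.com:8443/nifi-api/flow/status",
    cert=("client-cert.pem", "client-key.pem"),  # client certificate + unencrypted key
    verify="nifi-ca.pem",                        # CA chain that signed the server certs
)
resp.raise_for_status()
print(resp.json())
```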
06-16-2022
12:53 PM
@yagoaparecidoti Authentication is only one piece of being able to access NiFi's UI. While the file-user-group-provider and file-access-policy-provider facilitate the automatic creation of the initial admin user identity and the setting of the policies that admin needs, it is the responsibility of that admin to add additional users to NiFi and add those users to authorization policies. The initial admin user would accomplish this directly from within the NiFi UI:

- Global menu in upper right corner --> Users (add the additional user identities for which you want to set up authorizations)
- Global menu in upper right corner --> Policies (add the users you have created to select NiFi controller policies)
- From the canvas --> Operate panel on left side --> key icon (add policies for specific components added to the canvas)

https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#config-users-access-policies

Thank you, Matt
06-15-2022
11:39 AM
2 Kudos
@Techie123 When you say "Cooperates Linux Machine", are you referring to a http://www.colinux.org/ installation? If so, this does not sound like an ideal place to try to run a NiFi instance or NiFi cluster, as it appears to isolate that Linux to a single core on the host hardware. NiFi can be a resource-intensive application, especially when set up as a cluster.

When NiFi is started, you can tail the nifi-app.log to observe the complete startup process. NiFi has completed its startup once you see the following lines:

2022-06-15 13:52:52,727 INFO org.apache.nifi.web.server.JettyServer: NiFi has started. The UI is available at the following URLs:
2022-06-15 13:52:52,727 INFO org.apache.nifi.web.server.JettyServer: https://nifi-node1:8443/nifi
2022-06-15 13:52:52,730 INFO org.apache.nifi.BootstrapListener: Successfully initiated communication with Bootstrap

At this time the NiFi URL should be reachable. If it is not, I would make sure that a firewall or some issue with your "Cooperates Linux Machine" setup is not blocking access to the port on which NiFi is bound.

The latter error you shared:

o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'CONNECTION_REQUEST' protocol message due to: javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: Path does not chain with any of the trust anchors

implies that your NiFi cluster has been set up to run secured over HTTPS. Secured NiFi nodes must have a configured keystore and truststore. The above error is telling you that the node's truststore does not contain the TrustedCertEntry(s) needed to trust the client certificate presented by another node (in this case the elected cluster coordinator node).

The NiFi configured keystore must meet these requirements:
1. It contains only ONE PrivateKeyEntry (this private key is typically unique per node).
2. That private key must support both the ClientAuth and ServerAuth EKUs.
3. That private key must contain a SAN entry for the NiFi host on which it is being used, and that SAN must match the hostname of the host.

The NiFi configured truststore must contain the complete trust chain for the nodes' private keys. You can use the keytool command to inspect the contents of either the keystore or truststore and verify the above requirements are satisfied:

keytool -v -list -keystore <keystore or truststore>

A certificate (private "PrivateKeyEntry" or public "TrustedCertEntry") will have an owner and an issuer. The issuer is the authority for that key. A self-signed certificate will have the same owner and issuer. The truststore needs a TrustedCertEntry for each public certificate in the trust chain. For example, assume you have a PrivateKey in your keystore with:

owner: cn=nifi-node1, ou=nifi
issuer: cn=intermediate-ca, ou=trust

Your truststore would then need to have a TrustedCertEntry for the public key for:

owner: cn=intermediate-ca, ou=trust
issuer: cn=root-ca, ou=trust

and:

owner: cn=root-ca, ou=trust
issuer: cn=root-ca, ou=trust

You know you have the complete trust chain once you reach the public cert where the owner and issuer have the same value (a way to test this handshake outside of NiFi is sketched at the end of this post).

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
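P.S. If you want to test the handshake outside of NiFi, something like this Python sketch reproduces the same PKIX path check (the hostname and file paths are placeholders, and the certificates/keys must first be exported from the JKS/PKCS12 stores to PEM):

```python
# Sketch: perform a mutual-TLS handshake against another node using the same
# trust material NiFi would use. An incomplete trust chain fails here with
# ssl.SSLCertVerificationError, analogous to NiFi's PKIX path validation error.
import socket
import ssl

context = ssl.create_default_context(cafile="truststore.pem")  # trust chain under test
context.load_cert_chain(certfile="node1-cert.pem", keyfile="node1-key.pem")

with socket.create_connection(("nifi-node2.example.com", 8443)) as sock:
    with context.wrap_socket(sock, server_hostname="nifi-node2.example.com") as tls:
        print("handshake OK:", tls.version())
        print("peer subject:", tls.getpeercert()["subject"])
```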
06-15-2022
06:27 AM
@Vapper Maybe I am not 100% clear on your question here. When you say new "flow file", I assume you are talking about the flow.xml.gz file? In NiFi, a "FlowFile" is what you see queued on connections between processor components; it consists of FlowFile content and FlowFile metadata/attributes. In a NiFi cluster, all nodes must be running the exact same flow.xml.gz. If the dataflow loaded from a node's flow.xml.gz does not match, that node will not be allowed to join the cluster. In the latest releases of NiFi, nodes joining a cluster will inherit the cluster dataflow if their local dataflow does not match. Once a cluster is established, a cluster dataflow is elected, and that becomes the cluster's dataflow. Joining a node to that established cluster requires that the joining node is using the same flow.xml.gz as the existing cluster; if not, the joining node inherits the cluster-elected dataflow over its existing local flow. NiFi will not assume that a joining node's flow.xml.gz is "newer" or "desired" over the elected cluster dataflow and replace the elected cluster dataflow with that new dataflow. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt