Member since
07-30-2019
3470
Posts
1641
Kudos Received
1018
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 244 | 05-06-2026 09:16 AM |
| | 433 | 05-04-2026 05:20 AM |
| | 314 | 05-01-2026 10:15 AM |
| | 505 | 03-23-2026 05:44 AM |
| | 385 | 02-18-2026 09:59 AM |
06-07-2024
08:01 AM
1 Kudo
@mohammed_najb It is impossible to guarantee that a flow will always run error free. You need to plan and design for handling failure. How are you handling the "failure" relationships on your ExecuteSQL and PutHDFS processors? PutHDFS will either succeed, route the FlowFile to the failure relationship, or roll back the session. NiFi does not automatically remove FlowFiles. It is the responsibility of the dataflow designer to handle failures in order to avoid data loss. For example, do not auto-terminate any component relationship to which a FlowFile may get routed.

I don't know what the "best practice" would be, as that comes with testing. Since you are using the GenerateTableFetch processor, it creates attributes on the output FlowFiles, one of which is "fragment.count". You could potentially use this to track that all records are written to HDFS successfully. Look at the stateful usage options of UpdateAttribute. This would allow you to set up RouteOnAttribute to route the last FlowFile, once the stateful count equals "fragment.count", to a processor that triggers your Spark job.

Just a suggestion, but others in the community may have other flow design options.

Please help our community thrive. If you found that any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to log in and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
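The counting idea above can be sketched in plain Python. This is only a model of the logic, not NiFi code: the class and function names and the "written.count" attribute are made up for illustration, and it assumes a single GenerateTableFetch run is in flight at a time.

```python
class StatefulCounter:
    """Stand-in for a stateful UpdateAttribute: keeps a running count in state."""

    def __init__(self):
        self.count = 0  # would be persisted processor state in the real flow

    def update(self, attributes):
        """Increment the stored count and stamp it on a copy of the attributes."""
        self.count += 1
        stamped = dict(attributes)
        stamped["written.count"] = str(self.count)  # hypothetical attribute name
        return stamped


def is_last_fragment(attributes):
    """Stand-in for RouteOnAttribute: true once the count equals fragment.count."""
    return attributes["written.count"] == attributes["fragment.count"]


# Three FlowFiles from one GenerateTableFetch run, all written to HDFS successfully.
counter = StatefulCounter()
routed = [
    is_last_fragment(counter.update({"fragment.count": "3"}))
    for _ in range(3)
]
print(routed)  # [False, False, True]
```

Only the third FlowFile would be routed on to the processor that triggers the Spark job. A real flow would also need to reset the count between runs and decide what to do when a FlowFile goes to "failure" and the count never reaches "fragment.count".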
06-07-2024
07:39 AM
1 Kudo
@scoutjohn I don't have a Kubernetes environment to experiment with currently, but here are a couple of things I see from your response. Your URLs appear to be missing the /nifi on the end. What value is set for "nifi.web.http.host" in the nifi.properties file on each instance of your K8s cluster? Is "nifi-0.nifi-headless.namespace.svc.cluster.local", which is being used in the S2SProvenanceReportingTask, resolvable on the NiFi host to a valid IP address that is reachable between nodes? Are the ports available and unused on both hosts? Does the configuration in nifi.properties match on both hosts (with the exception of host-specific properties)? Do the PrivateKey certificates used by the hosts contain the proper EKUs and SAN entries needed? Thank you, Matt
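The resolvability and reachability questions above can be checked from each node with a few lines of Python. This is a minimal sketch using only the standard library; the helper names are mine, and the port to test is whatever your own nifi.properties configures.

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses a hostname resolves to, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, running `resolve("nifi-0.nifi-headless.namespace.svc.cluster.local")` on every pod shows whether the name resolves there, and `port_reachable(...)` with the same hostname and your configured port shows whether the other node can actually be reached. A successful TCP connection says nothing about TLS trust, so the certificate questions still need to be checked separately.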
06-06-2024
08:02 AM
@G_B NiFi cluster deployments expect that all nodes in the cluster have the same hardware specifications. There is no option in NiFi's load-balanced connections to customize load balancing based on the current CPU load average of some other node. Doing so would require NiFi nodes to continuously ping all other nodes for their current load average before sending FlowFiles, which would impact performance. The only thing that would result in any variation in distribution is a node's receive rate being diminished, but that is outside of NiFi's control. Round Robin will skip a node in the rotation if that node is unable to receive FlowFiles as fast as another node.

Also keep in mind that a NiFi cluster elects nodes to the roles of "cluster coordinator" and "primary node". Sometimes both roles get assigned to the same node, and the assignment of these roles can change at any time. The primary node is the only node that will schedule "primary node" only processors to execute. So your one node that is lighter on CPU could also end up assigned this role, adding to its CPU load average.

Often CPU load average is impacted not only by volume but also by the content size of the FlowFiles. The LB connections do not take FlowFile content size into account when distributing FlowFiles.

While your best option performance-wise is to make sure all nodes have the same hardware specifications, there are a few less performant options you could try in order to distribute your data differently.
1. Use a Remote Process Group (RPG), which uses Site-to-Site (S2S) to distribute FlowFiles across your NiFi nodes. I always recommend using an RPG to push to a Remote Input Port rather than pull from a Remote Output Port to achieve better load distribution. The issue here is that you need to add RPGs and remote ports everywhere you were previously using LB-configured connections.
2. Build a smart, reusable data distribution dataflow. You could build a dataflow that sorts FlowFiles by content size ranges, merges them into bundles via MergeContent using the "FlowFile Stream, v3" merge format, sends the bundles based on size ranges to your various nodes via InvokeHTTP to ListenHTTP, and then uses UnpackContent once received to extract the FlowFile bundle. The MergeContent step is going to add additional CPU load.
3. Consider using DistributeLoad, which can be configured with weighted distribution. This allows you to create three distribution relationships with, for example, 5 FlowFiles per iteration for relationships 1 and 2 and only 1 per iteration for relationship 3. That lets you send 1 FlowFile to your lower-core node for every 5 sent to each of the other two nodes. You would still need to use UpdateAttribute (to set a custom target node URL), MergeContent, InvokeHTTP, ListenHTTP, and UnpackContent in this flow.

So if addressing your hardware differences is not an option, number 1 is probably your next best choice.

Thank you, Matt
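To make the 5/5/1 weighting in option 3 concrete, here is a small Python model of a weighted round-robin rotation. It only illustrates the resulting ratios; the function name is mine, and the exact order in which DistributeLoad hands out FlowFiles within an iteration may differ.

```python
from collections import Counter
from itertools import cycle, islice


def weighted_rotation(weights):
    """Return an endless iterator of relationship names in weighted round-robin order."""
    order = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(order)


# Relationships "1" and "2" feed the larger nodes, "3" feeds the lower-core node.
weights = {"1": 5, "2": 5, "3": 1}
assignments = list(islice(weighted_rotation(weights), 22))  # 22 FlowFiles = 2 iterations
totals = Counter(assignments)
print(dict(totals))  # {'1': 10, '2': 10, '3': 2}
```

Over two full iterations the lower-core node receives 2 of 22 FlowFiles. Keep in mind that this balances FlowFile counts only, not content size.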
06-06-2024
07:12 AM
@alan18080 NiFi-Registry only pushes to the GitFlowPersistenceProvider while running. NiFi-Registry will only read from Git on startup. The GitFlowPersistenceProvider also only contains the flow definitions for the version-controlled process groups. Each NiFi-Registry has a metadata database that maintains the knowledge of which buckets exist, which versioned items belong to which buckets, and the version history for each item. So if you are trying to share a single Git repo across multiple running NiFi-Registry instances, this explains why you are seeing missing versions at times across those instances. Thank you, Matt
06-04-2024
10:02 AM
1 Kudo
@yuanhao1999 I see that you raised an Apache Jira for this same issue, https://issues.apache.org/jira/browse/NIFI-13340, and that your issue is likely related to https://issues.apache.org/jira/browse/NIFI-13281. When you delete and re-import the Process Group from NiFi-Registry, all your components get new random UUIDs assigned to them. That effectively eliminates the stuck condition. Were changes being made to the Process Group configuration while FlowFile(s) were still queued in a connection within the Process Group? Thank you, Matt
06-04-2024
09:54 AM
@inkerinmaa Out of the box, Apache NiFi is configured to be secure, and most browsers no longer support plain HTTP and force a redirect to HTTPS. NiFi is going to come up secured if you have the HTTPS port property configured in the nifi.properties file, so you would need to unset that property for NiFi to start unsecured. Thanks, Matt
06-04-2024
09:03 AM
@Alexy Do you specifically need to produce so much logging? What loggers have you added to your logback.xml? How many are set to "INFO" level logging? If you only want to log exceptions, you could change "INFO" to "WARN" or "ERROR" to greatly reduce the amount of INFO logging being produced.

As far as NiFi performance goes, it is all about managing CPU load average and disk I/O (specifically the disk I/O of the disks where NiFi's content, flowfile, and provenance repositories are located). You could make sure your logs are being written to a separate disk to keep that disk I/O from impacting the disks that hold NiFi's repositories.

Thank you, Matt
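One quick way to answer the "how many loggers are set to INFO" question is to list them from logback.xml. The sketch below uses only Python's standard XML parser; the function name and the sample configuration are illustrative and are not taken from the original question.

```python
import xml.etree.ElementTree as ET


def loggers_at_level(logback_xml, level="INFO"):
    """Return the sorted names of <logger> elements configured at the given level."""
    root = ET.fromstring(logback_xml)
    return sorted(
        element.get("name")
        for element in root.iter("logger")
        if (element.get("level") or "").upper() == level.upper()
    )


# Illustrative configuration only; point this at your own logback.xml contents.
SAMPLE = """<configuration>
  <logger name="org.apache.nifi" level="INFO"/>
  <logger name="org.apache.nifi.processors" level="WARN"/>
  <logger name="org.apache.zookeeper" level="ERROR"/>
  <root level="INFO"/>
</configuration>"""

info_loggers = loggers_at_level(SAMPLE)
print(info_loggers)  # ['org.apache.nifi']
```

The `<root>` element is not a `<logger>`, so its level needs to be checked separately.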
06-03-2024
11:09 AM
@inkerinmaa An Apache NiFi multi-node clustered setup is much different than a standalone NiFi installation. Your exception is related to a TLS trust issue in the exchange between your nodes. In a NiFi cluster, one of the nodes will be elected to the role of "cluster coordinator" by ZooKeeper (ZK). All of the nodes communicate with ZK in order to learn which node is currently assigned this role and then begin sending heartbeats to that elected node in order to join the cluster.

It looks like you are just allowing your NiFi nodes to auto-generate their own self-signed certificates on each node? That works fine in a standalone NiFi setup; however, you'll need to create keystores and truststores for your NiFi cluster nodes so that proper mutual trust can be established.

I also see that you are using the Single-User login provider and authorizer. For a NiFi cluster you'll also want to use more production-ready providers, like the ldap-provider for login and the StandardManagedAuthorizer for all your authorizations.

Thank you, Matt
05-30-2024
01:27 PM
@scoutjohn I installed an out-of-the-box Apache NiFi 1.26 using the single-user providers and the NiFi-generated self-signed certificates. I was able to send provenance events via the S2SProvenanceReportingTask successfully back to a Remote Input Port on the same NiFi with no issues, so authorization is not an issue here. I tested using both the HTTP and RAW transport protocols successfully. I also validated that S2S was working by setting up a Remote Process Group to send FlowFiles to a Remote Input Port.

Here is the dataflow I set up: You can see in the above that I generated some FlowFiles that were sent over S2S to the "Input1" remote port. You can also see that my "prov" port received provenance events from the S2SProvenanceReportingTask.

My S2S settings from the nifi.properties file:
# Site to Site properties
nifi.remote.input.host=localhost
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10001
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec
nifi.remote.contents.cache.expiration=30 secs

My Remote Process Group configuration: Switching to the "HTTP" transport protocol also worked.

S2SProvenanceReportingTask configuration:

While all of this worked correctly, sending provenance events via the S2SProvenanceReportingTask back to the same NiFi is not advisable. For every FlowFile received on the "prov" port, another provenance "RECEIVE" event is created, which then gets sent by the reporting task. Thus an infinite loop of provenance events is created. You would certainly have difficulty with authentication and authorization when sending to another NiFi instance using the out-of-the-box keystore, truststore, and single-user providers between two out-of-the-box NiFi deployments. But for testing purposes this works.

Now I see from your configuration that you set up:
nifi.remote.input.host=cd8e8c899db6
That makes me wonder whether the given hostname is:
1. A SAN entry in the NiFi-generated keystore certificate. You could use the keytool command to check:
keytool -v -list -keystore keystore.p12
2. Resolvable and reachable by your NiFi instance.

Try changing that property to "localhost" and see if it resolves your issue.

Thank you, Matt
05-30-2024
10:00 AM
@hegdemahendra The small number in the upper right corner of any processor shows the number of active threads at the time the UI was last refreshed. The default auto-refresh interval of the UI is 30 seconds. The number turns red when there is an active terminated thread. So with your example above, 2(1), it is telling you that this processor has 2 active threads and 1 terminated thread.

A terminated thread is the result of manual user intervention. When a processor is asked to change run status from "running" to "stopped" (Stopping Component), it first transitions into a state of "stopping". It does not transition to "stopped" until all active threads complete. NiFi provides an option to "terminate" a processor that remains in the stopping state because of active threads. Terminate (Terminating a component's tasks) does not kill the active thread, since all threads belong to a single JVM. What the terminate function does is release any FlowFile tied to the active thread(s) back to its originating connection and mark the thread as terminated. That terminated thread will continue to execute until it completes or the JVM is restarted. Should that now "terminated" thread complete, all of its output is sent to /dev/null instead of resulting in any downstream movement.

This allows users to handle scenarios where long-running or hung threads are preventing the stopping, reconfiguring, and starting of a processor. When a terminated processor is restarted, it will re-process the FlowFile(s) that were originally tied to the terminated thread(s). This prevents any data loss from occurring. If a terminated thread is in a permanently hung state, the only way to get rid of it completely is a restart of NiFi, which will kill the JVM after a graceful shutdown period.

As far as your custom processor getting stuck, you would need to collect thread dumps and inspect them to see what your thread is waiting on that is blocking it from progressing, and then address that in your custom code.
Thank you, Matt
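The terminate behavior described above can be modeled with a short Python sketch. This is not NiFi code: the class, its fields, and the use of an event to simulate a blocked thread are all assumptions made for illustration. It shows the three points from the explanation: the thread is not killed, the FlowFile goes back to its source queue, and late output is discarded.

```python
import threading


class ProcessorTask:
    """Toy model of one processor thread working on a single FlowFile."""

    def __init__(self, flowfile, source_queue):
        self.flowfile = flowfile
        self.source_queue = source_queue  # the originating connection
        self.downstream = []              # where successful output would go
        self.terminated = False
        self._lock = threading.Lock()
        self._unblock = threading.Event()
        self.thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self.thread.start()

    def _run(self):
        self._unblock.wait()              # simulates a long-running or hung call
        result = self.flowfile.upper()
        with self._lock:
            if not self.terminated:
                self.downstream.append(result)
            # If terminated, the result is simply dropped.

    def terminate(self):
        """Mark the thread terminated and hand its FlowFile back to the queue."""
        with self._lock:
            self.terminated = True
            self.source_queue.append(self.flowfile)

    def unblock(self):
        self._unblock.set()


queue = []
task = ProcessorTask("payload", queue)
task.start()
task.terminate()
still_running = task.thread.is_alive()    # the thread was marked, not killed
task.unblock()
task.thread.join()
print(still_running, queue, task.downstream)  # True ['payload'] []
```

The FlowFile is back in the queue for re-processing once the processor is started again, and the late result from the terminated thread never moves downstream.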