Member since: 07-30-2019
Posts: 3421
Kudos Received: 1624
Solutions: 1010
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 76 | 01-13-2026 11:14 AM |
| | 216 | 01-09-2026 06:58 AM |
| | 522 | 12-17-2025 05:55 AM |
| | 583 | 12-15-2025 01:29 PM |
| | 564 | 12-15-2025 06:50 AM |
06-13-2024
07:25 AM
1 Kudo
@SAMSAL Thank you for the kind words. Likewise, the community thrives through members like yourself. Thank you for all your amazing contributions.
06-13-2024
06:00 AM
1 Kudo
@tcherian NiFi certificates must meet the following criteria:

1. No wildcards in the subject DistinguishedName (DN)
2. Both clientAuth and serverAuth included in the ExtendedKeyUsage (EKU)
3. One or more SubjectAlternativeName (SAN) entries
4. The keystore can only contain 1 PrivateKey entry

There are many resources on the web for generating your own self-signed certificates and adding them to a PKCS12 or JKS keystore. The "keystore" and "truststore" are both just keystores. The NiFi "keystore" contains the PrivateKey entry, which is used by NiFi to identify itself as the server (serverAuth) when something connects to it, and as the client (clientAuth) when connecting outward as a client (such as talking to other NiFis, NiFi-Registry, etc.). The NiFi "truststore" contains one to many TrustedCert entries. It is common to use the default Java cacerts file (which is just a JKS keystore) and add additional TrustedCert entries to it. The TrustedCert entries are the public certs that correspond to the PrivateKeys; the PrivateKey itself is what you should never share. The trusted certs are the signers of the private keys, and they come in intermediate and root varieties. An intermediate trust is one where the owner and signer are not the same DN. A root trust is one where the owner and signer are the same DN. So you might create a PrivateKey that is signed by an intermediate Certificate Authority (CA), and that intermediate CA would be signed by another intermediate CA or a root CA. The chain of signers between intermediate and root is known as the trust chain. The truststore needs to contain the complete trust chain for your PrivateKey. There are even free services out there like Tinycert, but you can also use openssl and keytool to generate self-signed certificates and import them into a keystore. Just google how to create a certificate and how to import a certificate into a keystore. A minimal keytool sketch follows at the end of this post. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
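As a minimal sketch of that keytool route (the hostname, alias, and passwords here are hypothetical placeholders; adjust to your environment):

```
# Generate a PKCS12 keystore holding a single self-signed PrivateKey entry
# that satisfies the criteria above (no wildcard DN, both EKUs, a SAN entry).
keytool -genkeypair \
  -alias nifi-key \
  -keyalg RSA -keysize 2048 \
  -dname "CN=nifi01.example.com, OU=NIFI" \
  -ext SAN=dns:nifi01.example.com \
  -ext EKU=serverAuth,clientAuth \
  -validity 730 \
  -keystore keystore.p12 -storetype PKCS12 \
  -storepass changeit -keypass changeit

# Export the public certificate (the shareable half of the key pair)...
keytool -exportcert -alias nifi-key -file nifi.crt \
  -keystore keystore.p12 -storepass changeit

# ...and import it into a truststore as a TrustedCert entry. A self-signed
# cert is its own root (owner DN == signer DN), so it is the whole trust chain.
keytool -importcert -alias nifi-cert -file nifi.crt -noprompt \
  -keystore truststore.p12 -storetype PKCS12 -storepass changeit
```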
06-11-2024
08:22 AM
Yes, I believe this to be a legitimate NiFi bug.
06-11-2024
08:20 AM
@tcherian I assume you are using the non-production-ready NiFi out-of-the-box auto-generated keystore and truststore files? If so, you should generate your own certificates that include the additional "host.docker.internal" and/or "nifi-container-name" SAN entries (see the sketch below). Import that certificate into your own keystore and populate a truststore with the complete trust chain for your certificate. Something else you might want to try is to populate the following property in the nifi.properties file: nifi.web.proxy.host=host.docker.internal,nifi-container-name. But even if the above works for you, I would still highly encourage you to get actual signed certificates instead. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
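If you go the self-signed route, the extra SAN entries are just additional values on the keytool -ext flag. A sketch with hypothetical names (the container name must match yours):

```
# Add the Docker-facing hostnames as extra SAN entries when generating the key
keytool -genkeypair -alias nifi-key -keyalg RSA -keysize 2048 \
  -dname "CN=nifi01.example.com, OU=NIFI" \
  -ext SAN=dns:nifi01.example.com,dns:host.docker.internal,dns:nifi-container-name \
  -ext EKU=serverAuth,clientAuth \
  -keystore keystore.p12 -storetype PKCS12 -storepass changeit
```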
06-11-2024
06:55 AM
@ranie I see a couple of issues with your NiFi Expression Language (NEL) statement:

- There are formatting issues in your Java date format string: 'yyyy-MM-dd\'T\'00:00:00\'Z\'. Your single and double quotes are not balanced.
- You are using the "format()" function to change the timezone, but you could also use the "formatInstant()" function.
- You are missing the "toNumber()" function to convert the date string to a number before trying to apply a mathematical computation to it.

The now() function returns the current system date and time as the NiFi service sees it (for example, my NiFi server uses the UTC timezone). The toNumber() function provides the current date and time as a number of milliseconds since midnight Jan 1st, 1970 GMT. This number will always be a GMT value. The formatInstant() function allows you to take a GMT time or a Java-formatted date string and reformat it for a different timezone. Taking the above feedback into consideration, the following NEL statement should work for you (minus(86400000) subtracts 24 hours, i.e. 24 x 60 x 60 x 1000 ms): ${now():toNumber():minus(86400000):formatInstant("yyyy-MM-dd'T'HH:mm:ss 'Z'", "CET")} Pay close attention to your use of single and double quotes. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
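A hypothetical walk-through of the intermediate values, assuming the expression is evaluated at 2024-06-11T06:55:00Z (the times and numbers are purely illustrative):

```
${now()}                                -> Tue Jun 11 06:55:00 UTC 2024
${now():toNumber()}                     -> 1718088900000  (ms since epoch, GMT)
${now():toNumber():minus(86400000)}     -> 1718002500000  (24 hours earlier)
${now():toNumber():minus(86400000):formatInstant("yyyy-MM-dd'T'HH:mm:ss 'Z'", "CET")}
                                        -> 2024-06-10T08:55:00 Z
                                        (Java's "CET" observes DST, so UTC+2 in June)
```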
06-10-2024
12:44 PM
@omeraran This use case sounds like a dataflow that would consist of the following: GenerateTableFetch --> ExecuteSQL --> (any processors you may want to use to modify, extract, etc. the content, if needed) --> PutDatabaseRecord. GenerateTableFetch will handle ingest from your source MySQL DB and maintain NiFi state recording the maximum values for records, so that it can continue to check for and ingest additional rows as they are added. It generates FlowFiles that contain the SQL queries needed by ExecuteSQL to fetch the rows (a configuration sketch follows below). I don't know if your use case requires any manipulation, routing, modification, etc., but if so you would do that next. And finally use PutDatabaseRecord to write your rows to the Oracle DB. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
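A sketch of the GenerateTableFetch side, with hypothetical table and column names (verify the property names against your NiFi version's processor documentation):

```
GenerateTableFetch
  Database Connection Pooling Service : <your MySQL DBCPConnectionPool>
  Table Name                          : source_table   # hypothetical
  Maximum-value Columns               : id             # NiFi state tracks the max seen
  Partition Size                      : 10000          # rows per generated SQL query
```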
06-10-2024
06:51 AM
@udayAle Some NiFi processors process FlowFiles one at a time, and others may process batches of FlowFiles in a single thread execution. Then there are processors like MergeContent and MergeRecord that allocate FlowFiles to bins and only merge a bin once the min criteria to merge are met. With non-merge type processors, a FlowFile that results in a hung thread or long thread execution would block processing of the FlowFiles next in queue. For merge type processors, depending on data volumes and configuration, 5 mins might be expected behavior (or you could set a max bin age of 5 mins to force a bin to merge even if the mins have not been satisfied). So I think there are two approaches to look at here: one monitors long running threads, and the other looks at failures.

Runtime Monitoring Properties: when configured, this background process checks for long running threads and produces log output and NiFi bulletins when a thread exceeds a threshold (see the nifi.properties sketch below). You could build an alerting dataflow around this using the SiteToSiteBulletinReportingTask, some routing processors (to filter the specific types of bulletins related to long running tasks), and then an email processor.

The majority of processors that have potential for failures to occur will have a "failure" relationship. You can build a dataflow using that failure relationship to alert on those failures. Consider a failure relationship routed to an UpdateAttribute that uses the advanced UI to increment a failure counter, which then feeds a RouteOnAttribute processor that handles routing based on the number of failed attempts. After X number of failures it could send an email via PutEmail.

Apache NiFi does not have a background "Queued Duration" monitoring capability. Programmatically building one would be expensive resource-wise, as you would need to monitor every single constantly changing connection and parse out any FlowFile with a "Queued Duration" in excess of X amount of time. Consider a processor that is hung: the connection would continue to grow until backpressure kicks in and forces the upstream processor to start queueing. You could end up with 10,000 FlowFiles alerting on queued duration. Hopefully this helps you look at the use case a little differently. Keep in mind that all monitoring, including the examples I provided, will have an impact on performance. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
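The Runtime Monitoring settings live in nifi.properties; a sketch with assumed values (these property names are from the Apache NiFi admin guide's Runtime Monitoring Properties section, but verify them against your version):

```
# Check for long running tasks every minute; produce a bulletin and log output
# when any task runs longer than 5 minutes
nifi.monitor.long.running.task.schedule=1 min
nifi.monitor.long.running.task.threshold=5 mins
```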
06-07-2024
08:01 AM
1 Kudo
@mohammed_najb It is impossible to guarantee a flow will always run error free. You need to plan and design for handling failure. How are you handling the "failure" relationships on your ExecuteSQL and PutHDFS processors? The PutHDFS will either be successful, route the FlowFile to the failure relationship, or rollback the session. NiFi does not auto-remove FlowFiles. It is the responsibility of the dataflow designer to handle failures to avoid data loss. For example, do not auto-terminate any component relationships where a FlowFile may get routed. I don't know what would be the "best practice", as that comes with testing. Since you are using the GenerateTableFetch processor, it creates attributes on the output FlowFiles, one of which is "fragment.count". You could potentially use this to track that all records are written to HDFS successfully. Look at UpdateAttribute's stateful usage options. This would allow you to set up RouteOnAttribute to route the last FlowFile, once the stateful count equals "fragment.count", to a processor that triggers your Spark job (a sketch follows below). Just a suggestion, but others in the community may have other flow design options. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
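A hypothetical sketch of that counting pattern. The attribute name "completed.count" is made up for illustration, and the stateful counter is enabled through UpdateAttribute's advanced UI with state storage turned on; treat this as a starting point, not a drop-in solution:

```
# UpdateAttribute (stateful): increment a counter and stamp it on each FlowFile
# that successfully reached HDFS
completed.count : ${getStateValue('completed.count'):replaceNull('0'):replaceEmpty('0'):toNumber():plus(1)}

# RouteOnAttribute: route the final fragment once every record has landed
trigger-spark-job : ${completed.count:ge(${fragment.count})}
```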
06-07-2024
07:39 AM
1 Kudo
@scoutjohn I don't have a Kubernetes env to mess around with currently, but a couple of things I see from your response:

- Your URLs appear to be missing the /nifi on the end.
- What value is set for "nifi.web.http.host" in nifi.properties on each instance of your K8s cluster?
- Is "nifi-0.nifi-headless.namespace.svc.cluster.local", as used in the S2S provenance reporting task, resolvable on the NiFi host to a valid IP address that is reachable between nodes?
- Are the ports available and unused on both hosts?
- Does the configuration in nifi.properties match on both hosts (with the exception of host-specific properties)?
- Do the PrivateKey certificates used by the hosts contain the proper EKUs and SAN entries needed?

Thank you, Matt
06-06-2024
08:02 AM
@G_B NiFi cluster deployments expect all nodes in the cluster to have the same hardware specifications. There is no option in NiFi's load-balanced (LB) connections to customize load balancing based on the current CPU load average of the other nodes. Even doing so would require NiFi nodes to continuously ping all other nodes to get the current load average before sending FlowFiles, which would impact performance. The only thing that would result in any form of variation in distribution would be a node's receive rate being diminished, but that is out of NiFi's control. Round robin will skip a node in rotation if the node is unable to receive FlowFiles as fast as another node.

Also keep in mind that a NiFi cluster elects nodes to the roles of "cluster coordinator" and "primary node". Sometimes both roles get assigned to the same node. The assignment of these roles can change at any time. The primary node is the only node that will schedule "primary node" only processors to execute, so your one node lighter on CPU could also end up assigned this role, adding to its CPU load average. Often CPU load average is impacted not only by volume, but also by the content size of the FlowFiles. The LB connections also do not take into account FlowFile content size when distributing FlowFiles.

While your best option here performance-wise is to make sure all nodes have the same hardware specifications, there are a few less performant options you could try to distribute your data differently:

1. Use a Remote Process Group (RPG), which uses Site-To-Site (S2S) to distribute FlowFiles across your NiFi nodes. I always recommend using an RPG to push to a Remote Input Port rather than pull from a Remote Output Port to achieve better load distribution. The issue here is you need to add RPGs and remote ports everywhere you were previously using LB configured connections.

2. Build a smart data distribution reusable dataflow. You could build a dataflow that sorts FlowFiles by their content size ranges, merges bundles via MergeContent using the FlowFile Stream, v3 merge format, sends bundles based on size ranges to your various nodes via InvokeHTTP to ListenHTTP, and then unpacks them via UnpackContent once received to extract the FlowFile bundle. This MergeContent is going to add additional CPU load.

3. Consider using DistributeLoad, which can be configured with weighted distribution, allowing you to create three distribution relationships with, say, 5 FlowFiles per iteration for relationships 1 and 2, and only 1 per iteration for relationship 3 (a sketch follows below). This lets you send 1 FlowFile to your lower-core node for every 5 sent to the other two nodes. You would still need to use UpdateAttribute (to set a custom target node URL), MergeContent, InvokeHTTP, ListenHTTP, and UnpackContent in this flow.

So if addressing your hardware differences is not an option, number 1 is probably your next best choice. Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
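A hypothetical DistributeLoad configuration for option 3. The weights are dynamic properties keyed by relationship number; verify the exact property names and weighting behavior against your NiFi version's documentation:

```
DistributeLoad
  Number of Relationships : 3
  Distribution Strategy   : round robin   # the weights below skew the rotation

  # Dynamic properties: relationship number -> FlowFiles per iteration
  1 : 5    # full-spec node
  2 : 5    # full-spec node
  3 : 1    # lighter node receives 1 of every 11 FlowFiles
```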