Member since: 07-30-2019
Posts: 3390
Kudos Received: 1618
Solutions: 999
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 235 | 11-05-2025 11:01 AM |
| | 467 | 10-20-2025 06:29 AM |
| | 607 | 10-10-2025 08:03 AM |
| | 397 | 10-08-2025 10:52 AM |
| | 443 | 10-08-2025 10:36 AM |
06-11-2024
06:55 AM
@ranie I see a couple of issues with your NiFi Expression Language (NEL) statement:

1. There are formatting problems in your Java SimpleDateFormat string 'yyyy-MM-dd\'T\'00:00:00\'Z\': your single and double quotes are not balanced.
2. You are using the "format()" function to change the timezone, but you could also use the "formatInstant()" function.
3. You are missing the "toNumber()" function to convert the date string to a number before trying to apply a mathematical computation to it.

The now() function returns the current system time as the NiFi service sees it (for example, my NiFi server uses the UTC timezone). The toNumber() function will provide the current date and time as the number of milliseconds since midnight Jan 1st, 1970 GMT; this number will always be a GMT value. The formatInstant() function allows you to take a GMT time or a Java-formatted date string and reformat it for a different timezone.

Taking the above feedback into consideration, the following NEL statement should work for you:

${now():toNumber():minus(86400000):formatInstant("yyyy-MM-dd'T'HH:mm:ss 'Z'", "CET")}

Pay close attention to your use of single and double quotes.

Please help our community thrive. If any of the suggestions/solutions provided helped you solve your issue or answer your question, please take a moment to log in and click "Accept as Solution" on one or more of them.

Thank you,
Matt
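For reference, here is a minimal Java sketch of what that corrected expression does; it is purely an illustration using standard JDK classes (not NiFi's own implementation): take "now" as epoch milliseconds, subtract 24 hours (86400000 ms), and format the result with balanced quoted literals in the CET timezone.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class YesterdayCet {
    public static void main(String[] args) {
        // now():toNumber():minus(86400000) -> epoch millis, 24 hours ago
        long yesterdayMillis = System.currentTimeMillis() - 86_400_000L;

        // Literals 'T' and 'Z' are quoted in balanced pairs, matching the NEL format string
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss 'Z'");

        // formatInstant(..., "CET") -> render the GMT-based number in the CET timezone
        fmt.setTimeZone(TimeZone.getTimeZone("CET"));
        System.out.println(fmt.format(new Date(yesterdayMillis)));
    }
}
```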
06-10-2024
06:51 AM
@udayAle Some NiFi processors process FlowFiles one at a time and others may process batches of FlowFiles in a single thread execution. Then there are processors like MergeContent and MergeRecord that allocate FlowFiles to bins and only merge a bin once the min criteria to merge are met. With non-merge type processors, a FlowFile that results in a hung thread or long thread execution would block processing of the FlowFiles next in queue. For merge type processors, depending on data volumes and configuration, 5 mins might be expected behavior (or you could set a max bin age of 5 mins to force a bin to merge even if the mins have not been satisfied).

So I think there are two approaches to look at here: one monitors long running threads and the other looks at failures.

Runtime Monitoring Properties: when configured, this background process checks for long running threads and produces log output and NiFi bulletins when a thread exceeds a threshold. You could build an alerting dataflow around this using the SiteToSiteBulletinReportingTask, some routing processors (to filter the specific types of bulletins related to long running tasks), and then an email processor.

The majority of processors that have the potential for failures to occur will have a failure relationship. You can build a dataflow using that failure relationship to alert on those failures. Consider a failure relationship routed to an UpdateAttribute that uses the Advanced UI to increment a failure counter, which then feeds a RouteOnAttribute processor that handles routing based on the number of failed attempts. After X number of failures it could send an email via PutEmail.

Apache NiFi does not have a background "Queued Duration" monitoring capability. Programmatically building one would be expensive resource-wise, as you would need to monitor every single constantly changing connection and parse out any FlowFile with a "Queued Duration" in excess of X amount of time. Consider a processor that is hung: the connection would continue to grow until backpressure kicks in and forces the upstream processor to start queueing. You could end up with 10,000 FlowFiles alerting on queued duration.

Hopefully this helps you look at the use case a little differently. Keep in mind that all monitoring, including the examples I provided, will have an impact on performance.

Please help our community thrive. If any of the suggestions/solutions provided helped you solve your issue or answer your question, please take a moment to log in and click "Accept as Solution" on one or more of them.

Thank you,
Matt
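For reference, a hedged sketch of the Runtime Monitoring (long running task) properties in nifi.properties; the property names are from memory of recent NiFi releases and the values below are assumptions for illustration only, so verify both against the Administration Guide for your NiFi version:

```properties
# How often the background monitor checks for long running tasks (assumed value)
nifi.monitor.long.running.task.schedule=1 min
# How long a task may run before a bulletin/log entry is produced (assumed value)
nifi.monitor.long.running.task.threshold=5 mins
```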
06-07-2024
08:01 AM
1 Kudo
@mohammed_najb It is impossible to guarantee a flow will always run error free. You need to plan and design for handling failure. How are you handling the "failure" relationships on your ExecuteSQL and PutHDFS processors? The PutHDFS will either be successful, route the FlowFile to its failure relationship, or rollback the session. NiFi does not auto-remove FlowFiles. It is the responsibility of the dataflow designer to handle failures to avoid data loss. For example, do not auto-terminate any component relationships where a FlowFile may get routed.

I don't know what would be the "best practice" as that comes with testing. Since you are using the GenerateTableFetch processor, it creates attributes on the output FlowFiles, one of which is "fragment.count". You could potentially use this to track that all records are written to HDFS successfully. Look at UpdateAttribute's stateful usage options. This would allow you to set up RouteOnAttribute to route the last FlowFile, once the stateful count equals "fragment.count", to a processor that triggers your Spark job. Just a suggestion, but others in the community may have other flow design options.

Please help our community thrive. If any of the suggestions/solutions provided helped you solve your issue or answer your question, please take a moment to log in and click "Accept as Solution" on one or more of them.

Thank you,
Matt
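As a rough sketch of that routing check, assuming you maintain a stateful counter attribute via UpdateAttribute: the attribute name "hdfs.success.count" and the routing property name below are hypothetical, chosen only for illustration, while "fragment.count" is the real attribute written by GenerateTableFetch:

```
# RouteOnAttribute dynamic property, e.g. "all-fragments-written" (hypothetical name):
${hdfs.success.count:equals(${fragment.count})}
```

A FlowFile matching this expression would be routed to the "all-fragments-written" relationship, which could then feed whatever processor kicks off the Spark job.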
06-06-2024
08:02 AM
@G_B NiFi cluster deployments expect that all nodes in the cluster have the same hardware specifications. There is no option in NiFi's load balanced connections to customize load balancing based on the current CPU load average of another node. Even doing so would require NiFi nodes to continuously ping all other nodes to get the current load average before sending FlowFiles, which would impact performance. The only thing that would result in any form of variation in distribution would be a node's receive rate being diminished, but that is out of NiFi's control. Round Robin will skip a node in rotation if the node is unable to receive FlowFiles as fast as another node.

Also keep in mind that a NiFi cluster elects nodes for the roles "cluster coordinator" and "primary node". Sometimes both roles get assigned to the same node, and the assignment of these roles can change at any time. The primary node is the only node that will schedule "primary node" only processors to execute, so your one node lighter on CPU could also end up assigned this role, adding to its CPU load average. Often CPU load average is not only impacted by volume, but also by the content size of the FlowFiles. The LB connections also do not take into account FlowFile content size when distributing FlowFiles.

While your best option here performance-wise is to make sure all nodes have the same hardware specifications, there are a few less performant options you could try to distribute your data differently:

1. Use a Remote Process Group (RPG), which uses Site-To-Site (S2S) to distribute FlowFiles across your NiFi nodes. I always recommend using an RPG to push to a Remote Input port rather than pull from a Remote Output port to achieve better load distribution. The issue here is you need to add RPGs and remote ports everywhere you were previously using LB configured connections.
2. Build a smart data distribution reusable dataflow. You could build a dataflow that sorts FlowFiles by their content size ranges, merges bundles via MergeContent using the FlowFile Stream, v3 merge format, sends bundles based on size ranges to your various nodes via InvokeHTTP to ListenHTTP, and then uses UnpackContent once received to extract the FlowFile bundle. This MergeContent is going to add additional CPU load. (A small sketch of the size-range sorting follows this post.)
3. Consider using DistributeLoad, which can be configured with weighted distribution, allowing you to create three distribution relationships with, for example, 5 FlowFiles per iteration for relationships 1 and 2, and only 1 per iteration for relationship 3. This lets you send 1 FlowFile to your lower-core node for every 5 sent to the other two nodes. You would still need to use UpdateAttribute (to set a custom target node URL), MergeContent, InvokeHTTP, ListenHTTP, and UnpackContent in this flow.

So if addressing your hardware differences is not an option, number 1 is probably your next best choice.

Please help our community thrive. If any of the suggestions/solutions provided helped you solve your issue or answer your question, please take a moment to log in and click "Accept as Solution" on one or more of them.

Thank you,
Matt
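Regarding option 2, a hedged sketch of how the size-range sorting could be expressed in RouteOnAttribute using the standard "fileSize" FlowFile attribute; the relationship names and the byte boundaries (1 MB and 100 MB) are arbitrary examples, not values from the original post:

```
# RouteOnAttribute dynamic properties (illustrative names and thresholds):
small  : ${fileSize:lt(1048576)}
medium : ${fileSize:ge(1048576):and(${fileSize:lt(104857600)})}
large  : ${fileSize:ge(104857600)}
```

Each relationship could then feed its own MergeContent/InvokeHTTP path targeting the node sized for that range.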
06-06-2024
07:12 AM
@alan18080 NiFi-Registry only pushes to the GitFlowPersistenceProvider while running, and NiFi-Registry will only read from Git on startup. The GitFlowPersistenceProvider also only contains the flow definitions for the version controlled process groups. Each NiFi-Registry has a metadata database that maintains the knowledge of which buckets exist, which versioned items belong to which buckets, as well as the version history for each item. So if you are trying to share a single Git repo across multiple running NiFi-Registry instances, this will explain why you are seeing missing versions at times across your multiple instances.

Please help our community thrive. If any of the suggestions/solutions provided helped you solve your issue or answer your question, please take a moment to log in and click "Accept as Solution" on one or more of them.

Thank you,
Matt
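For context, a hedged sketch of where the GitFlowPersistenceProvider is configured in NiFi-Registry's conf/providers.xml; the property names follow the NiFi-Registry Administration Guide as I recall it and the values are placeholders, so verify against your version. Note that the metadata database mentioned above is configured separately (in nifi-registry.properties) and is not stored in Git:

```xml
<flowPersistenceProvider>
    <class>org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider</class>
    <!-- Local clone of the Git repo; flow definitions are written here and pushed while Registry is running -->
    <property name="Flow Storage Directory">./flow_storage</property>
    <!-- Remote to push to; leave blank to keep commits local only -->
    <property name="Remote To Push">origin</property>
    <property name="Remote Access User"></property>
    <property name="Remote Access Password"></property>
</flowPersistenceProvider>
```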
06-04-2024
09:54 AM
@inkerinmaa Out of the box, Apache NiFi is configured to be secure, and most browsers do not support plain HTTP anymore and force a redirect to HTTPS. NiFi is going to come up secured if you have the HTTPS port property configured in the nifi.properties file, so you would need to unset that property for NiFi to start unsecured.

Thanks,
Matt
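For reference, a hedged sketch of the relevant nifi.properties entries; the property names are the standard web properties, but the port numbers shown are just common defaults:

```properties
# Secured (the out-of-the-box configuration in recent releases): HTTPS port set, HTTP port left empty
nifi.web.https.port=8443
nifi.web.http.port=

# Unsecured: clear the HTTPS port and set an HTTP port instead
# nifi.web.https.port=
# nifi.web.http.port=8080
```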
06-02-2024
10:15 PM
@Naveen_Sagar, Did the response assist in resolving your query? If it did, kindly mark the relevant reply as the solution, as it will aid others in locating the answer more easily in the future.
05-30-2024
11:24 PM
1 Kudo
What an explanation! Cleared my doubts. Thank you so much @MattWho.
05-30-2024
06:17 AM
@Vikas-Nifi @ckumar is 100% correct. Only fields explicitly marked as supporting NiFi Expression Language (NEL) can evaluate a NEL expression like "${schedule}".

I am, however, curious about your use case as to why you would even be trying to do this. From what you shared, you are extracting a cron schedule from the JSON content of some FlowFiles traversing an EvaluateJsonPath processor. That "schedule" is added to the NiFi FlowFile as a FlowFile attribute (key=value pair). This would not make that key=value pair accessible to any other NiFi component unless the FlowFile containing the FlowFile attribute was processed by that other component. However, in your shared dataflow you do not mention that EvaluateJsonPath connects to your InvokeHTTP processor via an inbound dataflow connection (and keep in mind that even if you did do this, it does not change the fact that the Run Schedule property does not support NEL). I just wanted to clarify how FlowFile attributes are and can be used.

Also keep in mind that the "run schedule" is a scheduler only. The run schedule set on a processor controls when the NiFi controller will schedule the execution of the processor's code. It does not mean that the processor will immediately execute at the time of scheduling (it may be delayed waiting for an available execution thread from the thread pool). All scheduled components share a thread pool, and the NiFi framework will handle assigning threads to the next scheduled component as threads become available. So the NiFi framework needs to know the scheduling for a component when it is started; otherwise, NiFi would never know when to schedule it to execute.

Unless a component property has an explicit tooltip that tells you it supports NEL, it does not. For NiFi processor components, you will find that only some processor-specific properties within the "PROPERTIES" tab support NEL. This information is available not only through the property tooltips, but also in the processor's documentation. Even when NEL is supported there is a scope: it may support FlowFile attributes, the Variable Registry (going away in NiFi 2.x releases), or both.

Thank you,
Matt
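To make the scope difference concrete, a small hedged illustration (the dynamic property name "upper.name" is made up for this example):

```
# Property that supports EL (evaluated per FlowFile against its attributes):
#   UpdateAttribute dynamic property "upper.name" = ${filename:toUpper()}
#
# Scheduling field that does NOT support EL (literal value only):
#   Run Schedule (CRON driven) = 0 0 12 * * ?
#   A value of "${schedule}" here would never be evaluated.
```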
05-29-2024
05:06 AM
@Dilipkumar I am not sure what you mean by backups. Backups of what? The NiFi-Registry is used to version control process groups from one or more NiFi instances. Those version controlled flow definitions include all configurations (minus any sensitive property values). A version controlled flow definition can be imported to any NiFi instance or cluster that has authorized access to the NiFi-Registry bucket in which it is stored. NiFi-Registry can be configured to persist flow definitions on the local file system or in a Git repository.

Please help our community thrive. If any of the suggestions/solutions provided helped you solve your issue or answer your question, please take a moment to log in and click "Accept as Solution" on one or more of them.

Thank you,
Matt