Member since 03-18-2023 · 5 Posts · 1 Kudos Received · 0 Solutions
06-10-2024 06:51 AM
@udayAle Some NiFi processors process FlowFiles one at a time, while others may process batches of FlowFiles in a single thread execution. Then there are processors like MergeContent and MergeRecord that allocate FlowFiles to bins and only merge a bin once the minimum criteria to merge are met. With non-merge type processors, a FlowFile that results in a hung thread or a long thread execution would block processing of the FlowFiles next in the queue. For merge type processors, depending on data volumes and configuration, 5 mins might be expected behavior (or you could set a max bin age of 5 mins to force a bin to merge even if the mins have not been satisfied).

So I think there are two approaches to look at here. One monitors long running threads and the other looks at failures.

Runtime Monitoring Properties: When configured, this background process checks for long running threads and produces log output and NiFi bulletins when a thread exceeds a threshold. You could build an alerting dataflow around this using the SiteToSiteBulletinReportingTask, some routing processors (to filter the specific types of bulletins related to long running tasks), and then an email processor.

The majority of processors that have the potential for failures will have a failure relationship. You can build a dataflow using that failure relationship to alert on those failures. Consider a failure relationship routed to an UpdateAttribute processor that uses the advanced UI to increment a failure counter, which then feeds a RouteOnAttribute processor that handles routing based on the number of failed attempts. After X number of failures it could send an email via PutEmail.

Apache NiFi does not have a background "Queued Duration" monitoring capability. Programmatically building one would be expensive resource-wise, as you would need to monitor every single constantly changing connection and parse out any FlowFile with a "Queued Duration" in excess of X amount of time.
Consider a processor that is hung: the connection would continue to grow until backpressure kicks in and forces the upstream processor to start queueing. You could end up with 10,000 FlowFiles alerting on queued duration. Hopefully this helps you to look at the use case a little differently. Keep in mind that all monitoring, including the examples I provided, will have an impact on performance.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
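As a rough sketch of the failure-counter pattern described above, the NiFi Expression Language for the two processors could look like this (the attribute name `failure.count`, property name `exceeded`, and the threshold of 3 are illustrative choices, not required names):

```
UpdateAttribute (fed by the failure relationship):
    failure.count = ${failure.count:replaceNull(0):plus(1)}

RouteOnAttribute (Routing Strategy: Route to Property name):
    exceeded = ${failure.count:ge(3)}
```

FlowFiles matching `exceeded` would route onward to PutEmail, while the unmatched relationship can loop back for another attempt. Similarly, the Runtime Monitoring Properties mentioned above are configured in nifi.properties; a sketch with example values (verify the exact property names against the Admin Guide for your NiFi version):

```
nifi.monitor.long.running.task.schedule=1 min
nifi.monitor.long.running.task.threshold=5 mins
```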
05-21-2024 10:38 AM
@udayAle Please start a new community question for your unrelated follow-up question above. Responses to an unrelated question will lead to confusion for other community members who may be having similar problems. You can use @<username> to notify specific people about new community questions. Thank you, Matt
06-28-2023 12:40 PM
@RalphA @udayAle I encourage you to raise your question as a new question in the Cloudera Community rather than asking it as a comment on an existing article. You can certainly reference this article in your new community question. You'll get better visibility to your query that way. Thank you, Matt
03-21-2023 11:39 AM
@udayAle @ep_gunner When NiFi is brought down, the current state (stopped, started, enabled, disabled) of all components is retained, and on startup that same state is restored on the components. The only time this is not true is when the property "nifi.flowcontroller.autoResumeState" is set to false in the nifi.properties file. When set to false, a restart of NiFi would result in all components being in a stopped state. In a production environment, this property should be set to true.

Perhaps you can share more details on the maintenance process you are using, as I am not clear on how your maintenance is impacting the last known state of some components.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
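For reference, a sketch of the relevant entry in nifi.properties (true is the default, and the recommended production setting):

```
# When true, components resume their last known state
# (started/stopped/enabled/disabled) on NiFi startup.
nifi.flowcontroller.autoResumeState=true
```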