Best practice for email monitoring for NiFi?

Contributor

Currently I have a flow in place that monitors disk usage by using the MonitorDiskUsageReportingTask in conjunction with the SiteToSiteBulletinReportingTask. At the end of the flow, each flowfile, which is essentially one bulletin, is sent by email along with a message describing the alert. My issue is that, since I have this reporting task set to run every minute to ensure we get the most up-to-date alerts regarding disk usage, I receive an email every minute once the disk usage goes over the threshold I set.

Is there a way I can configure it so only one email is sent once this goes over the threshold? And then maybe have an "OK" status email go out after the disk usage has decreased beneath the threshold?

7 REPLIES

Master Guru
@Josh Nicholson

Take a look at the MonitorActivity processor. While its intended use is to monitor for a dataflow outage, it could be used to accomplish what you are looking for as well. You would just need to ignore the first lack-of-activity message and set "Continually Send Messages" to false. Each time an "Activity Restored Message" is generated, it indicates that a FlowFile was routed there, meaning disk usage is over the threshold you expect. Later, when disk usage drops back below the threshold and FlowFiles stop routing to this processor, an "Inactivity Message" will be produced that could be sent to an email processor to notify you that disk usage is back below the threshold.
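
To make the idea concrete, here is a minimal Python sketch of the edge-triggered behavior described above. It is a toy model, not NiFi code: the class name, `threshold_seconds`, and the "ALERT"/"OK" labels are all hypothetical. Unlike the real processor, it starts in the inactive state without emitting an initial inactivity message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ActivityMonitor:
    """Toy model of edge-triggered alerting; not NiFi's implementation."""
    threshold_seconds: float           # hypothetical stand-in for the inactivity threshold
    last_seen: Optional[float] = None  # time of the most recent bulletin
    inactive: bool = True              # start inactive so the first bulletin raises an alert

    def on_bulletin(self, now: float) -> List[str]:
        """Call when an over-threshold bulletin arrives."""
        events: List[str] = []
        if self.inactive:
            events.append("ALERT")     # first bulletin after a quiet period
            self.inactive = False
        self.last_seen = now
        return events

    def on_tick(self, now: float) -> List[str]:
        """Call periodically to detect that bulletins have stopped."""
        if (not self.inactive and self.last_seen is not None
                and now - self.last_seen >= self.threshold_seconds):
            self.inactive = True
            return ["OK"]              # emitted once, not on every tick
        return []
```

With bulletins arriving every minute, only the first one produces an "ALERT"; a single "OK" follows once no bulletin has been seen for `threshold_seconds`.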

Thanks,

Matt

Contributor

@Matt Clarke

Thanks for the suggestion, this should give me what I want. One thing I thought of while implementing this, however, is that I currently have six different disks I am monitoring with six different DiskUsage reporting tasks, as I have different NiFi repositories split over these disks. To keep things clean, these are all being picked up by one SiteToSiteBulletinReportingTask and submitted to the same process group for alerting.

If say disk1 went over the threshold, then disk2 went over while disk1 was still over the threshold, I would never receive an email alert about disk2. The only way around this I can think of is having a separate MonitorActivity for each disk, but then this requires hardcoding which I would rather avoid as this makes it difficult to deploy this monitoring between environments with different disk configurations. Can you think of any way around this?

Master Guru

@Josh Nicholson

That is correct, so you would need to route each disk-specific set of bulletins to its own MonitorActivity, but the messages generated could all go to the same PutEmail processor.
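
The reason a shared monitor hides the second disk is that it keeps one activity state for all bulletins. The Python sketch below only illustrates the per-disk state that separate MonitorActivity processors give you. It is not a NiFi feature, and the class name, `quiet_seconds`, and the disk labels are hypothetical.

```python
from typing import Dict, List, Tuple


class PerDiskMonitor:
    """Toy model: one independent activity state per disk (not NiFi code)."""

    def __init__(self, quiet_seconds: int) -> None:
        self.quiet_seconds = quiet_seconds
        # Disks currently over threshold, mapped to the time of their last bulletin.
        self.last_seen: Dict[str, int] = {}

    def on_bulletin(self, disk: str, now: int) -> List[Tuple[str, str]]:
        """Record a bulletin; alert only if this disk was previously quiet."""
        first = disk not in self.last_seen
        self.last_seen[disk] = now
        return [("ALERT", disk)] if first else []

    def on_tick(self, now: int) -> List[Tuple[str, str]]:
        """Emit one OK per disk whose bulletins have stopped for quiet_seconds."""
        cleared = sorted(d for d, t in self.last_seen.items()
                         if now - t >= self.quiet_seconds)
        for disk in cleared:
            del self.last_seen[disk]
        return [("OK", disk) for disk in cleared]
```

Because the state is keyed by disk, disk2 still raises its own alert while disk1 is over the threshold.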

Contributor

@Matt Clarke

Gotcha. One more question: since it appears that MonitorActivity just sends a status, and not the flowfile, is there any way I can get the attributes of the last flowfile? I want to include the hostname and original bulletin message in the OK status email, but using the "inactive" relationship seems to make this impossible, as the attributes from the original flowfile aren't carried over.

Master Guru

@Josh Nicholson

You are correct. In terms of how "inactivity" works, there is no FlowFile that triggers this, thus no flowfile to pull content/attributes from.

You could sort bulletins based on host so that those from each host go to their own MonitorActivity. The "activity restored" message indicates an issue exists, and in that case you have a FlowFile from which you can extract info to put in that notification. The "inactivity message" then indicates things have returned to normal. There you would only be able to build a custom message based on time, for example:

Current system time: ${now():format('yyyy/MM/dd HH:mm:ss')}; 
No disk utilization issues reported for <host> in past ${inactivityDurationMillis:toNumber():divide(60000)} minutes. You will receive a new email if disk utilization crosses the <x>% threshold again.

As long as you have a MonitorActivity for each NiFi host/node, you can hardcode a unique hostname in each one.
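
For anyone who wants to check the arithmetic in that message outside NiFi, here is a rough Python equivalent. The function name and parameters are hypothetical, and it assumes the division truncates to whole minutes.

```python
from datetime import datetime


def ok_message(host: str, inactivity_millis: int, threshold_pct: int, now: datetime) -> str:
    """Build an OK notification similar to the template above (illustrative only)."""
    minutes = inactivity_millis // 60000  # milliseconds to whole minutes
    stamp = now.strftime("%Y/%m/%d %H:%M:%S")
    return (
        f"Current system time: {stamp}; "
        f"No disk utilization issues reported for {host} in past {minutes} minutes. "
        f"You will receive a new email if disk utilization crosses the "
        f"{threshold_pct}% threshold again."
    )
```

The host name and threshold are passed in here because, as noted above, the inactivity message has no FlowFile attributes to draw them from.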

Thanks,

Matt

Hi @Josh Nicholson

For this you can use a ControlRate processor and set a FlowFile expiration on its incoming connection. ControlRate will let only one message through per, let's say, 10 minutes. The other flow files will stay in the queue, where they expire and are deleted after 10 minutes.
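
A small Python simulation can show how throttling and queue expiration interact. This is only a sketch under simplifying assumptions: one-second time steps, oldest-first release, and a window that starts at each release. It is not ControlRate's implementation, and the function and parameter names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


def simulate_throttle(arrivals: List[int], window: int, expiry: int, end: int) -> List[Tuple[int, int]]:
    """Return (send_time, arrival_time) pairs for a one-per-window throttle."""
    pending = sorted(arrivals)
    queue: Deque[int] = deque()
    sent: List[Tuple[int, int]] = []
    next_allowed = 0
    idx = 0
    for now in range(end + 1):
        # Enqueue bulletins that have arrived by this second.
        while idx < len(pending) and pending[idx] <= now:
            queue.append(pending[idx])
            idx += 1
        # Drop queued bulletins that have waited for the full expiry period.
        while queue and now - queue[0] >= expiry:
            queue.popleft()
        # Release at most one bulletin per window, oldest first.
        if queue and now >= next_allowed:
            sent.append((now, queue.popleft()))
            next_allowed = now + window
    return sent
```

With a bulletin every 60 seconds and both `window` and `expiry` set to 600, this model sends one email every 10 minutes while the condition persists, so it reduces the email volume rather than sending a single alert.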

Thanks

Abdelkrim

Contributor

Thanks for the suggestion @Abdelkrim Hadjidj. I'm going to use this to send the initial alert and then use @Matt Clarke's suggestion to send an OK alert when the issue has been resolved.
