You could use Telegraf as the metrics collector and send the data node metrics (volume information), plus the metrics from all master services if need be, to Graphite. You can then add Graphite as a data source in Grafana and alert on volume failures. This gives you an end-to-end, enterprise-grade solution if you want to start monitoring your cluster. Telegraf also provides out-of-the-box support for monitoring host-level services.
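
To make the data flow concrete, here is a minimal Python sketch of what ends up on the wire when a metric is written to Graphite over its plaintext protocol (one `<path> <value> <timestamp>` line per metric, by default on TCP port 2003). In practice Telegraf does this for you; the metric path `cluster.datanode01.volumes_failed`, the host, and the function names below are only assumptions for illustration, not names Telegraf or your cluster will necessarily use.

```python
import socket
import time


def format_graphite_line(path, value, timestamp=None):
    """Build one Graphite plaintext line: path, value, epoch seconds, newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(lines, host="localhost", port=2003, timeout=5.0):
    """Send pre-formatted lines to a Carbon plaintext listener over TCP.

    host and port are assumptions; 2003 is Carbon's default plaintext port.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Hypothetical metric path; use whatever naming your collector emits.
    line = format_graphite_line("cluster.datanode01.volumes_failed", 1, 1700000000)
    print(line, end="")
    # send_to_graphite([line])  # uncomment once a Carbon listener is reachable
```

Once a series like this exists in Graphite, a Grafana alert rule can query it and fire when the value goes above zero.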