Created 08-30-2018 11:45 AM
We manage many Hadoop clusters running on Red Hat (RHEL 7.x).
Based on our experience (many problems with low memory, disk performance, the network, etc.), we agree that we need to install a monitoring tool that can keep at least one month of detailed history.
From the link below, https://neverendingsecurity.wordpress.com/tag/atop/, we saw a lot of monitoring tools, and we are not sure which one is best for Hadoop clusters.
Meanwhile, we installed atop, which works fine (but it takes a lot of space under /var/log/atop).
**But we are still wondering whether this is a good choice.**
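To judge whether atop can hold a month of history on your disks, you can project the footprint from the logs already collected. The sketch below assumes atop's usual daily raw-log naming (`atop_YYYYMMDD`) under /var/log/atop; check your own install. It simply multiplies the average daily file size by the number of days you want to keep.

```python
import re
from pathlib import Path

# Assumption: atop writes one raw log per day named atop_YYYYMMDD
ATOP_LOG = re.compile(r"^atop_\d{8}$")

def atop_log_sizes(log_dir="/var/log/atop"):
    """Return {filename: size_in_bytes} for files that look like daily atop logs."""
    sizes = {}
    for entry in Path(log_dir).iterdir():
        if entry.is_file() and ATOP_LOG.match(entry.name):
            sizes[entry.name] = entry.stat().st_size
    return sizes

def estimate_retention_bytes(daily_sizes, days=30):
    """Project disk space needed for `days` of history from the average daily log size."""
    if not daily_sizes:
        return 0
    avg = sum(daily_sizes) / len(daily_sizes)
    return int(avg * days)

if __name__ == "__main__":
    sizes = atop_log_sizes()
    need = estimate_retention_bytes(list(sizes.values()), days=30)
    print(f"{len(sizes)} daily logs found; ~{need / 2**30:.2f} GiB needed for 30 days")
```

If the projection is too large, a longer sampling interval shrinks each daily file roughly proportionally.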
Created 09-01-2018 01:27 AM
Nagios / OpsView / Sensu are popular options I've seen
StatsD / CollectD / MetricBeat are daemon metric collectors that run on each server (MetricBeat is somewhat tied to an Elasticsearch cluster, though).
Prometheus is a popular option nowadays; it scrapes metrics exposed by local services.
I have played around a bit with netdata, though I'm not sure if it can be applied for Hadoop monitoring use cases.
DataDog is a vendor that offers lots of integrations such as Hadoop, YARN, Kafka, Zookeeper, etc.
... Realistically, you need some JMX + system monitoring tool, and a bunch of them exist.
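To illustrate the JMX side: Hadoop daemons expose their MBeans as JSON over HTTP at a `/jmx` path (for example on the NameNode web UI port), returning an object with a `"beans"` list. The minimal sketch below fetches that JSON and renders numeric attributes in the Prometheus text format so a scraper could collect them. The `hadoop_` prefix and the `bean` label are illustrative choices, and no `# TYPE` lines are emitted, so Prometheus would treat the series as untyped. In practice the Prometheus JMX exporter Java agent is the more common way to do this.

```python
import json
import re
import urllib.request

def fetch_jmx(url):
    """Fetch a Hadoop /jmx endpoint and return its list of MBeans."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["beans"]

def to_prometheus(beans, prefix="hadoop"):
    """Render numeric MBean attributes as Prometheus text-format lines."""
    lines = []
    for bean in beans:
        bean_name = bean.get("name", "")
        # Escape backslashes and quotes as required inside label values
        label = bean_name.replace("\\", "\\\\").replace('"', '\\"')
        for attr, value in bean.items():
            # Keep only numbers; bool is a subclass of int, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            # Metric names may only contain [a-zA-Z0-9_:]
            metric = re.sub(r"[^a-zA-Z0-9_]", "_", f"{prefix}_{attr}").lower()
            lines.append(f'{metric}{{bean="{label}"}} {value}')
    return "\n".join(lines)
```

For example, `to_prometheus(fetch_jmx(url))` with `url` pointing at a NameNode's `/jmx` would emit lines such as the FSNamesystem capacity and block counters, assuming those attributes are present in your Hadoop version.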
Created 08-30-2018 01:31 PM
check_mk is what most people use.
It is easy to configure and provides a nice UI with saved history.
The check_mk agents consume very little CPU and RAM, so they avoid any negative impact on other applications running on the host.
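The agent can also be extended with "local checks": executables placed in the agent's local directory (commonly /usr/lib/check_mk_agent/local on Linux; verify for your version) that print one line per service in the form `<status> <service_name> <metrics> <summary>`, where status 0/1/2 means OK/WARN/CRIT. A minimal sketch, assuming that format, that reports disk usage of /var/log; the service name and the 80/90% thresholds are hypothetical.

```python
import shutil

def local_check_line(service, used_pct, warn=80.0, crit=90.0):
    """Format one check_mk local-check line: <status> <service> <metrics> <summary>."""
    if used_pct >= crit:
        status, word = 2, "CRIT"
    elif used_pct >= warn:
        status, word = 1, "WARN"
    else:
        status, word = 0, "OK"
    # Metric syntax: name=value;warn;crit (service names must not contain spaces)
    metrics = f"used_pct={used_pct:.1f};{warn:g};{crit:g}"
    return f"{status} {service} {metrics} {word} - {used_pct:.1f}% used"

if __name__ == "__main__":
    usage = shutil.disk_usage("/var/log")
    pct = 100.0 * usage.used / usage.total
    print(local_check_line("Disk_var_log", pct))
```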
Created 08-30-2018 01:52 PM
Actually, we are thinking of a tool that is installed on each Linux machine, like atop. check_mk controls the OS from Windows machines, and what we want is a tool that gets the information from the OS itself and runs on each machine.
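As a rough picture of what such a per-host collector does, the sketch below reads Linux's /proc/meminfo and /proc/loadavg and appends one timestamped JSON record to a local history file. It is a minimal illustration, not a substitute for atop or collectd; the output path /var/tmp/host_metrics.jsonl is hypothetical, and you would run it periodically (for example from cron) and prune old records to keep about a month.

```python
import json
import time

def parse_meminfo(text):
    """Parse /proc/meminfo content into {field: value}, values in kB where applicable."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info

def sample(meminfo_path="/proc/meminfo", loadavg_path="/proc/loadavg"):
    """Take one snapshot of memory and load average on the local host."""
    with open(meminfo_path) as f:
        mem = parse_meminfo(f.read())
    with open(loadavg_path) as f:
        load1, load5, load15 = (float(x) for x in f.read().split()[:3])
    return {
        "ts": int(time.time()),
        "mem_total_kb": mem.get("MemTotal"),
        "mem_available_kb": mem.get("MemAvailable"),
        "load1": load1,
        "load5": load5,
        "load15": load15,
    }

if __name__ == "__main__":
    # Hypothetical history file; one JSON object per line
    with open("/var/tmp/host_metrics.jsonl", "a") as out:
        out.write(json.dumps(sample()) + "\n")
```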
Created 08-30-2018 03:04 PM
I don't normally like to suggest non-ASF options here in HCC, but have you checked out Elastic Beats? I am using MetricBeat to get Unix cluster monitoring on our Ambari nodes, as well as Windows workstation metrics such as:
There is also Winlogbeat, which allows us to tap into Windows event logs and Performance Monitoring.