YARN and HDFS monitoring via Grafana

Hello,

We are on CDP PB 7.1.9, and the goal is to monitor YARN applications and performance, along with a few HDFS metrics, on Grafana dashboards and ultimately trigger alerts. The metrics have to be collected via a Prometheus agent and shared with Grafana.

At this point, I have downloaded a Prometheus agent and created the YAML configuration below. I understand this will collect all the metrics, which is intentional to start with.

lowercaseOutputName: true
rules:
  # All Gauge type Hadoop Metrics
  - pattern: 'hadoop<name=.*><>(Count|Value)'
    name: hadoop_${1}_gauge
    type: GAUGE

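  # All Counter type Hadoop Metrics
  # NOTE: this pattern is identical to the gauge rule above. jmx_exporter
  # stops at the first matching rule, so as written this rule is never reached.
  - pattern: 'hadoop<name=.*><>(Count|Value)'
    name: hadoop_${1}_counter
    type: COUNTER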

The Prometheus agent jar and this config file are stored at /var/lib/prometheus_jmx_config/.

For now, I am testing this only on the ResourceManager instances to verify that metrics are being collected.

After following a few articles and the Grafana documentation, I understand that I need to run the Prometheus agent either as a Java agent (our case) or as a standalone HTTP server. To achieve this, we need to expose/enable JMX for the components we want to monitor (ResourceManager in this case), which has to be done by adding the Java agent option in hadoop-env.sh. Based on the documentation, this is the setting I think should work:

YARN_RESOURCEMANAGER_OPTS="-javaagent:/var/lib/prometheus_jmx_config/jmx_prometheus_javaagent-0.20.0.jar=9091:/var/lib/prometheus_jmx_config/hadoop_jmx_exporter.yml"
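
Where an environment script such as hadoop-env.sh is the mechanism in use, it may be safer to append the agent option to whatever is already set rather than overwrite it. A minimal sketch, reusing the jar path, port, and config path from the line above; the AGENT_* variable names are hypothetical and exist only for readability:

```shell
#!/usr/bin/env bash
# Paths and port are the ones quoted in the question.
# AGENT_JAR, AGENT_CFG and AGENT_PORT are hypothetical helper variables.
AGENT_JAR=/var/lib/prometheus_jmx_config/jmx_prometheus_javaagent-0.20.0.jar
AGENT_CFG=/var/lib/prometheus_jmx_config/hadoop_jmx_exporter.yml
AGENT_PORT=9091

# Append to any options already set instead of replacing them.
# Agent syntax: -javaagent:<jar>=<port>:<config file>
YARN_RESOURCEMANAGER_OPTS="${YARN_RESOURCEMANAGER_OPTS:-} -javaagent:${AGENT_JAR}=${AGENT_PORT}:${AGENT_CFG}"
export YARN_RESOURCEMANAGER_OPTS

# Print the final value so it can be compared with the running JVM's options.
echo "$YARN_RESOURCEMANAGER_OPTS"
```

Whether the ResourceManager actually reads this variable depends on which script launches it, so this is only a sketch of the option string, not a confirmed fix.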

I tried adding this line to the Gateway Client Environment Advanced Configuration Snippet (Safety Valve) for hadoop-env.sh in the YARN configuration, and I was prompted to restart the impacted services, as this would change yarn-conf for multiple services. However, the YARN service (the RM, to be specific) was never restarted, so I restarted it myself, thinking it would load the Prometheus agent into the Java process and ultimately enable JMX on port 9091. However, the agent never started, so JMX was not enabled.
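
One way to confirm whether the agent option reached the ResourceManager JVM at all is to inspect the command line of the running process. A minimal sketch, assuming procps `pgrep` is available on the host and that the ResourceManager main class name appears in the process command line; `has_jmx_agent` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a JVM command line read from stdin
# contains a -javaagent option pointing at a jmx_prometheus_javaagent jar.
has_jmx_agent() {
  grep -q -- '-javaagent:[^ ]*jmx_prometheus_javaagent'
}

# Usage on the ResourceManager host (the match string is an assumption):
#   pgrep -af 'resourcemanager.ResourceManager' | has_jmx_agent \
#     && echo "agent option present" || echo "agent option missing"
```

If the option is missing from the process command line, the setting was written somewhere the role does not read at startup.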

We also have a few Java-related properties specific to individual components, such as:
Java Configuration Options for NodeManager
Java Configuration Options for ResourceManager

However, I am not sure whether those are the correct ones to use, so it would be great if someone could advise on this or point out any configuration I have missed.

Thanks
Snm1523

1 ACCEPTED SOLUTION

For the benefit of anyone looking for this, 

Java Configuration Options for NodeManager
Java Configuration Options for ResourceManager
Java Configuration Options for <component name>

are the configurations that need to be updated with the -javaagent option. This lets the component pick up the Prometheus agent jar and enables the JMX exporter to collect metrics.
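
Once the option is in place and the role has been restarted, one quick check is to fetch the agent's HTTP endpoint and count the samples it returns. A minimal sketch, assuming the agent listens on port 9091 as in the question and serves the /metrics path; `count_samples` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count metric sample lines in Prometheus text output
# read from stdin.
count_samples() {
  # Lines starting with '#' are HELP/TYPE comments; blank lines are skipped.
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# Usage on the ResourceManager host (port 9091 is the one from the question):
#   curl -s http://localhost:9091/metrics | count_samples
```

A count of zero with a reachable endpoint would suggest the agent is running but none of the rules in the config file match any MBean attribute.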
