Hadoop metrics2 to Graphite: only 2 are received

Expert Contributor

I have a graphite server, to which I want to send Hadoop metrics2.

On paper it's easy: add log4j.logger.org.apache.hadoop.metrics2=DEBUG to the log4j template and update the hadoop-metrics2.properties template with:

*.sink.graphite.class=org.apache.hadoop.metrics2.sink.GraphiteSink
*.sink.graphite.server_host=10.x.x.x
*.sink.graphite.server_port=2003

datanode.sink.graphite.metrics_prefix=datanode
namenode.sink.graphite.metrics_prefix=namenode
resourcemanager.sink.graphite.metrics_prefix=resourcemanager
nodemanager.sink.graphite.metrics_prefix=nodemanager
jobhistoryserver.sink.graphite.metrics_prefix=jobhistoryserver
journalnode.sink.graphite.metrics_prefix=journalnode
maptask.sink.graphite.metrics_prefix=maptask
reducetask.sink.graphite.metrics_prefix=reducetask
applicationhistoryserver.sink.graphite.metrics_prefix=applicationhistoryserver

It works very well with one service (e.g. datanode). If I configure more than one, only 2 services show up in Graphite, and I cannot confirm that all metrics for those services are present.

Since I do not know which metrics to expect and I want to experiment, I do not want to filter on individual metrics to limit their number.

On the collectd side I can see one metric dropped (invalid), but only one, which does not account for all the rest. Furthermore, setting CollectInternalStats to true shows me that no metrics are dropped.

On the Hadoop side, I could not find anything telling me whether metrics are actually sent, or whether sending succeeds or fails. Nothing is logged anywhere.

So my 2 questions are:

  • How can I debug metrics2?
  • Are there any known reasons why I am missing metrics?

Context: HDP 2.6 on AWS.

1 ACCEPTED SOLUTION

Super Collaborator

You should be able to find DEBUG level messages in the individual Hadoop service logs; look for messages starting with org.apache.hadoop.metrics2.*

One config missing is:

# default sampling period
*.period=10

2 REPLIES

Expert Contributor

Fair enough about *.period. Since I did get metrics there is probably a sensible default, but it is nice to have.

I did find some messages in the service logs, and all looked good. To be honest, it all worked today.

I then happily applied the settings to prod and, lo and behold, I only had 2 metrics there.

Thinking it over, I understood that hadoop-metrics2.properties declares that I want, for instance, NodeManager metrics, but I then need to restart the NodeManagers before those metrics appear. Indeed, the cluster I worked on yesterday had been rebooted (it is a dev cluster, switched off at night), so every daemon had picked up the new configuration.
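
For anyone hitting the same thing, one way to see which daemons have actually picked up the new configuration is to look at which prefixes arrive on the receiving side. Below is a minimal Python sketch, not part of Hadoop or Graphite: it assumes the sink sends Graphite plaintext lines of the form "path value timestamp" and that the first dot-separated component of the path is the metrics_prefix configured above. The function names (parse_line, tally_prefixes, missing_prefixes, serve) and the sample metric paths are made up for illustration.

```python
import socketserver
import threading
from collections import Counter

# Prefixes copied from the metrics_prefix settings in the question.
EXPECTED_PREFIXES = {
    "datanode", "namenode", "resourcemanager", "nodemanager",
    "jobhistoryserver", "journalnode", "maptask", "reducetask",
    "applicationhistoryserver",
}


def parse_line(line):
    """Split one plaintext line into (path, value, timestamp), or return None."""
    # Split from the right so a path that contains spaces stays in one piece.
    parts = line.strip().rsplit(None, 2)
    if len(parts) != 3:
        return None
    path, value, timestamp = parts
    try:
        return path, float(value), int(float(timestamp))
    except (ValueError, OverflowError):
        return None


def tally_prefixes(lines):
    """Count well-formed metric lines per first path component."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed[0].split(".", 1)[0]] += 1
    return counts


def missing_prefixes(counts, expected=EXPECTED_PREFIXES):
    """Return the expected prefixes for which no metric has been seen yet."""
    return sorted(expected - set(counts))


def serve(port=2003):
    """Listen like a Graphite plaintext receiver and report new prefixes."""
    seen = Counter()
    lock = threading.Lock()

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            for raw in self.rfile:
                counts = tally_prefixes([raw.decode("utf-8", errors="replace")])
                with lock:
                    new = [p for p in counts if p not in seen]
                    seen.update(counts)
                    if new:
                        print(f"first metric from {new}; "
                              f"still missing: {missing_prefixes(seen)}",
                              flush=True)

    class Server(socketserver.ThreadingTCPServer):
        allow_reuse_address = True

    with Server(("0.0.0.0", port), Handler) as server:
        server.serve_forever()
```

Calling serve(2003) on a test host and pointing *.sink.graphite.server_host at it should print a line the first time each daemon type reports. Any prefix that stays in the "still missing" list is a candidate for a daemon that has not been restarted since the properties file changed.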

Now all works as expected.

Thanks!