What does 'CPU Wait Access' mean on Ambari metrics

Explorer

Hey,

Does anyone know what "CPU Wait Access" means in the Ambari YARN metrics, and how it's calculated?

I understand that it's related to CPU I/O wait. But in my case it looks like the actual (max) CPU I/O wait on the workers is multiplied by ~20, and the result is shown as CPU Wait Access.

Any better ideas?

Thanks.

6 REPLIES

@Priyan S

CPU Wait Access is a standard way of measuring how much CPU time is spent waiting for I/O operations to complete. A high Wait Access value typically means you have an I/O bottleneck.

Which screen are you reporting these numbers from? Is this from a specific host page, or an overall metric? I'm not sure why there would be any multiplication factor applied. How many servers are in your cluster?

Explorer

Yes, I understand the concept. I'm just confused about how it's calculated.

>>Which screen are you reporting these numbers from? Is this from a specific host page, or an overall metric?

Actually, there's a "CPU Wait Access" widget on the "YARN" page in Ambari.

I have 8 worker (data) nodes and 6 master nodes.

What I see is this: if the maximum I/O wait on any of the worker nodes is 2% (as shown in host-specific metrics, Grafana, Nagios, etc.), then CPU Wait Access reports a peak of 40%. The same factor applies to any value.

Expert Contributor

@Priyan S

Can you share a snapshot of the widget that you are talking about? Or, the metric that is being discussed here? You can get that by editing the widget and looking at the metrics used to populate a widget.

Explorer

@Aravindan Vijayan

Thanks for your reply. I am attaching the metrics details below.

11970-cpu-wait-access.png

cpu_wio._avg and cpu_wio._max are the metrics here. Also, I don't think the values shown there are the actual CPU I/O wait; they look like they have been multiplied by, or added to, some factor.

You can see this graph in Ambari Dashboard > YARN. Under the metrics graphs, you will see one like the preview image above.

Expert Contributor

The cpu_wio metric corresponds to the following metric being captured by psutil.

  • iowait (Linux): percentage of time spent waiting for I/O to complete.

For reference, cpu_wio is obtained from the psutil.cpu_times_percent API: https://pythonhosted.org/psutil/#psutil.cpu_times_percent
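To make the "percentage of time" definition concrete, here is a minimal sketch of how an iowait percentage can be derived from two samples of cumulative CPU times. The helper name iowait_percent and the sample numbers are hypothetical, and this is not the actual Ambari Metrics collector code. On a Linux host with psutil installed, psutil.cpu_times_percent(interval=1).iowait returns the equivalent live value.

```python
def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples.

    Each sample is a dict mapping a CPU state to cumulative seconds.
    """
    deltas = {state: after[state] - before[state] for state in after}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0
    return 100.0 * deltas.get("iowait", 0.0) / total


# Hypothetical cumulative CPU times (seconds), sampled one interval apart.
before = {"user": 100.0, "system": 20.0, "idle": 870.0, "iowait": 10.0}
after = {"user": 104.0, "system": 21.0, "idle": 874.5, "iowait": 10.5}

print(iowait_percent(before, after))  # 0.5 s of iowait out of 10 s elapsed
```

Because the value is a share of the elapsed CPU time, a single host's reading stays between 0 and 100.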

On the YARN page, cpu_wio._avg is the average of the metric across all nodes in the YARN cluster (the NodeManagers). cpu_wio._max is the maximum of all the cpu_wio values from the YARN cluster.
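As a rough illustration of that aggregation, the sketch below computes an average and a maximum over per-host readings at a single timestamp. The host names and values are made up, and this only mirrors the definition given above, not the actual Ambari Metrics implementation.

```python
from statistics import mean

# Hypothetical cpu_wio readings (percent) from 8 NodeManager hosts
# at one timestamp.
cpu_wio_by_host = {
    "worker1": 0.5, "worker2": 1.0, "worker3": 2.0, "worker4": 0.5,
    "worker5": 1.5, "worker6": 1.0, "worker7": 0.5, "worker8": 1.0,
}

values = list(cpu_wio_by_host.values())
cpu_wio_avg = mean(values)  # what cpu_wio._avg would show under this definition
cpu_wio_max = max(values)   # what cpu_wio._max would show under this definition

print(cpu_wio_avg, cpu_wio_max)
```

Under this definition, cpu_wio._max can never exceed the highest single-host reading, so a 40% peak with a 2% per-host maximum is not explained by the avg/max aggregation alone.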

You can use the "System Servers" Grafana dashboard to delve deeper to check why higher values are seen in the graph. This metric is being captured in the "CPU - IOWAIT/INTR" section in that dashboard.

Explorer

>> On the YARN page, cpu_wio._avg is the average of the metric across all nodes in the YARN cluster (the NodeManagers). cpu_wio._max is the maximum of all the cpu_wio values from the YARN cluster.

This was my understanding. But, see below -

>> You can use the "System Servers" Grafana dashboard to delve deeper to check why higher values are seen in the graph. This metric is being captured in the "CPU - IOWAIT/INTR" section in that dashboard.

The value shown there doesn't match the one I see on the Ambari dashboard. If the max I/O wait among all the NodeManagers is 5% in Grafana, the max shown on the Ambari dashboard is different: maybe 10, or 20, or even 40. Another strange thing I noticed is that when I zoom in on a portion of the graph in Grafana, the % values change. I'm not sure whether that happens for all graphs.
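The values changing on zoom may come from time aggregation: a wider time window is typically drawn from fewer, averaged points, which flattens short spikes. Nothing in this thread confirms that this explains the mismatch with the Ambari widget. The sketch below only shows the general effect, using a hypothetical helper downsample_avg and made-up readings.

```python
def downsample_avg(samples, bucket_size):
    """Average consecutive samples in fixed-size buckets."""
    buckets = [
        samples[i:i + bucket_size]
        for i in range(0, len(samples), bucket_size)
    ]
    return [sum(bucket) / len(bucket) for bucket in buckets]


# Hypothetical per-minute iowait readings (percent) with one short spike.
per_minute = [1.0, 1.0, 1.0, 21.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

print(max(per_minute))                     # peak at full resolution
print(max(downsample_avg(per_minute, 5)))  # peak after 5-sample averaging
```

The same underlying data gives a different peak depending on the resolution it is drawn at, so two dashboards only agree if they use the same time range and the same aggregation.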
