Summary of this article

This article explains where the figures on the Overview and Usage tabs of the Cloudera Data Science Workbench (CDSW) Site Administration page come from, and what they mean.

Background

Several customers have asked me the same question: how should the resource usage of a CDSW cluster be evaluated?

Below I record the representative questions customers have asked, so that I can answer them more quickly in the future and so that others can use this as a reference.

Why are the usage graphs I see in CDSW's Grafana not consistent with the resource usage shown on the host charts in Cloudera Manager (CM)?

To answer this, let me start with the conclusions:

  1. In the CM Web UI, open the page of the specific host. The charts under Chart Library are accurate; use them as the benchmark for the host's actual resource usage. Their data comes from the Host Monitor of the Cloudera Management Service, a monitoring service designed and developed by Cloudera.
  2. In the CDSW Web UI, log in with an administrator account and open the Overview tab on the Site Administration page (see the attachment "cdsw-admin-total-resources.png"). The Total Memory, Total vCPUs and Used figures there are also accurate.
    P.S.: Used is not a physical usage rate. For example, suppose you create a Pod from the busybox image that requests (1000m, 2GiB) but only runs the sleep command. The requested resources count as Used, yet at the physical level, that is, in the CM host charts, you will see very little resource usage.
    CDSW obtains this data from the Kubernetes API. You will notice that the total resources in Site Administration are less than the combined resources of all Master and Worker nodes. This is because Kubernetes reserves part of each node's resources for the node itself and for Kubernetes components (such as kube-apiserver, kube-controller-manager, kube-scheduler, etcd, kubelet, ...); only the remainder is reported as Allocatable.
    Log in to the CDSW Master node and use the kubectl command line tool to verify:
    # kubectl get nodes
    NAME                                      STATUS   ROLES    AGE   VERSION
    [hostname4]   Ready    master   19h   v1.13.9-1+6c8cb1a92335e2
    [hostname5]   Ready    <none>   19h   v1.13.9-1+6c8cb1a92335e2
    # kubectl describe node [hostname4]
    ...
    Capacity:
     cpu:                8
     ephemeral-storage:  262132716Ki
     hugepages-2Mi:      0
     memory:             32779704Ki
     pods:               110
    Allocatable:
     cpu:                6500m
     ephemeral-storage:  262132716Ki
     hugepages-2Mi:      0
     memory:             29121976Ki
     pods:               110
    ...
    # kubectl describe node [hostname5]
    ...
    Capacity:
     cpu:                8
     ephemeral-storage:  262132716Ki
     hugepages-2Mi:      0
     memory:             32779704Ki
     pods:               110
    Allocatable:
     cpu:                6500m
     ephemeral-storage:  262132716Ki
     hugepages-2Mi:      0
     memory:             29121976Ki
     pods:               110
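    The output shows that my CDSW cluster has 2 nodes, each with a Capacity of 8 CPUs but only 6500m of allocatable CPU and 29121976Ki of allocatable memory; the difference is the reserved part described above. In Kubernetes, 1 CPU core is 1000m, so the two nodes together provide 2 × 6500m = 13000m, exactly 13 cores, which matches Total vCPUs in Site Administration. Memory works the same way: 2 × 29121976Ki = 58243952Ki, about 55.5 GiB, which corresponds to Total Memory.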
    The Used figure is also obtained by CDSW from the Kubernetes API, and it can likewise be verified with kubectl.
    Run `kubectl describe node {nodeName}` again; the bottom of its output looks like this:
    Non-terminated Pods:         (22 in total)
      Namespace                  Name                                           CPU Requests  CPU Limits  Memory Requests  Memory Limits   AGE
      ---------                  ----                                           ------------  ----------  ---------------  -------------   ---
      default-user-1             lf118vexnu6xxxxx                               1100m (16%)   0 (0%)      2084197Ki (7%)   1953125Ki (6%)  90m
    ...
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource           Requests         Limits
      --------           --------         ------
      cpu                2740m (42%)      300m (4%)
      memory             7976293Ki (27%)  13159781Ki (45%)
      ephemeral-storage  0 (0%)           0 (0%)
    The Requests column shows the resources requested by the Pods on this node, and the percentages are relative to the node's Allocatable (for example, 2740m / 6500m ≈ 42%). These requested resources are what Site Administration counts as Used.
  3. Regarding Grafana, its data source and the calculation logic of its built-in dashboards, we cannot give a detailed answer, because Grafana is not a monitoring service developed for CDSW. What I can say is that Grafana's data source is Prometheus, a monitoring system that also runs inside Kubernetes.
    As for the exact meaning of each Grafana chart, I still need to investigate before I can answer.
    For now, if you want to know how Grafana calculates a particular chart, you can look up its query expression:
    Click the expand button at the top right of the chart and choose Edit to see the exact query expression, as shown in the following images:
    pod-memory-usage-edit.png
    pod-memory-usage-query.png
    Interpreting the data source and meaning of such an expression, however, requires familiarity with Grafana and with Prometheus, whose query language (PromQL) these expressions are written in. Prometheus and Grafana are a very popular metrics monitoring solution in the Kubernetes community, and CDSW bundles them so that users familiar with the Kubernetes ecosystem and Prometheus can use them out of the box.

    How to use Prometheus and Grafana is a broad topic; refer to the official documentation:
    https://grafana.com/docs/grafana/latest/
    https://prometheus.io/
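The Total vCPUs and Total Memory check from point 2 can be scripted for any number of nodes instead of reading each `kubectl describe node` by hand. The following is a minimal bash sketch, not a CDSW tool: the function name `sum_allocatable` is hypothetical, and it assumes CPU quantities are whole cores or millicores ("m") and memory is reported in Ki, as in the output above.

```shell
#!/usr/bin/env bash
# Sketch: sum the Allocatable CPU and memory of all nodes; the totals should
# match "Total vCPUs" and "Total Memory" in Site Administration > Overview.
# Assumption: CPU is whole cores ("8") or millicores ("6500m") and memory is
# in Ki ("29121976Ki"); other suffixes (Mi, Gi, G, ...) are not handled.
#
# Input on stdin: one "<cpu> <memory>" line per node, e.g. from
#   kubectl get nodes -o jsonpath='{range .items[*]}{.status.allocatable.cpu} {.status.allocatable.memory}{"\n"}{end}'
# Output: "<vCPUs> <GiB>", each with one decimal place.
sum_allocatable() {
  awk '
    {
      cpu = $1
      if (cpu ~ /m$/) { sub(/m$/, "", cpu); milli += cpu }  # millicores
      else            { milli += cpu * 1000 }               # whole cores
      mem = $2
      sub(/Ki$/, "", mem)                                   # strip the Ki suffix
      kib += mem
    }
    END { printf "%.1f %.1f\n", milli / 1000, kib / 1048576 }  # 1 GiB = 1048576 KiB
  '
}
```

Fed the two allocatable lines shown above (6500m 29121976Ki, twice), the function prints 13.0 55.5, that is, 13 vCPUs and about 55.5 GiB, in line with the arithmetic in point 2.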

In conclusion

  1. For CDSW host-level resource usage, the host charts in the CM Web UI are the most accurate reference.
  2. For CDSW resource usage at the Kubernetes level, the Overview and Activity tabs in Site Administration give a cluster-wide view. This data is also accurate, but it reflects only the resources requested at the Kubernetes level, not the physical usage generated by the actual workload. For background, see the Kubernetes documentation on resource requests and limits.
  3. The default Grafana dashboards have their own meanings: their charts may be calculated in a completely different way from the Overview and Usage figures in CDSW, so the two are not directly comparable.
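Point 2 can be checked by hand. The following bash sketch (the helper names `sum_requests` and `percent` are hypothetical and not part of CDSW or kubectl) sums the CPU and memory requests of the containers on a node and expresses a total as a percentage of Allocatable, which is how the Requests column of `kubectl describe node` is reported above. It assumes CPU is given in cores or millicores and memory in Ki, Mi, Gi or plain bytes, and it ignores init containers and Pod overhead, so it is an approximation rather than a reimplementation of kubectl.

```shell
#!/usr/bin/env bash
# Sketch: approximate the Kubernetes-level "Used" figures by summing container
# resource *requests*, then express them as a share of Allocatable.
# Assumptions: CPU is whole cores or millicores ("m"); memory is Ki, Mi, Gi or
# plain bytes (decimal suffixes such as "G" are not handled). Init containers
# and Pod overhead are ignored, so results may differ slightly from
# `kubectl describe node`.
#
# Input on stdin: one "<cpu>,<memory>" line per container (either field may be
# empty), e.g. for one node, replacing NODE with the node name:
#   kubectl get pods --all-namespaces \
#     --field-selector spec.nodeName=NODE,status.phase!=Succeeded,status.phase!=Failed \
#     -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.resources.requests.cpu},{.resources.requests.memory}{"\n"}{end}{end}'
# Output: "<millicores> <KiB>".
sum_requests() {
  awk -F',' '
    function cpu_milli(q) {
      if (q == "") return 0                # no CPU request
      if (q ~ /m$/) { sub(/m$/, "", q); return q + 0 }
      return q * 1000                      # whole cores
    }
    function mem_kib(q) {
      if (q == "") return 0                # no memory request
      if (q ~ /Ki$/) { sub(/Ki$/, "", q); return q + 0 }
      if (q ~ /Mi$/) { sub(/Mi$/, "", q); return q * 1024 }
      if (q ~ /Gi$/) { sub(/Gi$/, "", q); return q * 1048576 }
      return q / 1024                      # plain bytes
    }
    { milli += cpu_milli($1); kib += mem_kib($2) }
    END { printf "%d %d\n", milli, kib }
  '
}

# percent USED ALLOCATABLE: integer percentage, fraction truncated.
percent() {
  awk -v u="$1" -v a="$2" 'BEGIN { printf "%d\n", u * 100 / a }'
}
```

For example, `percent 2740 6500` prints 42, the same as the `cpu 2740m (42%)` line in the `kubectl describe node` output above; truncating rather than rounding also reproduces the `300m (4%)` limit figure (4.6% truncated to 4).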
 
992 Views
0 Kudos