Member since: 09-08-2020
Posts: 39
Kudos Received: 2
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1387 | 10-05-2021 11:52 PM
 | 4108 | 10-03-2021 11:47 PM
10-10-2021
11:29 PM
1 Kudo
@GregoryG, has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution; that will make it easier for others to find the answer in the future.
10-07-2021
08:19 PM
2 Kudos
A customer asked the following question about CDSW.
Question:
We connect to external databases from CDSW. We found that some of the database client IPs are not in our LAN CIDR, and these IPs are not the Node IPs of CDSW either. Why is this happening?
Response:
CDSW uses Kubernetes as its infrastructure. Kubernetes maintains the Pod and Service IP CIDRs internally. When a Kubernetes workload communicates with the outside world, the traffic passes through the Service abstraction layer, which forwards packets between Nodes and Pods mainly through NAT.
Services operate in different modes, and each mode handles source IP NAT differently.
Depending on the mode, the source IP seen by the destination may end up being the Node IP, which is likely related to your question. For details, please refer to the official Kubernetes document: Using Source IP.
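If you want to see the NAT rules that perform this rewriting on a CDSW host, you can inspect the nat table directly. This is only a quick sketch (run as root on a CDSW node; the exact chain names depend on the CNI plugin, which is Weave Net in CDSW, and on kube-proxy):
# List the MASQUERADE (source NAT) rules that rewrite outgoing Pod traffic to the Node IP
iptables -t nat -S | grep -i masquerade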
In addition, if you want to test which source IP appears when a Pod in the CDSW Kubernetes cluster accesses an external database, you can refer to the test steps from my experimental environment below.
Some practice:
Here I have an external host, hostname.cloudera.com, with a PostgreSQL server running on it as follows:
20211007_04:25:21 [root@hostname ~]# hostname
hostname.cloudera.com
20211007_04:25:33 [root@hostname ~]# ps aux | sed -rn '1p;/postgres/Ip' | grep -v sed | head -n 2
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 52 0.0 0.0 164160 10560 ? Ss 00:51 0:02 /usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data/
20211007_04:26:06 [root@hostname ~]# ip -4 -o addr
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
8826: eth0 inet 172.xx.xx.xxx/20 scope global eth0\ valid_lft forever preferred_lft forever
20211007_04:26:21 [root@hostname ~]#
Perform the following steps on the CDSW Master host:
Download the docker image of Ubuntu: docker pull ubuntu
Create a deployment with the Ubuntu image:
Generate the deployment manifest template: kubectl create deployment ubuntu --dry-run --image=ubuntu:latest -o yaml > /tmp/ubuntu-deployment.yaml
Modify the deployment manifest template so that the sleep command runs after the Ubuntu container starts. Change the container spec in the /tmp/ubuntu-deployment.yaml file to the following content:
spec:
  containers:
  - image: ubuntu:latest
    args:
    - sleep
    - "1000000"
    name: ubuntu
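For reference, the complete manifest after this edit might look roughly like the following. This is only a sketch based on what kubectl create deployment typically generates; the labels and other metadata in your generated file may differ slightly:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: ubuntu
  name: ubuntu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ubuntu
  template:
    metadata:
      labels:
        app: ubuntu
    spec:
      containers:
      - image: ubuntu:latest
        name: ubuntu
        args:
        - sleep
        - "1000000"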
Use kubectl to create this Ubuntu deployment: kubectl apply -f /tmp/ubuntu-deployment.yaml
Enter this Ubuntu deployment and install the PostgreSQL client and network tools:
Enter Ubuntu Pod: kubectl -n default exec -it $(kubectl -n default get pod -l "app=ubuntu" -o jsonpath='{.items[0].metadata.name}') -- bash
Install the required command-line tools:
apt update
apt install -y iproute2  # provides the `ip addr` utility
apt install -y postgresql-client
Use psql in the Ubuntu Pod to connect to the Postgres server on hostname.cloudera.com: psql -h hostname.cloudera.com -U postgres
(Enter the password when prompted.)
Check the IP of the Ubuntu Pod, the IP of the host node where the Pod is running, and the client IP seen on the Postgres server. The kubectl commands below are executed on the CDSW Master host; the Postgres-side check is executed on the Postgres server.
Check the IP of the Ubuntu Pod:
[root@host-10-xx-xx-xx ~]# kubectl -n default get pods -l app=ubuntu -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ubuntu-7c458b9dfd-hr5qg 1/1 Running 0 64m 100.xx.xx.xx host-10-xx-xx-xx <none> <none>
[root@host-10-xx-xx-xx ~]# kubectl -n default exec -it $(kubectl -n default get pod -l "app=ubuntu" -o jsonpath='{.items[0].metadata.name}') -- ip -4 -o addr
➡ Note: the "ip -4 -o addr" here is executed inside the Ubuntu Pod.
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
5169: eth0 inet 100.xx.x.xx/16 brd 100.xx.xx.xx scope global eth0\ valid_lft forever preferred_lft forever
[root@host-10-xx-xx-xx ~]# kubectl -n default exec -it $(kubectl -n default get pod -l "app=ubuntu" -o jsonpath='{.items[0].metadata.name}') -- netstat -anp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 100.xx.xx.xx:43340 172.xx.xx.xx:5xxx ESTABLISHED 1417/psql
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node PID/Program name Path
Note that the 'netstat' command above is executed inside the Ubuntu Pod. We can see that the 'Local Address' is the IP of the Pod, while the 'Foreign Address' is the IP of hostname.cloudera.com.
Check the host Node's IP:
[root@host-10-xx-xx-xx ~]# ip -4 -o addr
➡ Note that this host is the same as the NODE shown in the 'kubectl get pods -o wide' output above, i.e. the host node of the Ubuntu Pod.
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
2: eth0 inet 10.xx.xx.xx/22 brd 10.xx.xx.xx scope global noprefixroute dynamic eth0\ valid_lft 71352sec preferred_lft 71352sec
5044: docker0 inet 172.xx.xx.xx/16 scope global docker0\ valid_lft forever preferred_lft forever
5047: weave inet 100.xx.xx.xx/16 brd 100.xx.xx.xx scope global weave\ valid_lft forever preferred_lft forever
View the client's IP on the Postgres server (executed on the Postgres host):
20211007_04:52:46 [hostname ~]# ps aux | sed -rn '1p;/post/Ip' | grep -v sed | sed -rn '1p;/100\.xx\.xx\.xx|10\.xx\.xx\.xx/Ip'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 65536 0.0 0.0 164988 4148 ? Ss 03:55 0:00 postgres: postgres postgres 10.xx.xx.xx(43340) idle
20211007_04:53:19 [root@c3669-node1 ~]# ip -4 -o addr
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
8826: eth0 inet 172.xx.xx.xx/20 scope global eth0\ valid_lft forever preferred_lft forever
Conclusion
By default, when a Pod in Kubernetes accesses services outside the cluster, its source IP is the IP of the Node.
In addition, there are some ways to detect the Pod IP CIDR and Service IP CIDR in your CDSW Kubernetes cluster so that you can better understand your CDSW environment's infrastructure. All commands are executed on the CDSW master host.
Get the Pod IP CIDR: kubectl cluster-info dump | grep -m 1 cluster-cidr
Get the Service IP CIDR: kubectl cluster-info dump | grep -m 1 service-cluster-ip-range
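As a quick cross-check (also on the CDSW master host; a sketch, since the exact namespaces and Pods vary by environment), you can list the IPs actually assigned to Pods and Services and confirm that they fall inside the two CIDRs above:
# Pod IPs currently in use
kubectl get pods --all-namespaces -o jsonpath='{.items[*].status.podIP}' | tr ' ' '\n' | sort -u
# Service (cluster) IPs currently in use
kubectl get svc --all-namespaces -o jsonpath='{.items[*].spec.clusterIP}' | tr ' ' '\n' | sort -u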
Hope the above information is helpful to you.
10-03-2021
11:54 PM
Hello @DA-Ka The data might be skewed on one of the disks because some heavily used topics/partitions reside on that particular disk. You may want to profile the data (for example, check the size of each Kafka log directory with du -sh /kafka-logs/*) and reassign partitions to the other disk, either with the partition reassignment tool kafka-reassign-partitions.sh or manually.
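For illustration only, here is a rough sketch of the reassignment workflow. All names are hypothetical (a topic called "events", broker id 1, two log directories /kafka-logs-1 and /kafka-logs-2), and note that pinning a replica to a specific directory via "log_dirs" requires Kafka 1.1+ and the --bootstrap-server form of the tool; older releases use --zookeeper without "log_dirs":
# Check per-directory usage on the broker
du -sh /kafka-logs-1/* /kafka-logs-2/*
# /tmp/reassign.json: move partition 0 of "events" onto the second disk of broker 1
{
  "version": 1,
  "partitions": [
    { "topic": "events", "partition": 0, "replicas": [1], "log_dirs": ["/kafka-logs-2"] }
  ]
}
# Run the reassignment, then verify that it completed
kafka-reassign-partitions.sh --bootstrap-server <broker>:9092 --reassignment-json-file /tmp/reassign.json --execute
kafka-reassign-partitions.sh --bootstrap-server <broker>:9092 --reassignment-json-file /tmp/reassign.json --verify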
09-16-2021
08:59 AM
@Jaydeep_Ta The permissions are in place and the owner is the cloudera-scm user. It has 644 permissions. I tried to install the same on 5.16 and it works without any issues. Only 5.10 has the issue.
09-16-2021
03:42 AM
The answer would be no. @dansteu I once worked on a support case where a customer wanted to integrate NiFi with an external Ranger, meaning that NiFi and Ranger were not managed by the same Ambari. The Ranger SME told us this is not possible, or at least not within the scope of Cloudera's technical support. The logic should be the same for Atlas. However, if you know Ranger and Ambari well enough, it should still be technically possible in theory.
09-16-2021
03:05 AM
CEM
What is Cloudera Edge Management (CEM)? Refer to What is Cloudera Edge Management.
There is no compatibility matrix between CEM and HDF because they are independent products; MiNiFi agents from CEM can send data into HDF/NiFi without any problem if they need to.
SMM
From the Install or Upgrade Ambari, HDF, and HDP document, we can see that the cluster must be managed by Ambari 2.7.x. It can be an HDF 3.3.x, 3.4.0, or 3.4.1.1 cluster, or an HDP 3.1 or 3.1.1 cluster.
And in the Set up DP Platform document, you can see that before installing Streams Messaging Manager, you must first install or upgrade to DataPlane Platform 1.2.x.
My understanding is that SMM is built on the DataPlane Platform, so if you have installed the latest DataPlane Platform, you should be able to install the latest SMM. So, I believe DPS (DataPlane Platform) 1.3.1 or 1.3.0 and SMM 2.0.0 are compatible.
Actually, if you look at the Installing DataPlane document, it looks to me like the DataPlane Platform is based on Ambari 2.6 or 2.7, so theoretically any HDF/HDP supported by Ambari 2.6 or 2.7 is also compatible with DPS.
Also, look at this document: Streams Messaging Manager installation steps
If you are installing SMM as a new component on an existing Ambari 2.7.5 managed cluster with HDP 3.1.5 and/or HDF 3.5.x, then file a support case in the Cloudera portal to get the correct version of SMM.
So it looks to me that SMM is compatible with Ambari 2.7.5 and HDF 3.5.x.
Hope this would answer your question.
09-05-2021
11:36 PM
Summary of the article
I have encountered issues of the following scenario several times:
A customer removes a host from a cluster managed by Cloudera Manager and adds the host to another cluster managed by Cloudera Manager. At this point, problems often occur.
These problems are usually caused by old cluster files remaining on the host, which prevent the new Cloudera Manager from controlling the host properly.
So I verified which files are generated on a host after adding it to a CDP cluster managed by Cloudera Manager. In other words, after removing a host from Cloudera Manager, which files do we need to delete manually?
Introduction to the Test environment
CDP Runtime version: CDP PvC Base 7.1.6
CM version: Cloudera Manager 7.3.1
Kerberos enabled: Yes
TLS enabled: Yes
Auto-TLS: Yes
Auto-TLS Use Case: Use Case 1 - Using Cloudera Manager to generate an internal CA and corresponding certificates (Refer to Configuring TLS Encryption for Cloudera Manager Using Auto-TLS)
Experimental steps
After adding a host named c3669-temp-node1.kyanlab.cloudera.com to the cluster c3669, I added the YARN Node Manager role to this host, and then I used the following command to find the newly added files in the host:
find /usr -type d -iname '*cloudera*'
find /var -type d -iname '*cloudera*'
find /etc -type d -iname '*cloudera*'
find /opt -type d -iname '*cloudera*'
I found that because I chose to install the OpenJDK provided by Cloudera Manager when I added the node, there is this OpenJDK on this newly added node: /usr/java/jdk1.8.0_232-cloudera.
I found some YUM-related directories under the /var directory, and there are Cloudera Manager Server and Agent-related directories under /var/lib.
The Cloudera Manager Agent-related directories and YARN-related directories are created under the /etc directory (because I added the Node Manager role).
Needless to say, the parcel-related directories and some Cloudera Manager-related directories are naturally created in the /opt directory.
In addition, using the command ls -AFlh /etc/alternatives | grep -Ei cloudera, I found that many alternatives entries have been created.
We can trace these alternatives entries with the following steps:
[root@c3669-temp-node1 hadoop-yarn]# which yarn
/usr/bin/yarn
[root@c3669-temp-node1 hadoop-yarn]# ls -AFlh /usr/bin/yarn
lrwxrwxrwx. 1 root root 22 Aug 30 10:18 /usr/bin/yarn -> /etc/alternatives/yarn*
[root@c3669-temp-node1 hadoop-yarn]# ls -AFlh /etc/alternatives/yarn
lrwxrwxrwx. 1 root root 63 Aug 30 10:18 /etc/alternatives/yarn -> /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/yarn*
[root@c3669-temp-node1 hadoop-yarn]# alternatives --list | grep -Ei yarn
yarn auto /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/yarn
hadoop-conf auto /etc/hadoop/conf.cloudera.yarn
[root@c3669-temp-node1 hadoop-yarn]#
What will happen after removing this new node from Cloudera Manager?
At this point, I can be sure that my Node Manager can run successfully on the new host c3669-temp-node1.kyanlab.cloudera.com.
I referred to this document to remove the node from Cloudera Manager.
Obviously, after deleting the host from Cloudera Manager according to the above document, no files on the node are actually deleted. The files created under /usr/, /var/, /opt/, etc. still remain on the host.
Next, what files do we need to delete manually?
First of all, we definitely need to delete the software installed by the Cloudera Manager repo.
Of course, if you plan to add this node to another CDP of the same version, you can omit this step.
# yum repolist | grep -Ei cloudera
cloudera-manager Cloudera Manager, Version 7.3.1 6
# yum repo-pkgs cloudera-manager list
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.radwebhosting.com
* epel: mirror.prgmr.com
* extras: mirror.sfo12.us.leaseweb.net
* updates: sjc.edge.kernel.org
Installed Packages
cloudera-manager-agent.x86_64 7.3.1-10891891.el7 -manager
cloudera-manager-daemons.x86_64 7.3.1-10891891.el7 -manager
openjdk8.x86_64 8.0+232_9-cloudera -manager
# yum repo-pkgs cloudera-manager remove -y
From the output of the above commands, we can see that three packages were installed on my new node from the Cloudera Manager repo: the CM Agent, the CM daemons, and OpenJDK. Use the repo-pkgs remove command to delete them. Since I plan to add this node to a CDP 7.1.6 cluster of the same version later (managed by another CM), I initially skipped this step.
Update:
I found that the yum packages installed from the cloudera-manager repo still need to be deleted manually. When I added the old node to a new cluster, an error occurred during the installation of the Cloudera packages. The reason is that some of the directories I had manually deleted under /usr, /var, and /etc were managed by those packages, so the packages need to be reinstalled; if they are not removed manually, CM assumes they are already installed and skips them, and the subsequent configuration then fails due to missing files.
Therefore, whether you plan to add the old host to another cluster of the same version or of a different version, you need to manually remove the packages installed from the cloudera-manager repo first. This package removal must be done before searching {/usr, /var, /etc, ...} and deleting the related directories.
Also, in my environment "yum repo-pkgs cloudera-manager remove" did not work, so I used a workaround to delete these packages:
# Collect every installed package whose name matches "cloudera" and remove them one by one
clouderaPkgs=($(yum list installed | grep -Ei cloudera | awk '{print $1}'))
for i in "${clouderaPkgs[@]}"; do
  yum remove -y "$i"
done
Now, let us remove the remaining directories created by Cloudera packages:
declare -a dirsSCM
dirsSCM=(`find /var -type d -iname '*cloudera*'`)
for i in ${dirsSCM[@]}; do
echo $i
rm -rf $i
done
dirsSCM=(`find /etc -type d -iname '*cloudera*'`)
for i in ${dirsSCM[@]}; do
echo $i
rm -rf $i
done
dirsSCM=(`find /opt -type d -iname '*cloudera*'`)
for i in ${dirsSCM[@]}; do
echo $i
rm -rf $i
done
Of course, don't forget that it is best to delete the directory /var/run/cloudera-scm-agent, which is used by the CM Agent to manage the various role instances (such as DataNode, NodeManager, etc.).
Update: I just found that after a reboot, this directory is gone. It's a tmpfs filesystem.
Then we need to manually clean up the alternatives-related entries. This is a bit troublesome; I have encountered several cases where a customer added the host to a new Cloudera Manager managed cluster and, because the old alternatives entries already existed, the new cluster's files were not propagated correctly.
# ls -AFlh /etc/alternatives | grep -Ei cloudera | awk '{print $9"\t"$NF}' | sed -r 's/\*$//g' > /tmp/alternaives_cloudera_list.txt
# head -n 5 /tmp/alternaives_cloudera_list.txt
avro-tools /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/avro-tools
beeline /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/beeline
bigtop-detect-javahome /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/bigtop-detect-javahome
catalogd /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/catalogd
cdsw /opt/cloudera/parcels/CDSW-1.9.1.p1.10118148/scripts/cdsw
# wc -l /tmp/alternaives_cloudera_list.txt
131 /tmp/alternaives_cloudera_list.txt
# The commands above filter out the alternatives items generated by Cloudera Manager.
# For example, "alternatives --remove yarn /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/yarn" will delete this item for yarn generated by Cloudera Manager.
# use a loop to delete all the items.
while read line; do
argsArr=($line);
echo -e "${argsArr[0]}...${argsArr[1]}";
alternatives --remove ${argsArr[0]} ${argsArr[1]};
done < /tmp/alternaives_cloudera_list.txt
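If you want to confirm that the cleanup worked, the same checks used earlier should now return nothing:
alternatives --list | grep -Ei cloudera
ls -AFlh /etc/alternatives | grep -Ei cloudera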
So far, I should have cleaned up all the files that need to be manually deleted, so I restarted the host.
Then I created a new CDP 7.1.6 cluster, turned on Kerberos and TLS, and tried to add the host from the previous steps.
I deployed a CDP PvC Base 7.1.6 cluster with a one-click deployment script; the CM version is 7.3.1, and the new cluster is named c1669.
Therefore, the CDP version and CM version of this new cluster (c1669) are the same as those of the old cluster (c3669).
After deploying the c1669 cluster, I used the script to enable Kerberos and TLS. The KDC server used by the cluster is located on the host c1669-node1, which is also the host where CM is located.
Regarding TLS, I also used the same Auto-TLS Use Case 1 as in the c3669 cluster.
Now I add the host c3669-temp-node1.kyanlab.cloudera.com to the newly created c1669 cluster and try to add a NodeManager role to it to see whether it can start successfully.
As I expected, the host c3669-temp-node1.kyanlab.cloudera.com was successfully added to the cluster c1669, and the newly deployed Node Manager started successfully.
Conclusion
If you need to remove a host from an existing CDP/CDH cluster and add it to another CDP/CDH cluster, please follow the steps below:
Refer to this document to remove this node from Cloudera Manager.
Remove the packages installed via the cloudera-manager repository.
Delete the remaining files created by the CM Agent and by the packages installed from the cloudera-manager repo.
Reboot the host.
Now you can add this host to a new cluster managed by CM.
08-23-2021
12:51 AM
Hi @blueb Thanks for the valuable feedback! Basically, Flink connects to Kafka in the same way a regular Java project uses Kafka; you can refer to this link for the main options when a regular Java client connects to Kafka. If your non-CM-managed Kafka cluster has no authentication enabled, it should fall under "Unsecured".
I found a Flink ↔ Kafka demo project on Cloudera's official GitHub; you can refer to its job.properties. The project also contains a demo for connecting to secure Kafka, which includes the Kafka connection configuration. You can see that the job.properties file defines: kafka.security.protocol=SASL_SSL. SASL_SSL means that SASL (for enabling Kerberos authentication for Kafka in CDP, refer to this link) is used as the authentication mechanism, and SSL/TLS is used for data transport (that is, in addition to configuring authentication, you also enable TLS/SSL for Kafka Broker in the CM UI). Reference: the official Confluent documentation.
If TLS/SSL is not enabled for transport, you will see listeners = SASL_PLAINTEXT in the Kafka Broker log (/var/log/kafka/server.log); if Kerberos authentication (or another SASL mechanism such as LDAP or PAM) is enabled together with Enable TLS/SSL for Kafka Broker, you will see listeners = SASL_SSL. It is also worth noting that multiple listeners can be configured at the same time, i.e. listeners = SASL_PLAINTEXT and listeners = SASL_SSL can coexist.
There is also a YouTube video demonstrating this demo code. I hope this information helps.
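For illustration only, a minimal job.properties sketch for the SASL_SSL case (the property names follow the standard Kafka client configuration, using the kafka. prefix from the demo project; the broker address, truststore path, and password are placeholders to replace with your own values):
kafka.bootstrap.servers=broker-1.example.com:9093
kafka.security.protocol=SASL_SSL
kafka.sasl.kerberos.service.name=kafka
kafka.ssl.truststore.location=/path/to/truststore.jks
kafka.ssl.truststore.password=changeit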
08-19-2021
05:26 AM
Add the origin of SERVICE_ACCOUNT_SECRET: if [ -z "$SERVICE_ACCOUNT_SECRET" ]; then
# Attempt to get from kubectl. Generally, this only works from the
# kubernetes master node.
SERVICE_ACCOUNT_SECRET=$(kubectl get secret internal-secrets --namespace=${CDSW_NAMESPACE} -o jsonpath="{.data['service\.account\.secret']}" | base64 -d)
fi
if [ -z "$SERVICE_ACCOUNT_SECRET" ]; then
die_with_error 2 "Unable to get service account credentials. Provide SERVICE_ACCOUNT_SECRET or run on master node."
fi
07-01-2021
08:17 AM
Summary of this article
This article attempts to explain the data sources and meanings of the Overview and Usage tabs on the CDSW Admin page.
Background
Several customers have asked me the same question: how to evaluate the resource usage of a CDSW cluster.
Below, I record some representative questions asked by customers, so that I can respond more quickly in the future and so that others can use them for reference.
Why are the usage graphs I observe in CDSW's Grafana not consistent with the resource usage observed on the host charts in CM?
To address this confusion, here are the conclusions:
On the CM Web UI, find the page of the specific host. The charts you see under the Chart Library are all accurate. Please use this as a benchmark to judge the true situation of the host's resource usage. The data source for this part is the Host Monitor of Cloudera Management Service, a monitoring service designed and developed by Cloudera itself.
On the CDSW Web UI, log in with the admin account and find the Overview tab on the Site Administration page (see the attachment "cdsw-admin-total-resources.png"). The Total Memory, Total vCPUs, and Used figures there are also accurate. Note that this usage is not physical usage: for example, if you request a Pod with a busybox image and resource requests of (1000m, 2GiB) but only run the sleep command in it, you will naturally not see much resource usage at the physical level, that is, at the level of CM's host charts. The data source for this part is obtained by the CDSW service from the Kubernetes API.
You will find that the Total resources in Site Administration are less than the total resources of all Master and Worker nodes. This is because, at the Kubernetes level, some resources on each Node are reserved for Kubernetes itself (kube-apiserver, kube-controller, kube-scheduler, etcd, kubelet, ...). Log in to the CDSW Master node and use the kubectl command-line tool to verify:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
[hostname4] Ready master 19h v1.13.9-1+6c8cb1a92335e2
[hostname5] Ready <none> 19h v1.13.9-1+6c8cb1a92335e2
# kubectl describe node [hostname4]
...
Capacity:
cpu: 8
ephemeral-storage: 262132716Ki
hugepages-2Mi: 0
memory: 32779704Ki
pods: 110
Allocatable:
cpu: 6500m
ephemeral-storage: 262132716Ki
hugepages-2Mi: 0
memory: 29121976Ki
pods: 110
...
# kubectl describe node [hostname5]
...
Capacity:
cpu: 8
ephemeral-storage: 262132716Ki
hugepages-2Mi: 0
memory: 32779704Ki
pods: 110
Allocatable:
cpu: 6500m
ephemeral-storage: 262132716Ki
hugepages-2Mi: 0
memory: 29121976Ki
pods: 110
As can be seen from the output of the above commands, there are 2 nodes in my CDSW cluster, and each node has allocatable resources of 6500m CPU and 29121976Ki memory. In Kubernetes, 1 CPU core is 1000m, so the two nodes together provide exactly 13 cores, which corresponds to Total vCPUs in Site Administration. The same applies to memory.
The Used part is likewise obtained by the CDSW service from the Kubernetes API, and we can also verify it with the kubectl command-line tool. Again use the `kubectl describe node {nodeName}` command; the following output is displayed at the bottom:
Non-terminated Pods: (22 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
default-user-1 lf118vexnu6xxxxx 1100m (16%) 0 (0%) 2084197Ki (7%) 1953125Ki (6%) 90m
...
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 2740m (42%) 300m (4%)
memory 7976293Ki (27%) 13159781Ki (45%)
ephemeral-storage 0 (0%) 0 (0%)
The Requests shown here are the resources requested by the Pods on this Node; these are what is counted as the Used part in Site Administration.
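If you only want the per-node request totals without scrolling through the full describe output, something like the following works (a sketch; the number of context lines may need adjusting for your kubectl version's output format):
kubectl describe nodes | grep -A 7 'Allocated resources'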
Regarding Grafana, and the data sources and calculation logic of its built-in dashboards, we cannot give a very detailed answer, because Grafana is not a monitoring service built by CDSW itself. I can tell you that Grafana's data source is Prometheus, a monitoring system that also runs in Kubernetes. Regarding the specific meaning of the charts in Grafana, I still need to investigate before I can answer your question.
For now, one suggestion: if you want to know how Grafana calculates the data for a certain chart, you can find its query expression as follows: click the expand button at the top right of the chart and press Edit, and you will see the specific query expression. You can refer to the following images: pod-memory-usage-edit.png, pod-memory-usage-query.png. To understand the data source and meaning of that expression, however, you need to be familiar with Grafana and trace the logic of the corresponding source.
Prometheus and Grafana are a very popular metrics-monitoring solution in the Kubernetes community. CDSW ships Prometheus and Grafana so that users who are familiar with the Kubernetes ecosystem and Prometheus can use them out of the box. How to use Prometheus and Grafana is a very broad topic; you can refer to the official documents: https://grafana.com/docs/grafana/latest/ and https://prometheus.io/
In conclusion
For CDSW host-level resource usage, the Host charts on the CM Web UI are the most accurate reference.
For CDSW resource usage at the Kubernetes level, refer to the Overview and Activity tabs in Site Administration for a macro view. This data is also accurate, but it reflects resource requests at the Kubernetes level, not the physical usage generated by the real workload. Refer to Kubernetes resource requests and limits.
The default dashboards in Grafana have their own semantics: the way a chart in Grafana is calculated may be completely different from the Overview and Usage figures of CDSW, and they measure different things.