Member since
09-08-2020
39
Posts
2
Kudos Received
2
Solutions
10-07-2021
08:19 PM
2 Kudos
A question about CDSW was asked by a customer.
Question:
We connect to external databases from CDSW. We found that some of the client IPs seen by the databases are not in our LAN CIDR, and these IPs are not the Node IPs of CDSW either. Why is this happening?
Response:
CDSW uses Kubernetes as its infrastructure. Kubernetes maintains the IP CIDRs of Pods and Services internally. When a Kubernetes workload communicates with the outside world, the traffic passes through the Service abstraction layer, which forwards packets between Nodes and Pods mainly through NAT.
Services have different modes, and each mode handles NAT of the source IP differently. Depending on the mode, the source IP seen by the peer may be the Node IP rather than the Pod IP, which is likely related to your question. For details, please refer to the official Kubernetes documentation: Using Source IP.
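A quick way to check this in your own environment (run on the CDSW Master host) is to compare the client IPs seen by the database with the Node and Pod IPs of the cluster, for example:
kubectl get nodes -o wide                   # lists each Node's internal IP
kubectl get pods --all-namespaces -o wide   # lists each Pod's IP and the Node it runs on
If the client IP recorded by the database matches a Node's IP rather than a Pod IP, the traffic was source-NATed on its way out of the cluster.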
In addition, if you want to test which source IP is seen when accessing an external database from a Pod in the CDSW Kubernetes cluster, you can refer to my test steps from my lab environment below.
Some practice:
Here I have an external host, hostname.cloudera.com, with a PostgreSQL server running on it as follows:
20211007_04:25:21 [root@hostname ~]# hostname
hostname.cloudera.com
20211007_04:25:33 [root@hostname ~]# ps aux | sed -rn '1p;/postgres/Ip' | grep -v sed | head -n 2
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 52 0.0 0.0 164160 10560 ? Ss 00:51 0:02 /usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data/
20211007_04:26:06 [root@hostname ~]# ip -4 -o addr
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
8826: eth0 inet 172.xx.xx.xxx/20 scope global eth0\ valid_lft forever preferred_lft forever
20211007_04:26:21 [root@hostname ~]#
Perform the following steps on the CDSW Master host:
Download the docker image of Ubuntu: docker pull ubuntu
Create a deployment with the Ubuntu image:
Generate the deployment manifest template: kubectl create deployment ubuntu --dry-run --image=ubuntu:latest -o yaml > /tmp/ubuntu-deployment.yaml
Modify the deployment manifest template so that the sleep command is executed after the Ubuntu container starts. Change the spec part of the /tmp/ubuntu-deployment.yaml file to the following content:
spec:
  containers:
  - image: ubuntu:latest
    args:
    - sleep
    - "1000000"
    name: ubuntu
Use kubectl to create this Ubuntu deployment: kubectl apply -f /tmp/ubuntu-deployment.yaml
Enter the Ubuntu Pod and install the PostgreSQL client and network tools:
Enter Ubuntu Pod: kubectl -n default exec -it $(kubectl -n default get pod -l "app=ubuntu" -o jsonpath='{.items[0].metadata.name}') -- bash
Install the required command-line tools:
apt update
apt install -y iproute2          # provides the `ip addr` utility
apt install -y postgresql-client # provides psql
apt install -y net-tools         # provides netstat, used in a later step
Use psql in Ubuntu Pod to connect to Postgres server on hostname.cloudera.com: psql -h hostname.cloudera.com -U postgres
Note: enter the password when prompted.
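Side note: if your kubectl version supports creating bare Pods with kubectl run, a quicker way to get a throwaway test Pod (instead of creating a Deployment as above) might be the following; the Pod name ubuntu-test is just an example, and the label-based commands further below still assume the Deployment approach:
kubectl run ubuntu-test --image=ubuntu:latest --restart=Never -- sleep 1000000
kubectl exec -it ubuntu-test -- bash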
Now check the IP of the Ubuntu Pod, the IP of the host node where the Pod is running, and the client IP seen on the Postgres server. The first two checks are executed on the CDSW Master host; the last one is executed on the Postgres server host.
Check the IP of Ubuntu Pod: [root@host-10-xx-xx-xx ~]# kubectl -n default get pods -l app=ubuntu -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ubuntu-7c458b9dfd-hr5qg 1/1 Running 0 64m 100.xx.xx.xx host-10-xx-xx-xx <none> <none>
[root@host-10-xx-xx-xx ~]# kubectl -n default exec -it $(kubectl -n default get pod -l "app=ubuntu" -o jsonpath='{.items[0].metadata.name}') -- ip -4 -o addr ➡ Note: this "ip -4 -o addr" is executed inside the Ubuntu Pod.
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
5169: eth0 inet 100.xx.x.xx/16 brd 100.xx.xx.xx scope global eth0\ valid_lft forever preferred_lft forever
[root@host-10-xx-xx-xx ~]# kubectl -n default exec -it $(kubectl -n default get pod -l "app=ubuntu" -o jsonpath='{.items[0].metadata.name}') -- netstat -anp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 100.xx.xx.xx:43340 172.xx.xx.xx:5xxx ESTABLISHED 1417/psql
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node PID/Program name Path
Note: the netstat command above is executed inside the Ubuntu Pod. We can see that the 'Local Address' is the IP of the Pod, and the 'Foreign Address' is the IP of hostname.cloudera.com.
Check the host Node's IP:
[root@host-10-xx-xx-xx ~]# ip -4 -o addr
Note: the hostname here is the same as the NODE shown in the kubectl get pods output above, i.e. the host node of the Ubuntu Pod.
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
2: eth0 inet 10.xx.xx.xx/22 brd 10.xx.xx.xx scope global noprefixroute dynamic eth0\ valid_lft 71352sec preferred_lft 71352sec
5044: docker0 inet 172.xx.xx.xx/16 scope global docker0\ valid_lft forever preferred_lft forever
5047: weave inet 100.xx.xx.xx/16 brd 100.xx.xx.xx scope global weave\ valid_lft forever preferred_lft forever
View the client's IP on Postgres server:
20211007_04:52:46 [hostname ~]# ps aux | sed -rn '1p;/post/Ip' | grep -v sed | sed -rn '1p;/100\.xx\.xx\.xx|10\.xx\.xx\.xx/Ip'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 65536 0.0 0.0 164988 4148 ? Ss 03:55 0:00 postgres: postgres postgres 10.xx.xx.xx(43340) idle
20211007_04:53:19 [root@c3669-node1 ~]# ip -4 -o addr
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
8826: eth0 inet 172.xx.xx.xx/20 scope global eth0\ valid_lft forever preferred_lft forever
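As an alternative to grepping the process list, you can also ask PostgreSQL itself which client address it sees. pg_stat_activity is a standard PostgreSQL view, so running the following on the Postgres host should show the same Node IP in the client_addr column:
psql -U postgres -c "SELECT pid, usename, client_addr, client_port, state FROM pg_stat_activity;"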
Conclusion
By default, when a Pod in Kubernetes accesses services outside the cluster, its source IP is translated (SNAT) to the IP of the Node it runs on.
In addition, here are some ways to detect the Pod IP CIDR and the Service IP CIDR of your CDSW Kubernetes cluster, so that you can better understand your CDSW environment's infrastructure. Both commands are executed on the CDSW Master host.
Get the Pod IP CIDR: kubectl cluster-info dump | grep -m 1 cluster-cidr
Get the Service IP CIDR: kubectl cluster-info dump | grep -m 1 service-cluster-ip-range
Hope the above information is helpful to you.
10-05-2021
11:52 PM
Hi @GregoryG
User
Number of sessions
Total CPU
The above columns can be confirmed via Site Administration > Usage Report.
But the last one (the amount of data processed) cannot be gathered directly by CDSW, and I think it's hard to do. One possible way to achieve it is to deploy a service that captures the metrics from the Spark History Server API and CDSW's API.
Specifically, users generally generate two kinds of workloads through CDSW: one is the Spark workload, and the other is the Engine running in CDSW's local Kubernetes cluster.
Spark workloads run on the CDP cluster, and the amount of data processed by these workloads is recorded in the event log and can be viewed through the Spark History Web UI. The Spark History Web UI also has a REST API, so you can look for tools that obtain data from the Spark History Server API, or write a script to collect the statistics yourself.
The local Engine of CDSW is usually used to execute Machine Learning and Deep Learning workloads. I suspect the amount of data processed by these workloads is not recorded by CDSW, but rather by the specific ML/DL frameworks. Those frameworks would need something playing a role similar to the Spark History Server, so you should investigate from that angle.
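For example, a rough sketch (not a supported tool) of pulling per-application input sizes from the Spark History Server REST API could look like the following. The host name is a placeholder, 18088 is the usual History Server port on CDH/CDP, jq is assumed to be installed, and a Kerberized/TLS-enabled History Server would need additional curl options:
SHS=http://spark-history-host:18088
for app in $(curl -s "$SHS/api/v1/applications?status=completed" | jq -r '.[].id'); do
  bytes=$(curl -s "$SHS/api/v1/applications/$app/stages" | jq '[.[].inputBytes] | add')
  echo "$app input bytes: $bytes"
done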
10-03-2021
11:47 PM
1 Kudo
Hi @DA-Ka
Please check this official documentation: Reassigning replicas between log directories (using the kafka-reassign-partitions tool). Reassigning replicas between log directories can prove useful when you have multiple disks available but one or more of them is nearing capacity.
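For reference, a minimal sketch of a reassignment that only moves a replica to another log directory on the same broker might look like the following; the topic name, broker id, bootstrap server, and log directory are placeholders, and the log_dirs field requires a Kafka version with KIP-113 support (use --bootstrap-server, not --zookeeper, for log directory moves):
cat > /tmp/reassign.json <<'EOF'
{
  "version": 1,
  "partitions": [
    { "topic": "my-topic", "partition": 0, "replicas": [1], "log_dirs": ["/data2/kafka/data"] }
  ]
}
EOF
kafka-reassign-partitions --bootstrap-server broker1:9092 --reassignment-json-file /tmp/reassign.json --execute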
09-16-2021
04:02 AM
No, to me it doesn't sound like the deletion was incomplete; it sounds like a CSD problem is preventing CM from recognizing Spark 2.
09-16-2021
03:42 AM
The answer would be no, @dansteu. I once worked on a support case where a customer wanted to use NiFi to integrate with an external Ranger, meaning that NiFi and Ranger were not under the same Ambari management. Ranger's SME told us that this is impossible, or at least not within the scope of Cloudera's technical support. The logic should be the same for Atlas. However, if you know enough about Ranger and Ambari, it should still be technically possible in theory.
09-16-2021
03:32 AM
I don't understand what you are trying to say. If you have deployed the Spark 2 Gateway Role once, then the CSD file is correct. If you have removed Spark2 via CM UI already, of course you can't find it.
09-16-2021
03:23 AM
Well, then I believe it's not the correct CSD file...
09-16-2021
03:05 AM
CEM
What is Cloudera Edge Management (CEM)? Refer to What is Cloudera Edge Management.
There is no compatibility matrix between CEM and HDF because they are independent products, and MiNiFi agents from CEM will work perfectly fine sending data into HDF/NiFi if they need to.
SMM
From the Install or Upgrade Ambari, HDF, and HDP document, we can see that the cluster must be managed by Ambari 2.7.x. It can be an HDF 3.3.x, 3.4.0, or 3.4.1.1 cluster, or an HDP 3.1 or 3.1.1 cluster.
And in the Set up DP Platform document, you can see that before installing Streams Messaging Manager, you must first install or upgrade to DataPlane Platform 1.2.x.
My understanding is: SMM is based on DataPlane Platform, so if you have installed the latest DataPlane Platform, there is no reason you cannot install the latest SMM. So, I believe DPS (DataPlane Platform) 1.3.1 or 1.3.0 and SMM 2.0.0 are compatible.
Actually, if you look at the Installing DataPlane document, it looks to me like DataPlane Platform is based on Ambari 2.6 or 2.7, so theoretically any HDF/HDP that is supported by Ambari 2.6 or 2.7 is also compatible with DPS.
Also, look at this document: Streams Messaging Manager installation steps
If you are installing SMM as a new component on an existing Ambari 2.7.5 managed cluster with HDP 3.1.5 and/or HDF 3.5.x, then file a support case in the Cloudera portal to get the correct version of SMM.
So it looks to me that SMM is compatible with Ambari 2.7.5 and HDF 3.5.x.
Hope this answers your question.
09-16-2021
02:58 AM
I would suggest you check Cloudera Manager's log. If the CSD is correct, there will be log messages after restarting CM showing that CM has recognized the CSD. Then add the Spark2 parcel via the CM UI; you should be able to download, distribute, and activate it. After that, you should see Spark2 listed when you try to add a new service via the CM UI.
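For illustration, the whole flow on the CM server host might look roughly like this (the CSD jar name is hypothetical, and /opt/cloudera/csd and the log path are the usual defaults; adjust them if your environment differs):
cp SPARK2_ON_YARN-2.4.x.jar /opt/cloudera/csd/    # hypothetical CSD file name
systemctl restart cloudera-scm-server
grep -i csd /var/log/cloudera-scm-server/cloudera-scm-server.log | tail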
09-05-2021
11:36 PM
Summary of the article
I have encountered the following scenario several times:
A customer removes a host from a cluster managed by Cloudera Manager and adds the host to another cluster managed by Cloudera Manager. At this point, problems often occur.
These problems are usually caused by old cluster files remaining on the host, and these old files prevent the new Cloudera Manager from controlling the host normally.
So I verified which files are generated on a host after adding it to a CDP cluster managed by Cloudera Manager. In other words: after removing a host from Cloudera Manager, which files do we need to delete manually?
Introduction to the Test environment
CDP Runtime version: CDP PvC Base 7.1.6
CM version: Cloudera Manager 7.3.1
Whether to enable Kerberos: Yes
Whether to enable TLS: Yes
Auto-TLS: Yes
Auto-TLS Use Case: Use Case 1 - Using Cloudera Manager to generate an internal CA and corresponding certificates (Refer to Configuring TLS Encryption for Cloudera Manager Using Auto-TLS)
Experimental steps
After adding a host named c3669-temp-node1.kyanlab.cloudera.com to the cluster c3669, I added the YARN NodeManager role to this host, and then I used the following commands to find the newly added files on the host:
find /usr -type d -iname '*cloudera*'
find /var -type d -iname '*cloudera*'
find /etc -type d -iname '*cloudera*'
find /opt -type d -iname '*cloudera*'
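The four searches can also be combined into a single command:
find /usr /var /etc /opt -type d -iname '*cloudera*'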
I found that because I chose to install the OpenJDK provided by Cloudera Manager when I added the node, there is this OpenJDK on this newly added node: /usr/java/jdk1.8.0_232-cloudera.
I found some YUM-related directories under the /var directory, and there are Cloudera Manager Server and Agent-related directories under /var/lib.
The Cloudera Manager Agent-related directories and YARN-related directories are created under the /etc directory (because I added the Node Manager role).
Needless to say, the parcel-related directories and some Cloudera Manager-related directories are naturally created in the /opt directory.
In addition, through the command ls -AFlh /etc/alternatives | grep -Ei cloudera I found that many alternatives entries had been created.
We can trace these alternatives entries with the following steps:
[root@c3669-temp-node1 hadoop-yarn]# which yarn
/usr/bin/yarn
[root@c3669-temp-node1 hadoop-yarn]# ls -AFlh /usr/bin/yarn
lrwxrwxrwx. 1 root root 22 Aug 30 10:18 /usr/bin/yarn -> /etc/alternatives/yarn*
[root@c3669-temp-node1 hadoop-yarn]# ls -AFlh /etc/alternatives/yarn
lrwxrwxrwx. 1 root root 63 Aug 30 10:18 /etc/alternatives/yarn -> /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/yarn*
[root@c3669-temp-node1 hadoop-yarn]# alternatives --list | grep -Ei yarn
yarn auto /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/yarn
hadoop-conf auto /etc/hadoop/conf.cloudera.yarn
[root@c3669-temp-node1 hadoop-yarn]#
What will happen after removing this new node from Cloudera Manager?
At this point, I can be sure that my Node Manager can run successfully on the new host c3669-temp-node1.kyanlab.cloudera.com.
I referred to this document to remove the node from Cloudera Manager.
Obviously, after deleting the host from Cloudera Manager according to the above document, no files on the node are actually deleted. The files created in /usr/, /var/, /opt/, etc. still remain on the host.
Next, what files do we need to delete manually?
First of all, we definitely need to delete the software installed by the Cloudera Manager repo.
Of course, if you plan to add this node to another CDP cluster of the same version, you can omit this step.
# yum repolist | grep -Ei cloudera
cloudera-manager Cloudera Manager, Version 7.3.1    6
# yum repo-pkgs cloudera-manager list
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.radwebhosting.com
 * epel: mirror.prgmr.com
 * extras: mirror.sfo12.us.leaseweb.net
 * updates: sjc.edge.kernel.org
Installed Packages
cloudera-manager-agent.x86_64      7.3.1-10891891.el7    @cloudera-manager
cloudera-manager-daemons.x86_64    7.3.1-10891891.el7    @cloudera-manager
openjdk8.x86_64                    8.0+232_9-cloudera    @cloudera-manager
# yum repo-pkgs cloudera-manager remove -y
From the output of the above commands, we can see that three packages were installed on my new node through the Cloudera Manager repo: the CM Agent, the CM daemons, and OpenJDK. Use the repo-pkgs remove command to delete them. Since I plan to add this node later to a CDP 7.1.6 cluster of the same version (managed by another CM), I will skip this step here.
Update:
I found that the yum packages installed from the cloudera-manager repo still need to be deleted manually. When I added the old node to a new cluster, an error occurred during the installation of the Cloudera packages. The reason is that some of the directories I manually deleted under /usr, /var, and /etc were managed by those Cloudera packages, so the packages need to be reinstalled; if I don't delete the packages manually, CM thinks the packages on this host are already installed and do not need to be reinstalled, and the subsequent configuration then fails due to missing files.
Therefore, whether you plan to add the old host to a cluster of the same version or a different version, you need to manually delete the packages installed from the cloudera-manager repo first. This package-removal step must be performed before finding and deleting the related directories under /usr, /var, /etc, and so on.
Also, in my environment "yum repo-pkgs cloudera-manager remove" did not work, so I used a workaround to delete these packages:
clouderaPkgs=( $(yum list installed | grep -Ei cloudera | awk '{print $1}') )
for i in "${clouderaPkgs[@]}"; do
  yum remove -y "$i"
done
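As a quick check, the following should return nothing once all Cloudera packages have been removed:
yum list installed | grep -Ei cloudera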
Now, let us remove the remaining directories created by Cloudera packages:
declare -a dirsSCM
for base in /var /etc /opt; do
  dirsSCM=( $(find "$base" -type d -iname '*cloudera*') )
  for i in "${dirsSCM[@]}"; do
    echo "$i"
    rm -rf "$i"
  done
done
Of course, don't forget: it is best to also delete the directory /var/run/cloudera-scm-agent, which is used by the CM Agent to manage the various role instances (such as DataNode, NodeManager, etc.).
Update: I just found that after a reboot, this directory is gone. It's a tmpfs filesystem.
Then we need to manually clean up the alternatives-related entries. This is a bit troublesome; I have encountered several cases where a customer added the host to a new Cloudera Manager managed cluster and, because the old alternatives entries already existed, the new cluster's files were not propagated correctly.
# ls -AFlh /etc/alternatives | grep -Ei cloudera | awk '{print $9"\t"$NF}' | sed -r 's/\*$//g' > /tmp/alternaives_cloudera_list.txt
# head -n 5 /tmp/alternaives_cloudera_list.txt
avro-tools /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/avro-tools
beeline /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/beeline
bigtop-detect-javahome /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/bigtop-detect-javahome
catalogd /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/catalogd
cdsw /opt/cloudera/parcels/CDSW-1.9.1.p1.10118148/scripts/cdsw
# wc -l /tmp/alternaives_cloudera_list.txt
131 /tmp/alternaives_cloudera_list.txt
# filter out the alternatives items generated by Cloudera Manager.
ls -AFlh /etc/alternatives | grep -Ei cloudera | awk '{print $9"\t"$NF}' | sed -r 's/\*$//g' > /tmp/alternaives_cloudera_list.txt
# For example, "alternatives --remove yarn /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/bin/yarn" deletes the yarn item generated by Cloudera Manager.
# Use a loop to delete all the items:
while read line; do
argsArr=($line);
echo -e "${argsArr[0]}...${argsArr[1]}";
alternatives --remove ${argsArr[0]} ${argsArr[1]};
done < /tmp/alternaives_cloudera_list.txt
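As a final sanity check (assuming every Cloudera-generated item was captured in the list), the following should now print 0:
ls -AFlh /etc/alternatives | grep -Ei cloudera | wc -l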
So far, I should have cleaned up all the files that need to be manually deleted, so I restarted the host.
Then I created a new CDP 7.1.6 cluster, enabled Kerberos and TLS, and tried to add the host I had just removed.
I deployed a CDP PvC Base 7.1.6 cluster with a one-click deployment script; the CM version is 7.3.1, and the new cluster is named c1669.
Therefore, the CDP and CM versions of the new cluster (c1669) are the same as those of the old cluster (c3669).
After deploying the c1669 cluster, I used the script to enable Kerberos and TLS. The KDC server used by the cluster is located on the host c1669-node1, which is also the host where CM is located.
For TLS, I also used the same Auto-TLS Use Case 1 as in the c3669 cluster.
Then I added the host c3669-temp-node1.kyanlab.cloudera.com to the newly created c1669 cluster and tried to add a NodeManager role to it to see whether it could be started successfully.
As I expected, the host c3669-temp-node1.kyanlab.cloudera.com was successfully added to the cluster c1669, and the newly deployed NodeManager started successfully.
Conclusion
If you need to remove a host from an existing CDP/CDH cluster and add it to another CDP/CDH cluster, please follow the steps below:
Refer to this document to remove this node from Cloudera Manager.
Remove the packages which are installed via cloudera-manager repository.
Delete the remaining files created by the CM Agent and by the packages installed from the cloudera-manager repo.
Reboot the host.
Now you can add this host to a new cluster managed by CM.