Created on 10-07-2021 08:19 PM - edited on 10-07-2021 09:53 PM by subratadas
A customer asked the following question about CDSW:
We connect to external databases from CDSW. We found that some of the client IPs seen by the database are neither in our LAN CIDR nor among the Node IPs of the CDSW cluster. Why is this happening?
CDSW uses Kubernetes as its infrastructure. Kubernetes internally maintains its own IP CIDRs for Pods and Services.
When a Kubernetes workload communicates with the outside world, traffic passes through the Service abstraction layer, which implements packet forwarding between Nodes and Pods mainly through NAT.
Services have different modes, and each mode handles source-IP NAT differently: depending on the mode, the original source IP may be preserved, or it may be rewritten (for example, to the Node IP). This behaviour is likely the cause of what you observed.
For details, please refer to the official Kubernetes documentation: Using Source IP.
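To make the observation concrete, here is a small sketch in Python using the standard ipaddress module. The CIDRs below are hypothetical placeholders (the real values in the transcripts later in this article are masked as 100.xx, 10.xx, etc.); substitute your own Pod, Service, and LAN CIDRs. It classifies an observed database-client IP against those networks:

```python
import ipaddress

# Hypothetical CIDRs -- substitute the real ones from your cluster and LAN.
POD_CIDR = ipaddress.ip_network("100.66.0.0/16")      # Kubernetes Pod network (e.g. weave)
SERVICE_CIDR = ipaddress.ip_network("100.77.0.0/16")  # Kubernetes Service network
LAN_CIDR = ipaddress.ip_network("10.17.100.0/22")     # your own LAN

def classify(ip: str) -> str:
    """Tell which network an observed client IP belongs to."""
    addr = ipaddress.ip_address(ip)
    if addr in POD_CIDR:
        return "Pod IP (no SNAT applied)"
    if addr in SERVICE_CIDR:
        return "Service IP"
    if addr in LAN_CIDR:
        return "LAN IP (e.g. a CDSW Node after SNAT)"
    return "unknown network"

print(classify("100.66.0.4"))    # a Pod IP
print(classify("10.17.100.20"))  # a Node / LAN IP
```

If the database logs show client IPs in the Pod CIDR rather than the LAN CIDR, that is Kubernetes-internal addressing leaking through without SNAT.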
In addition, if you want to test what source IP an external database sees when it is accessed from a Pod in the CDSW Kubernetes cluster, you can follow the test steps below from my experimental environment.
A practical test:
Here I have an external host, hostname.cloudera.com, with a PostgreSQL server running on it as follows:
20211007_04:25:21 [root@hostname ~]# hostname
hostname.cloudera.com
20211007_04:25:33 [root@hostname ~]# ps aux | sed -rn '1p;/postgres/Ip' | grep -v sed | head -n 2
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 52 0.0 0.0 164160 10560 ? Ss 00:51 0:02 /usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data/
20211007_04:26:06 [root@hostname ~]# ip -4 -o addr
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
8826: eth0 inet 172.xx.xx.xxx/20 scope global eth0\ valid_lft forever preferred_lft forever
20211007_04:26:21 [root@hostname ~]#
Perform the following steps on the CDSW Master host:
docker pull ubuntu
kubectl create deployment ubuntu --dry-run --image=ubuntu:latest -o yaml > /tmp/ubuntu-deployment.yaml
Edit /tmp/ubuntu-deployment.yaml so that the container keeps running, by adding the sleep args to the container spec:
spec:
  containers:
  - image: ubuntu:latest
    args:
    - sleep
    - "1000000"
    name: ubuntu
kubectl apply -f /tmp/ubuntu-deployment.yaml
kubectl -n default exec -it $(kubectl -n default get pod -l "app=ubuntu" -o jsonpath='{.items[0].metadata.name}') -- bash
apt update;
apt install -y iproute2; # This provides the `ip addr` utility.
apt install -y postgresql-client;
psql -h hostname.cloudera.com -U postgres
💡 Enter the PostgreSQL password when prompted.
[root@host-10-xx-xx-xx ~]# kubectl -n default get pods -l app=ubuntu -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ubuntu-7c458b9dfd-hr5qg 1/1 Running 0 64m 100.xx.xx.xx host-10-xx-xx-xx <none> <none>
[root@host-10-xx-xx-xx ~]# kubectl -n default exec -it $(kubectl -n default get pod -l "app=ubuntu" -o jsonpath='{.items[0].metadata.name}') -- ip -4 -o addr
💡 The `ip -4 -o addr` here is executed inside the ubuntu Pod.
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
5169: eth0 inet 100.xx.x.xx/16 brd 100.xx.xx.xx scope global eth0\ valid_lft forever preferred_lft forever
[root@host-10-xx-xx-xx ~]# kubectl -n default exec -it $(kubectl -n default get pod -l "app=ubuntu" -o jsonpath='{.items[0].metadata.name}') -- netstat -anp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 100.xx.xx.xx:43340 172.xx.xx.xx:5xxx ESTABLISHED 1417/psql
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node PID/Program name Path
💡 Note that the `netstat` command above is executed inside the ubuntu Pod. We can see that the 'Local Address' is the IP of the Pod; the 'Foreign Address' is the IP of hostname.cloudera.com.
[root@host-10-xx-xx-xx ~]# ip -4 -o addr
💡 Note that the hostname here is the same as the NODE shown in the `kubectl get pods -o wide` output above, i.e., the host node of the ubuntu Pod.
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
2: eth0 inet 10.xx.xx.xx/22 brd 10.xx.xx.xx scope global noprefixroute dynamic eth0\ valid_lft 71352sec preferred_lft 71352sec
5044: docker0 inet 172.xx.xx.xx/16 scope global docker0\ valid_lft forever preferred_lft forever
5047: weave inet 100.xx.xx.xx/16 brd 100.xx.xx.xx scope global weave\ valid_lft forever preferred_lft forever
Back on the PostgreSQL host, check the source address of the established connection:
20211007_04:52:46 [root@hostname ~]# ps aux | sed -rn '1p;/post/Ip' | grep -v sed | sed -rn '1p;/100\.xx\.xx\.xx|10\.xx\.xx\.xx/Ip'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 65536 0.0 0.0 164988 4148 ? Ss 03:55 0:00 postgres: postgres postgres 10.xx.xx.xx(43340) idle
20211007_04:53:19 [root@c3669-node1 ~]# ip -4 -o addr
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
8826: eth0 inet 172.xx.xx.xx/20 scope global eth0\ valid_lft forever preferred_lft forever
By default, when a Pod in Kubernetes accesses a service outside the cluster, its source IP is SNATed to the IP of the Node it runs on: the source port (43340 above) is preserved, but the database sees the Node IP instead of the Pod IP.
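The SNAT step observed above can be sketched as a toy function in Python. This is an illustration only, not how kube-proxy is implemented (the real rewrite is done by iptables rules), and the IPs are hypothetical since the real ones are masked in the transcripts:

```python
# Toy model of the SNAT rewrite: (pod_ip, port) -> (node_ip, port).
# Illustration only; the real translation is performed by iptables rules
# installed by kube-proxy, and the port may be remapped on conflict.
def snat(src: tuple, node_ip: str) -> tuple:
    pod_ip, port = src
    # The Pod IP is replaced by the Node IP; the port is typically kept.
    return (node_ip, port)

# The psql connection from the experiment: Pod IP, source port 43340.
seen_by_db = snat(("100.66.0.4", 43340), "10.17.100.20")
print(seen_by_db)  # ('10.17.100.20', 43340)
```

This is why PostgreSQL's `ps aux` output shows the Node IP with the same port (43340) that `netstat` inside the Pod reported against the Pod IP.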
In addition, there are ways to find the Pod IP CIDR and Service IP CIDR of your CDSW Kubernetes cluster, so that you can better understand your CDSW environment's infrastructure.
All commands are executed on the CDSW master host.
kubectl cluster-info dump | grep -m 1 cluster-cidr
kubectl cluster-info dump | grep -m 1 service-cluster-ip-range
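Once you have those two lines, a short helper can pull the CIDR values out and check whether a given client IP falls inside them. The sketch below is Python using the standard ipaddress and re modules; the sample flag line is hypothetical, shaped like what `kubectl cluster-info dump` emits:

```python
import ipaddress
import re

def extract_cidr(dump_line: str, flag: str):
    """Pull the CIDR value of e.g. --cluster-cidr=... out of a kube flag line."""
    match = re.search(rf"--{flag}=([0-9./]+)", dump_line)
    if match is None:
        raise ValueError(f"flag --{flag} not found")
    return ipaddress.ip_network(match.group(1))

# Hypothetical line as it might appear in `kubectl cluster-info dump`.
line = '"--cluster-cidr=100.66.0.0/16",'
pod_cidr = extract_cidr(line, "cluster-cidr")
print(pod_cidr)                                        # the Pod CIDR
print(ipaddress.ip_address("100.66.0.4") in pod_cidr)  # membership check
```

The same helper works for the `service-cluster-ip-range` flag, letting you tell at a glance whether an unfamiliar client IP is a Pod IP, a Service IP, or something outside the cluster.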
Hope the above information is helpful to you.