A customer asked the following question about CDSW.

Question:

We connect to external databases from CDSW. We found that some of the client IPs recorded by the database are neither in our LAN CIDR nor among the CDSW node IPs. Why is this happening?

 

Response:

CDSW is built on Kubernetes, which internally maintains separate IP CIDRs for Pods and Services.
When a Kubernetes workload communicates with the outside world, traffic passes through an abstraction layer that forwards packets between Node and Pod using NAT (Network Address Translation).

This layer can operate in different modes, and each mode handles source-IP NAT differently.

In particular, outbound traffic may leave the cluster with the Node's IP as its source address (source NAT), which is most likely what you are observing.
For details, please refer to the official Kubernetes documentation: Using Source IP.
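As a generic Linux check (not CDSW-specific), you can ask the kernel which source address it would choose for a given destination. Run on a CDSW node with your database host's IP as the destination, the reported "src" address is the one an SNATed Pod packet will carry; 127.0.0.1 below is only a placeholder destination:

```shell
# Ask the routing table which source IP would be used to reach a destination.
# Replace 127.0.0.1 with the IP of your external database host.
ip route get 127.0.0.1 | head -n 1
```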

 

In addition, if you want to test what source IP an external database sees when it is accessed from a Pod in the CDSW Kubernetes cluster, you can follow the test steps from my lab environment below.

 

A practical test:

Here I have an external host, hostname.cloudera.com, with a PostgreSQL server running on it:

20211007_04:25:21 [root@hostname ~]# hostname
hostname.cloudera.com
20211007_04:25:33 [root@hostname ~]# ps aux | sed -rn '1p;/postgres/Ip' | grep -v sed | head -n 2
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres      52  0.0  0.0 164160 10560 ?        Ss   00:51   0:02 /usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data/
20211007_04:26:06 [root@hostname ~]# ip -4 -o addr
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
8826: eth0    inet 172.xx.xx.xxx/20 scope global eth0\       valid_lft forever preferred_lft forever
20211007_04:26:21 [root@hostname ~]#
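For completeness: reaching this server from another host requires PostgreSQL to accept remote connections. A minimal sketch of the two settings involved, assuming the default PGDATA of /var/lib/pgsql/10/data shown above (the 10.0.0.0/8 CIDR is a made-up example; substitute your own network):

```
# /var/lib/pgsql/10/data/postgresql.conf
listen_addresses = '*'

# /var/lib/pgsql/10/data/pg_hba.conf
# TYPE  DATABASE  USER  ADDRESS      METHOD
host    all       all   10.0.0.0/8   md5
```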

Perform the following steps on the CDSW Master host:

  1. Pull the Ubuntu Docker image:
    docker pull ubuntu
  2. Create a deployment with the Ubuntu image:
    1. Generate the deployment manifest template:
      kubectl create deployment ubuntu --dry-run --image=ubuntu:latest -o yaml > /tmp/ubuntu-deployment.yaml
      (On newer kubectl versions, use --dry-run=client.)
    2. Modify the manifest so that the Ubuntu container runs a long sleep after it starts. Change the spec section of the /tmp/ubuntu-deployment.yaml file to the following:
      spec:
        containers:
        - image: ubuntu:latest
          args:
          - sleep
          - "1000000"
          name: ubuntu
    3. Use kubectl to create this Ubuntu deployment:
      kubectl apply -f /tmp/ubuntu-deployment.yaml
  3. Exec into the Ubuntu Pod and install the PostgreSQL client and network tools:
    1. Enter the Ubuntu Pod:
      kubectl -n default exec -it $(kubectl -n default get pod -l "app=ubuntu" -o jsonpath='{.items[0].metadata.name}') -- bash
    2. Install the required command-line tools:
      apt update;
      apt install -y iproute2; # Provides the `ip` utility.
      apt install -y net-tools; # Provides the `netstat` utility used in step 5.
      apt install -y postgresql-client;
  4. Use psql in the Ubuntu Pod to connect to the Postgres server on hostname.cloudera.com:
    psql -h hostname.cloudera.com -U postgres
    Note: enter the password when prompted.
  5. Check the Ubuntu Pod's IP, the IP of the node hosting the Pod, and the client IP recorded on the Postgres server. The kubectl commands below are all executed on the CDSW Master host.
    1. Check the IP of Ubuntu Pod:
      [root@host-10-xx-xx-xx ~]# kubectl -n default get pods -l app=ubuntu -o wide
      NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE                 NOMINATED NODE   READINESS GATES
      ubuntu-7c458b9dfd-hr5qg   1/1     Running   0          64m   100.xx.xx.xx   host-10-xx-xx-xx   <none>           <none>
      
      [root@host-10-xx-xx-xx ~]# kubectl -n default exec -it $(kubectl -n default get pod -l "app=ubuntu" -o jsonpath='{.items[0].metadata.name}') -- ip -4 -o addr ➡ Note: this "ip -4 -o addr" command runs inside the ubuntu Pod.
      1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
      5169: eth0    inet 100.xx.x.xx/16 brd 100.xx.xx.xx scope global eth0\       valid_lft forever preferred_lft forever
      
      [root@host-10-xx-xx-xx ~]# kubectl -n default exec -it $(kubectl -n default get pod -l "app=ubuntu" -o jsonpath='{.items[0].metadata.name}') -- netstat -anp
      Active Internet connections (servers and established)
      Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
      tcp        0      0 100.xx.xx.xx:43340       172.xx.xx.xx:5xxx      ESTABLISHED 1417/psql
      Active UNIX domain sockets (servers and established)
      Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
      Note: the 'netstat' command above is executed inside the Ubuntu Pod. The 'Local Address' is the Pod's IP; the 'Foreign Address' is the IP of hostname.cloudera.com.
    2. Host Node's IP
      [root@host-10-xx-xx-xx ~]# ip -4 -o addr ➡ Note: this host is the NODE shown in step 5.1, i.e., the node hosting the ubuntu Pod.
      1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
      2: eth0 inet 10.xx.xx.xx/22 brd 10.xx.xx.xx scope global noprefixroute dynamic eth0\ valid_lft 71352sec preferred_lft 71352sec
      5044: docker0 inet 172.xx.xx.xx/16 scope global docker0\ valid_lft forever preferred_lft forever
      5047: weave inet 100.xx.xx.xx/16 brd 100.xx.xx.xx scope global weave\ valid_lft forever preferred_lft forever
    3. View the client's IP on the Postgres server:
      20211007_04:52:46 [root@hostname ~]# ps aux | sed -rn '1p;/post/Ip' | grep -v sed | sed -rn '1p;/100\.xx\.xx\.xx|10\.xx\.xx\.xx/Ip'
      USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
      postgres   65536  0.0  0.0 164988  4148 ?        Ss   03:55   0:00 postgres: postgres postgres 10.xx.xx.xx(43340) idle
      20211007_04:53:19 [root@hostname ~]# ip -4 -o addr
      1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
      8826: eth0    inet 172.xx.xx.xx/20 scope global eth0\       valid_lft forever preferred_lft forever
      Note: the client address 10.xx.xx.xx(43340), with the same port 43340 seen in the Pod's netstat output above, is the CDSW node's IP, not the Pod's IP.
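As a server-side cross-check, a sketch assuming you can run psql locally on hostname.cloudera.com as the postgres user: the standard pg_stat_activity view reports the client address PostgreSQL sees for each connection.

```
# On the Postgres host: list the client address and port of each connection.
# With source NAT in effect, client_addr shows the CDSW node IP, not the Pod IP.
psql -U postgres -c "SELECT client_addr, client_port, state FROM pg_stat_activity WHERE client_addr IS NOT NULL;"
```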

Conclusion

By default, when a Pod in Kubernetes accesses a service outside the cluster, its source IP is translated (via source NAT) to the IP of the Node the Pod runs on.

In addition, here are some ways to discover the Pod IP CIDR and Service IP CIDR of your CDSW Kubernetes cluster, so that you can better understand your CDSW environment's infrastructure.
All commands are executed on the CDSW Master host.

  1. Get the Pod IP CIDR:
    kubectl cluster-info dump | grep -m 1 cluster-cidr
  2. Get the Service IP CIDR:
    kubectl cluster-info dump | grep -m 1 service-cluster-ip-range
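Once you know the Pod CIDR, you can check whether a mystery client IP falls inside it. A minimal sketch in plain shell arithmetic (the 100.66.0.0/16 and 100.66.0.7 values are made-up examples; substitute your own CIDR and client IP):

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Succeed (exit 0) if IP $1 falls inside CIDR $2.
in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Hypothetical example: is this client IP in the Pod CIDR?
in_cidr 100.66.0.7 100.66.0.0/16 && echo "inside pod CIDR" || echo "outside pod CIDR"
```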

I hope the above information is helpful to you.

Last update: 10-07-2021 09:53 PM