Support Questions

Find answers, ask questions, and share your expertise

cdsw init fails with kube-dns issues

Contributor
kubedns and dnsmasq both appear to be failing

 

sudo /usr/bin/cdsw init

...

Waiting for kube-system cluster to come up. This could take a few minutes...
ERROR:: Unable to bring up kube-system cluster.: 1

ERROR:: Unable to start kubernetes system pods.: 1

...

 

 

$ sudo kubectl --namespace=kube-system get pods
NAME                                READY     STATUS             RESTARTS   AGE
etcd-udodapp05                      1/1       Running            0          16m
kube-apiserver-udodapp05            1/1       Running            0          16m
kube-controller-manager-udodapp05   1/1       Running            0          16m
kube-dns-3911048160-99klb           2/3       CrashLoopBackOff   13         15m
kube-proxy-02z9b                    1/1       Running            0          15m
kube-scheduler-udodapp05            1/1       Running            0          15m
weave-net-4fzw6                     2/2       Running            0          15m

 

$ cat cdsw.conf
JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera/
MASTER_IP=[redacted]
DOMAIN=[redacted]
DOCKER_BLOCK_DEVICES=/dev/mapper/imgvg-imglv
APPLICATION_BLOCK_DEVICE=/dev/mapper/appvg-applv
NO_PROXY="127.0.0.1,localhost,[redacted],100.66.0.1,100.66.0.2,100.66.0.3,100.66.0.4,100.66.0.5,100.66.0.6,100.66.0.7,100.66.0.8,100.66.0.9,100.66.0.10,100.66.0.11,100.66.0.12,100.66.0.13,100.66.0.14,100.66.0.15,100.66.0.16,100.66.0.17,100.66.0.18,100.66.0.19,100.66.0.20,100.66.0.21,100.66.0.22,100.66.0.23,100.66.0.24,100.66.0.25,100.66.0.26,100.66.0.27,100.66.0.28,100.66.0.29,100.66.0.30,100.66.0.31,100.66.0.32,100.66.0.33,100.66.0.34,100.66.0.35,100.66.0.36,100.66.0.37,100.66.0.38,100.66.0.39,100.66.0.40,100.66.0.41,100.66.0.42,100.66.0.43,100.66.0.44,100.66.0.45,100.66.0.46,100.66.0.47,100.66.0.48,100.66.0.49,100.66.0.50,100.77.0.129,100.77.0.130,100.77.0.1,100.77.0.10"
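(Incidentally, a consecutive range like the 100.66.0.1–100.66.0.50 portion of that NO_PROXY value doesn't have to be typed by hand; a quick sketch, assuming plain bash and coreutils:)

```shell
# Build the 100.66.0.1..50 portion of NO_PROXY programmatically.
# printf repeats its format string once per argument from seq.
ips=$(printf '100.66.0.%d,' $(seq 1 50))
ips=${ips%,}                      # drop the trailing comma
echo "$ips"
```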

 

 

$ sudo kubectl logs -f --since=1h po/kube-dns-3911048160-99klb dnsmasq --namespace=kube-system
I0320 22:03:25.264188       1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0320 22:03:25.265432       1 nanny.go:86] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0320 22:03:25.298956       1 nanny.go:111]
I0320 22:03:25.298956       1 nanny.go:108] dnsmasq[25]: started, version 2.78-security-prerelease cachesize 1000
W0320 22:03:25.299025       1 nanny.go:112] Got EOF from stdout
I0320 22:03:25.299031       1 nanny.go:108] dnsmasq[25]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0320 22:03:25.299044       1 nanny.go:108] dnsmasq[25]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0320 22:03:25.299052       1 nanny.go:108] dnsmasq[25]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0320 22:03:25.299055       1 nanny.go:108] dnsmasq[25]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0320 22:03:25.299065       1 nanny.go:108] dnsmasq[25]: reading /etc/resolv.conf
I0320 22:03:25.299068       1 nanny.go:108] dnsmasq[25]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0320 22:03:25.299072       1 nanny.go:108] dnsmasq[25]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0320 22:03:25.299076       1 nanny.go:108] dnsmasq[25]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0320 22:03:25.299079       1 nanny.go:108] dnsmasq[25]: using nameserver [redacted]#53
I0320 22:03:25.299082       1 nanny.go:108] dnsmasq[25]: using nameserver [redacted]#53
I0320 22:03:25.299085       1 nanny.go:108] dnsmasq[25]: using nameserver [redacted]#53
I0320 22:03:25.299089       1 nanny.go:108] dnsmasq[25]: using nameserver [redacted]#53
I0320 22:03:25.299092       1 nanny.go:108] dnsmasq[25]: read /etc/hosts - 7 addresses
$ sudo kubectl logs -f --since=1h po/kube-dns-3911048160-99klb kubedns --namespace=kube-system
I0320 21:58:22.617903       1 dns.go:48] version: 1.14.4-2-g5584e04
I0320 21:58:22.619053       1 server.go:70] Using configuration read from directory: /kube-dns-config with period 10s
I0320 21:58:22.619096       1 server.go:113] FLAG: --alsologtostderr="false"
I0320 21:58:22.619108       1 server.go:113] FLAG: --config-dir="/kube-dns-config"
I0320 21:58:22.619114       1 server.go:113] FLAG: --config-map=""
I0320 21:58:22.619118       1 server.go:113] FLAG: --config-map-namespace="kube-system"
I0320 21:58:22.619121       1 server.go:113] FLAG: --config-period="10s"
I0320 21:58:22.619129       1 server.go:113] FLAG: --dns-bind-address="0.0.0.0"
I0320 21:58:22.619132       1 server.go:113] FLAG: --dns-port="10053"
I0320 21:58:22.619137       1 server.go:113] FLAG: --domain="cluster.local."
I0320 21:58:22.619142       1 server.go:113] FLAG: --federations=""
I0320 21:58:22.619148       1 server.go:113] FLAG: --healthz-port="8081"
I0320 21:58:22.619151       1 server.go:113] FLAG: --initial-sync-timeout="1m0s"
I0320 21:58:22.619155       1 server.go:113] FLAG: --kube-master-url=""
I0320 21:58:22.619162       1 server.go:113] FLAG: --kubecfg-file=""
I0320 21:58:22.619165       1 server.go:113] FLAG: --log-backtrace-at=":0"
I0320 21:58:22.619171       1 server.go:113] FLAG: --log-dir=""
I0320 21:58:22.619175       1 server.go:113] FLAG: --log-flush-frequency="5s"
I0320 21:58:22.619180       1 server.go:113] FLAG: --logtostderr="true"
I0320 21:58:22.619183       1 server.go:113] FLAG: --nameservers=""
I0320 21:58:22.619186       1 server.go:113] FLAG: --stderrthreshold="2"
I0320 21:58:22.619189       1 server.go:113] FLAG: --v="2"
I0320 21:58:22.619192       1 server.go:113] FLAG: --version="false"
I0320 21:58:22.619202       1 server.go:113] FLAG: --vmodule=""
I0320 21:58:22.619292       1 server.go:176] Starting SkyDNS server (0.0.0.0:10053)
I0320 21:58:22.619587       1 server.go:198] Skydns metrics enabled (/metrics:10055)
I0320 21:58:22.619599       1 dns.go:147] Starting endpointsController
I0320 21:58:22.619603       1 dns.go:150] Starting serviceController
I0320 21:58:22.619713       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0320 21:58:22.619737       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0320 21:58:23.119838       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0320 21:58:23.619844       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
E0320 21:58:23.623059       1 reflector.go:199] k8s.io/dns/vendor/k8s.io/client-go/tools/cache/reflector.go:94: Failed to list *v1.Endpoints: Get https://100.77.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 100.77.0.1:443: getsockopt: connection refused
E0320 21:58:23.623077       1 reflector.go:199] k8s.io/dns/vendor/k8s.io/client-go/tools/cache/reflector.go:94: Failed to list *v1.Service: Get https://100.77.0.1:443/api/v1/services?resourceVersion=0: dial tcp 100.77.0.1:443: getsockopt: connection refused
I0320 21:58:24.119875       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0320 21:58:24.619805       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0320 21:58:25.119883       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0320 21:58:25.619870       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver..
..............
I0320 21:59:22.119836       1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
F0320 21:59:22.619832       1 dns.go:168] Timeout waiting for initialization

 

 

1 ACCEPTED SOLUTION

Contributor
I now have CDSW up and running.

I'm not sure which one of these did the trick or if there was some other force at play.

 

We found a bug in ip6tables.service (RHEL 7.4) that was producing error messages like this:

Apr 10 10:06:56 [redacted] systemd[1]: [/usr/lib/systemd/system/ip6tables.service:3] Failed to add dependency on syslog.target,iptables.service, ignoring: Invalid argument

 

So we changed the After= parameter from comma-delimited to space-delimited (systemd expects unit names in After= to be separated by spaces, not commas).

before change:

After=syslog.target,iptables.service

after change:

After=syslog.target iptables.service
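(That one-line change can also be scripted with sed; a sketch, demonstrated here on a scratch copy rather than the real /usr/lib/systemd/system/ip6tables.service, which needs root to edit:)

```shell
# Demonstrate the After= fix on a scratch copy of the unit file.
unit=/tmp/ip6tables.service.demo
printf 'After=syslog.target,iptables.service\n' > "$unit"

# With a comma, systemd parses the whole string as one invalid unit name;
# a space-separated list is what After= expects.
sed -i 's/^After=syslog.target,iptables.service$/After=syslog.target iptables.service/' "$unit"

cat "$unit"   # After=syslog.target iptables.service
```

(After editing the real unit file, remember `systemctl daemon-reload` so systemd picks up the change.)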

 

Bug link:

https://bugzilla.redhat.com/show_bug.cgi?id=1499367

 

Here are the commands that were run:

edit /usr/lib/systemd/system/ip6tables.service
systemctl stop iptables
systemctl disable iptables
systemctl stop ip6tables
systemctl disable ip6tables

/usr/bin/cdsw reset

/usr/bin/cdsw init


3 REPLIES

Contributor

All,

I'm still facing the same issue.

If any of you have the kube-dns pod with all 3 containers (kubedns, dnsmasq, and sidecar) running successfully, could you run the following and reply back with the output? It would be greatly appreciated.

Get the pod names from the output of this command:

 

kubectl get pods --all-namespaces

Then get the CLUSTER-IP from this command:

 

 

kubectl get services --sort-by=.metadata.name

Then execute nslookup commands against the running pods:

 

e.g.

 

kubectl exec <kube-dns-pod-name> -c sidecar --namespace=kube-system -- nslookup <CLUSTER-IP>
kubectl exec <kube-dns-pod-name> -c dnsmasq --namespace=kube-system -- nslookup <CLUSTER-IP>
kubectl exec <kube-dns-pod-name> -c kubedns --namespace=kube-system -- nslookup <CLUSTER-IP>

e.g.
kubectl exec kube-dns-3911048160-lhtvm -c kubedns --namespace=kube-system -- nslookup 100.77.0.1

 

I may be barking up the wrong tree, but I'm trying to figure out why my containers time out when trying to connect to https://100.77.0.1:443.
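(One way to separate "DNS is broken" from "the apiserver ClusterIP itself is unreachable" is a raw TCP check; a sketch, assuming bash's /dev/tcp redirection and coreutils timeout — demonstrated below against a local port with no listener, since the real target would be the cluster's 100.77.0.1:443:)

```shell
# check_tcp HOST PORT: succeeds if a TCP connection opens within 2s.
check_tcp() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Real case would be: check_tcp 100.77.0.1 443
# Demo: port 1 on localhost normally has no listener, so this should
# report unreachable (connection refused).
if check_tcp 127.0.0.1 1; then echo reachable; else echo unreachable; fi
```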

 

 

Also, if you could post a copy of your /etc/cdsw/config/cdsw.conf (with sensitive information redacted or masked) that would be great.

 


New Contributor

Hi,

We are facing the same kind of issue. Were you able to resolve it?

Please find the logs below for reference.

 

cdsw status
Sending detailed logs to [/tmp/cdsw_status_HOe8Jj.log] ...
CDSW Version: [1.5.0.849870:4b1d6ac]
OK: Application running as root check
OK: NFS service check
OK: System process check for CSD install
OK: Sysctl params check
OK: Kernel memory slabs check
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| NAME | STATUS | CREATED-AT | VERSION | EXTERNAL-IP | OS-IMAGE | KERNEL-VERSION | GPU | STATEFUL |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| dvwuaspnhad03.ams.com | True | 2019-07-23 15:22:18+00:00 | v1.8.12-1+44f60fa9b27304-dirty | None | Red Hat Enterprise Linux | 3.10.0-693.2.2.el7.x86_64 | 0 | True |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1/1 nodes are ready.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| NAME | READY | STATUS | RESTARTS | CREATED-AT | POD-IP | HOST-IP | ROLE |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| etcd-dvwuaspnhad03.ams.com | 1/1 | Running | 0 | 2019-07-23 15:23:22+00:00 | 159.127.45.148 | 159.127.45.148 | None |
| kube-apiserver-dvwuaspnhad03.ams.com | 1/1 | Running | 0 | 2019-07-23 15:23:39+00:00 | 159.127.45.148 | 159.127.45.148 | None |
| kube-controller-manager-dvwuaspnhad03.ams.com | 1/1 | Running | 0 | 2019-07-23 15:23:37+00:00 | 159.127.45.148 | 159.127.45.148 | None |
| kube-dns-78dcf4b9d9-4qlmt | 3/3 | Running | 0 | 2019-07-23 15:23:49+00:00 | 100.66.0.4 | 159.127.45.148 | None |
| kube-proxy-72npf | 1/1 | Running | 0 | 2019-07-23 15:23:52+00:00 | 159.127.45.148 | 159.127.45.148 | None |
| kube-scheduler-dvwuaspnhad03.ams.com | 1/1 | Running | 0 | 2019-07-23 15:23:30+00:00 | 159.127.45.148 | 159.127.45.148 | None |
| tiller-deploy-775556c68-ntgxs | 1/1 | Running | 0 | 2019-07-23 15:22:36+00:00 | 100.66.0.2 | 159.127.45.148 | None |
| weave-net-6w4cc | 2/2 | Running | 1 | 2019-07-23 15:22:36+00:00 | 159.127.45.148 | 159.127.45.148 | None |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
All required pods are ready in cluster kube-system.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| NAME | READY | STATUS | RESTARTS | CREATED-AT | POD-IP | HOST-IP | ROLE |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| cron-5df865cd67-8v9gq | 1/1 | Running | 0 | 2019-07-23 15:24:07+00:00 | 100.66.0.5 | 159.127.45.148 | cron |
| db-586cf7d4b6-kgrgs | 1/1 | Running | 0 | 2019-07-23 15:24:07+00:00 | 100.66.0.8 | 159.127.45.148 | db |
| db-migrate-4b1d6ac-757lc | 0/1 | Succeeded | 0 | 2019-07-23 15:24:07+00:00 | 100.66.0.6 | 159.127.45.148 | db-migrate |
| ds-cdh-client-b948b4b8b-qvltp | 1/1 | Running | 0 | 2019-07-23 15:24:09+00:00 | 100.66.0.19 | 159.127.45.148 | ds-cdh-client |
| ds-operator-84d49b8786-mvssl | 2/2 | Running | 2 | 2019-07-23 15:24:09+00:00 | 100.66.0.13 | 159.127.45.148 | ds-operator |
| ds-vfs-7c85df495f-2xbcj | 1/1 | Running | 0 | 2019-07-23 15:24:09+00:00 | 100.66.0.21 | 159.127.45.148 | ds-vfs |
| ingress-controller-ff89786db-cbmpj | 0/1 | CrashLoopBackOff | 243 | 2019-07-23 15:24:07+00:00 | 159.127.45.148 | 159.127.45.148 | ingress-controller |
| livelog-66f5b7986c-ctzsp | 1/1 | Running | 0 | 2019-07-23 15:24:07+00:00 | 100.66.0.7 | 159.127.45.148 | livelog |
| s2i-builder-5b7c868b6d-4lslx | 1/1 | Running | 2 | 2019-07-23 15:24:09+00:00 | 100.66.0.22 | 159.127.45.148 | s2i-builder |
| s2i-builder-5b7c868b6d-m8r28 | 1/1 | Running | 2 | 2019-07-23 15:24:10+00:00 | 100.66.0.18 | 159.127.45.148 | s2i-builder |
| s2i-builder-5b7c868b6d-t56q2 | 1/1 | Running | 2 | 2019-07-23 15:24:09+00:00 | 100.66.0.23 | 159.127.45.148 | s2i-builder |
| s2i-client-77d575bcc8-s98nf | 1/1 | Running | 0 | 2019-07-23 15:24:09+00:00 | 100.66.0.20 | 159.127.45.148 | s2i-client |
| s2i-git-server-7855bcbcc5-prmgc | 1/1 | Running | 0 | 2019-07-23 15:24:07+00:00 | 100.66.0.9 | 159.127.45.148 | s2i-git-server |
| s2i-queue-76fc7f5f88-jwrwf | 1/1 | Running | 0 | 2019-07-23 15:24:07+00:00 | 100.66.0.3 | 159.127.45.148 | s2i-queue |
| s2i-registry-74496d54dc-jkjp4 | 1/1 | Running | 0 | 2019-07-23 15:24:07+00:00 | 100.66.0.15 | 159.127.45.148 | s2i-registry |
| s2i-registry-auth-6f6f658947-8dgp9 | 1/1 | Running | 0 | 2019-07-23 15:24:07+00:00 | 100.66.0.11 | 159.127.45.148 | s2i-registry-auth |
| s2i-server-5b778bcb8d-n92rk | 1/1 | Running | 2 | 2019-07-23 15:24:08+00:00 | 100.66.0.12 | 159.127.45.148 | s2i-server |
| secret-generator-77d7b98444-wwjgt | 1/1 | Running | 0 | 2019-07-23 15:24:08+00:00 | 100.66.0.10 | 159.127.45.148 | secret-generator |
| spark-port-forwarder-q6r9t | 1/1 | Running | 0 | 2019-07-23 15:24:09+00:00 | 159.127.45.148 | 159.127.45.148 | spark-port-forwarder |
| web-75bbb7d4ff-6ngdl | 1/1 | Running | 0 | 2019-07-23 15:24:08+00:00 | 100.66.0.17 | 159.127.45.148 | web |
| web-75bbb7d4ff-g7hf9 | 1/1 | Running | 0 | 2019-07-23 15:24:08+00:00 | 100.66.0.14 | 159.127.45.148 | web |
| web-75bbb7d4ff-jtf8b | 1/1 | Running | 0 | 2019-07-23 15:24:08+00:00 | 100.66.0.16 | 159.127.45.148 | web |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Pods not ready in cluster default ['role/ingress-controller'].
All required Application services are configured.
All required secrets are available.
Persistent volumes are ready.
Persistent volume claims are ready.
Ingresses are ready.
Checking web at url: http://cdsw.ams.com
Web is not yet up.
Cloudera Data Science Workbench is not ready yet
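(If it helps anyone triaging output like the above: the not-ready rows can be filtered out of a saved `cdsw status` dump with awk. A sketch over the pipe-delimited table format, using two hypothetical sample rows:)

```shell
# Pick out pods whose READY column (e.g. 0/1) is short, from a saved
# copy of the `cdsw status` pod table.
cat > /tmp/cdsw_status_pods.txt <<'EOF'
| livelog-66f5b7986c-ctzsp | 1/1 | Running | 0 |
| ingress-controller-ff89786db-cbmpj | 0/1 | CrashLoopBackOff | 243 |
EOF

awk -F'|' '{
  gsub(/ /, "", $3); split($3, r, "/")
  if (r[1] != r[2]) print $2, $4      # pod name and status
}' /tmp/cdsw_status_pods.txt
```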