Created on 03-29-2018 09:05 AM - edited 09-16-2022 06:02 AM
kubedns and dnsmasq both appear to be failing
sudo /usr/bin/cdsw init
...
Waiting for kube-system cluster to come up. This could take a few minutes...
ERROR:: Unable to bring up kube-system cluster.: 1
ERROR:: Unable to start kubernetes system pods.: 1
...
$ sudo kubectl --namespace=kube-system get pods
NAME                                READY     STATUS             RESTARTS   AGE
etcd-udodapp05                      1/1       Running            0          16m
kube-apiserver-udodapp05            1/1       Running            0          16m
kube-controller-manager-udodapp05   1/1       Running            0          16m
kube-dns-3911048160-99klb           2/3       CrashLoopBackOff   13         15m
kube-proxy-02z9b                    1/1       Running            0          15m
kube-scheduler-udodapp05            1/1       Running            0          15m
weave-net-4fzw6                     2/2       Running            0          15m
$ cat cdsw.conf
JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera/
MASTER_IP=[redacted]
DOMAIN=[redacted]
DOCKER_BLOCK_DEVICES=/dev/mapper/imgvg-imglv
APPLICATION_BLOCK_DEVICE=/dev/mapper/appvg-applv
NO_PROXY="127.0.0.1,localhost,[redacted],100.66.0.1,100.66.0.2,100.66.0.3,100.66.0.4,100.66.0.5,100.66.0.6,100.66.0.7,100.66.0.8,100.66.0.9,100.66.0.10,100.66.0.11,100.66.0.12,100.66.0.13,100.66.0.14,100.66.0.15,100.66.0.16,100.66.0.17,100.66.0.18,100.66.0.19,100.66.0.20,100.66.0.21,100.66.0.22,100.66.0.23,100.66.0.24,100.66.0.25,100.66.0.26,100.66.0.27,100.66.0.28,100.66.0.29,100.66.0.30,100.66.0.31,100.66.0.32,100.66.0.33,100.66.0.34,100.66.0.35,100.66.0.36,100.66.0.37,100.66.0.38,100.66.0.39,100.66.0.40,100.66.0.41,100.66.0.42,100.66.0.43,100.66.0.44,100.66.0.45,100.66.0.46,100.66.0.47,100.66.0.48,100.66.0.49,100.66.0.50,100.77.0.129,100.77.0.130,100.77.0.1,100.77.0.10"
$ sudo kubectl logs -f --since=1h po/kube-dns-3911048160-99klb dnsmasq --namespace=kube-system
I0320 22:03:25.264188 1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0320 22:03:25.265432 1 nanny.go:86] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0320 22:03:25.298956 1 nanny.go:111]
I0320 22:03:25.298956 1 nanny.go:108] dnsmasq[25]: started, version 2.78-security-prerelease cachesize 1000
W0320 22:03:25.299025 1 nanny.go:112] Got EOF from stdout
I0320 22:03:25.299031 1 nanny.go:108] dnsmasq[25]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0320 22:03:25.299044 1 nanny.go:108] dnsmasq[25]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0320 22:03:25.299052 1 nanny.go:108] dnsmasq[25]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0320 22:03:25.299055 1 nanny.go:108] dnsmasq[25]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0320 22:03:25.299065 1 nanny.go:108] dnsmasq[25]: reading /etc/resolv.conf
I0320 22:03:25.299068 1 nanny.go:108] dnsmasq[25]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0320 22:03:25.299072 1 nanny.go:108] dnsmasq[25]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0320 22:03:25.299076 1 nanny.go:108] dnsmasq[25]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0320 22:03:25.299079 1 nanny.go:108] dnsmasq[25]: using nameserver [redacted]#53
I0320 22:03:25.299082 1 nanny.go:108] dnsmasq[25]: using nameserver [redacted]#53
I0320 22:03:25.299085 1 nanny.go:108] dnsmasq[25]: using nameserver [redacted]#53
I0320 22:03:25.299089 1 nanny.go:108] dnsmasq[25]: using nameserver [redacted]#53
I0320 22:03:25.299092 1 nanny.go:108] dnsmasq[25]: read /etc/hosts - 7 addresses
$ sudo kubectl logs -f --since=1h po/kube-dns-3911048160-99klb kubedns --namespace=kube-system
I0320 21:58:22.617903 1 dns.go:48] version: 1.14.4-2-g5584e04
I0320 21:58:22.619053 1 server.go:70] Using configuration read from directory: /kube-dns-config with period 10s
I0320 21:58:22.619096 1 server.go:113] FLAG: --alsologtostderr="false"
I0320 21:58:22.619108 1 server.go:113] FLAG: --config-dir="/kube-dns-config"
I0320 21:58:22.619114 1 server.go:113] FLAG: --config-map=""
I0320 21:58:22.619118 1 server.go:113] FLAG: --config-map-namespace="kube-system"
I0320 21:58:22.619121 1 server.go:113] FLAG: --config-period="10s"
I0320 21:58:22.619129 1 server.go:113] FLAG: --dns-bind-address="0.0.0.0"
I0320 21:58:22.619132 1 server.go:113] FLAG: --dns-port="10053"
I0320 21:58:22.619137 1 server.go:113] FLAG: --domain="cluster.local."
I0320 21:58:22.619142 1 server.go:113] FLAG: --federations=""
I0320 21:58:22.619148 1 server.go:113] FLAG: --healthz-port="8081"
I0320 21:58:22.619151 1 server.go:113] FLAG: --initial-sync-timeout="1m0s"
I0320 21:58:22.619155 1 server.go:113] FLAG: --kube-master-url=""
I0320 21:58:22.619162 1 server.go:113] FLAG: --kubecfg-file=""
I0320 21:58:22.619165 1 server.go:113] FLAG: --log-backtrace-at=":0"
I0320 21:58:22.619171 1 server.go:113] FLAG: --log-dir=""
I0320 21:58:22.619175 1 server.go:113] FLAG: --log-flush-frequency="5s"
I0320 21:58:22.619180 1 server.go:113] FLAG: --logtostderr="true"
I0320 21:58:22.619183 1 server.go:113] FLAG: --nameservers=""
I0320 21:58:22.619186 1 server.go:113] FLAG: --stderrthreshold="2"
I0320 21:58:22.619189 1 server.go:113] FLAG: --v="2"
I0320 21:58:22.619192 1 server.go:113] FLAG: --version="false"
I0320 21:58:22.619202 1 server.go:113] FLAG: --vmodule=""
I0320 21:58:22.619292 1 server.go:176] Starting SkyDNS server (0.0.0.0:10053)
I0320 21:58:22.619587 1 server.go:198] Skydns metrics enabled (/metrics:10055)
I0320 21:58:22.619599 1 dns.go:147] Starting endpointsController
I0320 21:58:22.619603 1 dns.go:150] Starting serviceController
I0320 21:58:22.619713 1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0320 21:58:22.619737 1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0320 21:58:23.119838 1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0320 21:58:23.619844 1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
E0320 21:58:23.623059 1 reflector.go:199] k8s.io/dns/vendor/k8s.io/client-go/tools/cache/reflector.go:94: Failed to list *v1.Endpoints: Get https://100.77.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 100.77.0.1:443: getsockopt: connection refused
E0320 21:58:23.623077 1 reflector.go:199] k8s.io/dns/vendor/k8s.io/client-go/tools/cache/reflector.go:94: Failed to list *v1.Service: Get https://100.77.0.1:443/api/v1/services?resourceVersion=0: dial tcp 100.77.0.1:443: getsockopt: connection refused
I0320 21:58:24.119875 1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0320 21:58:24.619805 1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0320 21:58:25.119883 1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
I0320 21:58:25.619870 1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
..............
I0320 21:59:22.119836 1 dns.go:174] Waiting for services and endpoints to be initialized from apiserver...
F0320 21:59:22.619832 1 dns.go:168] Timeout waiting for initialization
Created 04-10-2018 10:38 AM
I now have CDSW up and running.
I'm not sure which one of these did the trick or if there was some other force at play.
We found a bug in ip6tables.service (RHEL 7.4) that was producing error messages like this:
Apr 10 10:06:56 [redacted] systemd[1]: [/usr/lib/systemd/system/ip6tables.service:3] Failed to add dependency on syslog.target,iptables.service, ignoring: Invalid argument
so we changed the After= parameter from comma-delimited to space-delimited.
before change:
After=syslog.target,iptables.service
after change:
After=syslog.target iptables.service
Bug link:
https://bugzilla.redhat.com/show_bug.cgi?id=1499367
Here are the commands that were run:
edit /usr/lib/systemd/system/ip6tables.service
systemctl stop iptables
systemctl disable iptables
systemctl stop ip6tables
systemctl disable ip6tables
/usr/bin/cdsw reset
/usr/bin/cdsw init
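For reference, the edit-and-reset sequence above can be scripted roughly like this (a sketch, assuming a stock RHEL 7 unit file at /usr/lib/systemd/system/ip6tables.service and run as root):

```shell
# Replace the comma-delimited After= line from RHBZ#1499367 with the
# space-delimited form systemd expects, then reset and re-init CDSW.
sed -i 's/^After=syslog.target,iptables.service$/After=syslog.target iptables.service/' \
  /usr/lib/systemd/system/ip6tables.service
systemctl daemon-reload              # make systemd re-read the edited unit
systemctl stop iptables ip6tables
systemctl disable iptables ip6tables
/usr/bin/cdsw reset
/usr/bin/cdsw init
```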
Created 04-09-2018 01:21 PM
All,
I'm still facing the same issue.
If any of you have the kube-dns pod running with all three containers healthy (kubedns, dnsmasq, and sidecar), could you run the following and reply with the output? It would be greatly appreciated.
Get the pod names from the output of this command
kubectl get pods --all-namespaces
then get the CLUSTER-IP from this command
kubectl get services --sort-by=.metadata.name
then execute nslookup commands on the running pods
e.g.
kubectl exec <kube-dns-pod-name> -c sidecar --namespace=kube-system -- nslookup <CLUSTER-IP>
kubectl exec <kube-dns-pod-name> -c dnsmasq --namespace=kube-system -- nslookup <CLUSTER-IP>
kubectl exec <kube-dns-pod-name> -c kubedns --namespace=kube-system -- nslookup <CLUSTER-IP>
e.g.
kubectl exec kube-dns-3911048160-lhtvm -c kubedns --namespace=kube-system -- nslookup 100.77.0.1
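If it helps, the three exec calls can be looped in one go. This is only a sketch: the k8s-app=kube-dns label selector and the example CLUSTER_IP are assumptions, so substitute the values you got from the two commands above.

```shell
# Run nslookup against the cluster service IP from each kube-dns container.
POD=$(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name | head -n1)
CLUSTER_IP=100.77.0.1   # example value; use the CLUSTER-IP from `kubectl get services`
for c in sidecar dnsmasq kubedns; do
  echo "== container: $c =="
  # ${POD#pod/} strips the "pod/" prefix that `-o name` adds
  kubectl exec "${POD#pod/}" -c "$c" --namespace=kube-system -- nslookup "$CLUSTER_IP"
done
```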
I may be barking up the wrong tree, but I'm trying to figure out why my containers timeout when trying to connect to https://100.77.0.1:443
Also, if you could post a copy of your /etc/cdsw/config/cdsw.conf (with sensitive information redacted or masked) that would be great.
Created 07-24-2019 04:47 AM
Hi,
We are facing the same kind of issue. Were you able to resolve it?
Please find the logs below for reference.
cdsw status
Sending detailed logs to [/tmp/cdsw_status_HOe8Jj.log] ...
CDSW Version: [1.5.0.849870:4b1d6ac]
OK: Application running as root check
OK: NFS service check
OK: System process check for CSD install
OK: Sysctl params check
OK: Kernel memory slabs check
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| NAME | STATUS | CREATED-AT | VERSION | EXTERNAL-IP | OS-IMAGE | KERNEL-VERSION | GPU | STATEFUL |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| dvwuaspnhad03.ams.com | True | 2019-07-23 15:22:18+00:00 | v1.8.12-1+44f60fa9b27304-dirty | None | Red Hat Enterprise Linux | 3.10.0-693.2.2.el7.x86_64 | 0 | True |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1/1 nodes are ready.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| NAME | READY | STATUS | RESTARTS | CREATED-AT | POD-IP | HOST-IP | ROLE |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| etcd-dvwuaspnhad03.ams.com | 1/1 | Running | 0 | 2019-07-23 15:23:22+00:00 | 159.127.45.148 | 159.127.45.148 | None |
| kube-apiserver-dvwuaspnhad03.ams.com | 1/1 | Running | 0 | 2019-07-23 15:23:39+00:00 | 159.127.45.148 | 159.127.45.148 | None |
| kube-controller-manager-dvwuaspnhad03.ams.com | 1/1 | Running | 0 | 2019-07-23 15:23:37+00:00 | 159.127.45.148 | 159.127.45.148 | None |
| kube-dns-78dcf4b9d9-4qlmt | 3/3 | Running | 0 | 2019-07-23 15:23:49+00:00 | 100.66.0.4 | 159.127.45.148 | None |
| kube-proxy-72npf | 1/1 | Running | 0 | 2019-07-23 15:23:52+00:00 | 159.127.45.148 | 159.127.45.148 | None |
| kube-scheduler-dvwuaspnhad03.ams.com | 1/1 | Running | 0 | 2019-07-23 15:23:30+00:00 | 159.127.45.148 | 159.127.45.148 | None |
| tiller-deploy-775556c68-ntgxs | 1/1 | Running | 0 | 2019-07-23 15:22:36+00:00 | 100.66.0.2 | 159.127.45.148 | None |
| weave-net-6w4cc | 2/2 | Running | 1 | 2019-07-23 15:22:36+00:00 | 159.127.45.148 | 159.127.45.148 | None |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
All required pods are ready in cluster kube-system.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| NAME | READY | STATUS | RESTARTS | CREATED-AT | POD-IP | HOST-IP | ROLE |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| cron-5df865cd67-8v9gq | 1/1 | Running | 0 | 2019-07-23 15:24:07+00:00 | 100.66.0.5 | 159.127.45.148 | cron |
| db-586cf7d4b6-kgrgs | 1/1 | Running | 0 | 2019-07-23 15:24:07+00:00 | 100.66.0.8 | 159.127.45.148 | db |
| db-migrate-4b1d6ac-757lc | 0/1 | Succeeded | 0 | 2019-07-23 15:24:07+00:00 | 100.66.0.6 | 159.127.45.148 | db-migrate |
| ds-cdh-client-b948b4b8b-qvltp | 1/1 | Running | 0 | 2019-07-23 15:24:09+00:00 | 100.66.0.19 | 159.127.45.148 | ds-cdh-client |
| ds-operator-84d49b8786-mvssl | 2/2 | Running | 2 | 2019-07-23 15:24:09+00:00 | 100.66.0.13 | 159.127.45.148 | ds-operator |
| ds-vfs-7c85df495f-2xbcj | 1/1 | Running | 0 | 2019-07-23 15:24:09+00:00 | 100.66.0.21 | 159.127.45.148 | ds-vfs |
| ingress-controller-ff89786db-cbmpj | 0/1 | CrashLoopBackOff | 243 | 2019-07-23 15:24:07+00:00 | 159.127.45.148 | 159.127.45.148 | ingress-controller |
| livelog-66f5b7986c-ctzsp | 1/1 | Running | 0 | 2019-07-23 15:24:07+00:00 | 100.66.0.7 | 159.127.45.148 | livelog |
| s2i-builder-5b7c868b6d-4lslx | 1/1 | Running | 2 | 2019-07-23 15:24:09+00:00 | 100.66.0.22 | 159.127.45.148 | s2i-builder |
| s2i-builder-5b7c868b6d-m8r28 | 1/1 | Running | 2 | 2019-07-23 15:24:10+00:00 | 100.66.0.18 | 159.127.45.148 | s2i-builder |
| s2i-builder-5b7c868b6d-t56q2 | 1/1 | Running | 2 | 2019-07-23 15:24:09+00:00 | 100.66.0.23 | 159.127.45.148 | s2i-builder |
| s2i-client-77d575bcc8-s98nf | 1/1 | Running | 0 | 2019-07-23 15:24:09+00:00 | 100.66.0.20 | 159.127.45.148 | s2i-client |
| s2i-git-server-7855bcbcc5-prmgc | 1/1 | Running | 0 | 2019-07-23 15:24:07+00:00 | 100.66.0.9 | 159.127.45.148 | s2i-git-server |
| s2i-queue-76fc7f5f88-jwrwf | 1/1 | Running | 0 | 2019-07-23 15:24:07+00:00 | 100.66.0.3 | 159.127.45.148 | s2i-queue |
| s2i-registry-74496d54dc-jkjp4 | 1/1 | Running | 0 | 2019-07-23 15:24:07+00:00 | 100.66.0.15 | 159.127.45.148 | s2i-registry |
| s2i-registry-auth-6f6f658947-8dgp9 | 1/1 | Running | 0 | 2019-07-23 15:24:07+00:00 | 100.66.0.11 | 159.127.45.148 | s2i-registry-auth |
| s2i-server-5b778bcb8d-n92rk | 1/1 | Running | 2 | 2019-07-23 15:24:08+00:00 | 100.66.0.12 | 159.127.45.148 | s2i-server |
| secret-generator-77d7b98444-wwjgt | 1/1 | Running | 0 | 2019-07-23 15:24:08+00:00 | 100.66.0.10 | 159.127.45.148 | secret-generator |
| spark-port-forwarder-q6r9t | 1/1 | Running | 0 | 2019-07-23 15:24:09+00:00 | 159.127.45.148 | 159.127.45.148 | spark-port-forwarder |
| web-75bbb7d4ff-6ngdl | 1/1 | Running | 0 | 2019-07-23 15:24:08+00:00 | 100.66.0.17 | 159.127.45.148 | web |
| web-75bbb7d4ff-g7hf9 | 1/1 | Running | 0 | 2019-07-23 15:24:08+00:00 | 100.66.0.14 | 159.127.45.148 | web |
| web-75bbb7d4ff-jtf8b | 1/1 | Running | 0 | 2019-07-23 15:24:08+00:00 | 100.66.0.16 | 159.127.45.148 | web |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Pods not ready in cluster default ['role/ingress-controller'].
All required Application services are configured.
All required secrets are available.
Persistent volumes are ready.
Persistent volume claims are ready.
Ingresses are ready.
Checking web at url: http://cdsw.ams.com
Web is not yet up.
Cloudera Data Science Workbench is not ready yet
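Since ingress-controller is the pod holding things up, its previous-crash logs and events are the first place to look. A sketch (the pod name is copied from the status table above; substitute your own):

```shell
# Inspect the crash-looping ingress-controller pod. --previous shows the log
# of the last crashed container rather than the current restart attempt.
POD=ingress-controller-ff89786db-cbmpj
kubectl logs "$POD" --namespace=default --previous
# Print only the Events section of the describe output
kubectl describe pod "$POD" --namespace=default | sed -n '/Events:/,$p'
```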