Created on 07-06-2017 07:46 AM - edited 09-16-2022 04:53 AM
Hello,
I'm new to Cloudera and I'm deploying CDSW in Azure (on a Cloudera CentOS 7.2 template).
The installation went OK and the init started well, but eventually not all the pods started:
Cloudera Data Science Workbench Status

Service Status
docker: active
kubelet: active
nfs: active
Checking kernel parameters...

Node Status
NAME        STATUS   AGE   STATEFUL
workbench   Ready    2h    true

System Pod status
NAME                                READY   STATUS    RESTARTS   AGE   IP           NODE
dummy-2088944543-uev12              1/1     Running   0          2h    10.0.0.5     workbench
etcd-workbench                      1/1     Running   0          2h    10.0.0.5     workbench
kube-apiserver-workbench            1/1     Running   0          2h    10.0.0.5     workbench
kube-controller-manager-workbench   1/1     Running   2          2h    10.0.0.5     workbench
kube-discovery-1150918428-v7vu8     1/1     Running   0          2h    10.0.0.5     workbench
kube-dns-3873593988-vos07           3/3     Running   0          2h    100.66.0.2   workbench
kube-proxy-7qq63                    1/1     Running   0          2h    10.0.0.5     workbench
kube-scheduler-workbench            1/1     Running   2          2h    10.0.0.5     workbench
node-problem-detector-v0.1-kngbh    1/1     Running   0          2h    10.0.0.5     workbench
weave-net-clu7s                     2/2     Running   0          2h    10.0.0.5     workbench

Cloudera Data Science Workbench Pod Status
NAME                                  READY   STATUS              RESTARTS   AGE   IP           NODE        ROLE
cron-2934152315-56p1n                 1/1     Running             0          2h    100.66.0.8   workbench   cron
db-39862959-icvq9                     1/1     Running             1          2h    100.66.0.5   workbench   db
db-migrate-052787a-mvb40              0/1     ImagePullBackOff    0          2h    100.66.0.4   workbench   db-migrate
engine-deps-du8cx                     1/1     Running             0          2h    100.66.0.3   workbench   engine-deps
ingress-controller-3138093376-l5z46   1/1     Running             0          2h    10.0.0.5     workbench   ingress-controller
livelog-1900214889-qppq2              1/1     Running             0          2h    100.66.0.6   workbench   livelog
reconciler-459456250-wgems            1/1     Running             0          2h    100.66.0.7   workbench   reconciler
spark-port-forwarder-a31as            1/1     Running             0          2h    10.0.0.5     workbench   spark-port-forwarder
web-3826671331-7xchm                  0/1     ContainerCreating   0          2h    <none>       workbench   web
web-3826671331-h3gkd                  0/1     ContainerCreating   0          2h    <none>       workbench   web
web-3826671331-vtbdh                  0/1     ContainerCreating   0          2h    <none>       workbench   web

Cloudera Data Science Workbench is not ready yet: some application pods are not ready
$ sudo journalctl -u docker
Jul 06 13:42:03 workbench docker[6669]: time="2017-07-06T13:42:03.996814534Z" level=error msg="Handler for GET /images/docker.repository.cloudera.com/cdsw/1.0.1/web:052787a/json returned error: No such image: docker.repository.cloudera.com/cdsw/1.0.1/we
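To double-check on the master host that the image really isn't there, and to see the pull error from the Kubernetes side, something like this could be run (assuming kubectl is available on the master host):

$ sudo docker images | grep "cdsw/1.0.1/web"
$ kubectl describe pod db-migrate-052787a-mvb40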
Internet access is available (as most of the other pods have started).
Any ideas?
Thanks
Created on 07-06-2017 07:59 AM - edited 07-06-2017 08:00 AM
Hello,
This is weird. Could you try a manual docker pull?
docker.repository.cloudera.com/cdsw/1.0.1/web:052787a
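i.e. something along these lines, run on the CDSW master host:

$ sudo docker pull docker.repository.cloudera.com/cdsw/1.0.1/web:052787a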
Thanks,
Peter
Created 07-06-2017 09:17 AM
Hi Peter,
It actually did the trick:
$ sudo docker pull "docker.repository.cloudera.com/cdsw/1.0.1/web:052787a"
052787a: Pulling from cdsw/1.0.1/web
b6f892c0043b: Already exists
55010f332b04: Already exists
2955fb827c94: Already exists
3deef3fcbd30: Already exists
cf9722e506aa: Already exists
72923da64564: Already exists
3101e33a625d: Already exists
c03d5fa4b8e5: Already exists
35c1e4a8663c: Already exists
a1b3940356ad: Already exists
62370be47aba: Already exists
ddb5566a99f9: Already exists
8b5b82cdf853: Already exists
0c1a28ba377b: Already exists
5911a6a3d3db: Already exists
eb2b63f33d61: Already exists
3af8b8e8dc75: Already exists
19d9e7bce45d: Pull complete
396039e72b5e: Pull complete
b1fa7de66580: Pull complete
c15cd2ff85a4: Pull complete
87916a3ab13a: Pull complete
6c2fbb95a61e: Pull complete
938edf86928e: Pull complete
e0889d759edc: Extracting [==================================================>] 526.4 MB/526.4 MB
e0889d759edc: Pull complete
319dc7c60d62: Pull complete
dd1001380640: Pull complete
Digest: sha256:ecb807b8758acdfd1c6b0ff5acb1dad947cded312b47b60012c7478a0fcd9232
Status: Downloaded newer image for docker.repository.cloudera.com/cdsw/1.0.1/web:052787a
There's still a problem with 3 pods; I'll check that later.
Cloudera Data Science Workbench Status

Service Status
docker: active
kubelet: active
nfs: active
Checking kernel parameters...

Node Status
NAME        STATUS   AGE   STATEFUL
workbench   Ready    3h    true

System Pod status
NAME                                READY   STATUS    RESTARTS   AGE   IP           NODE
dummy-2088944543-uev12              1/1     Running   0          3h    10.0.0.5     workbench
etcd-workbench                      1/1     Running   0          3h    10.0.0.5     workbench
kube-apiserver-workbench            1/1     Running   0          3h    10.0.0.5     workbench
kube-controller-manager-workbench   1/1     Running   3          3h    10.0.0.5     workbench
kube-discovery-1150918428-v7vu8     1/1     Running   0          3h    10.0.0.5     workbench
kube-dns-3873593988-vos07           3/3     Running   0          3h    100.66.0.2   workbench
kube-proxy-7qq63                    1/1     Running   0          3h    10.0.0.5     workbench
kube-scheduler-workbench            1/1     Running   3          3h    10.0.0.5     workbench
node-problem-detector-v0.1-kngbh    1/1     Running   0          3h    10.0.0.5     workbench
weave-net-clu7s                     2/2     Running   0          3h    10.0.0.5     workbench

Cloudera Data Science Workbench Pod Status
NAME                                  READY   STATUS              RESTARTS   AGE   IP           NODE        ROLE
cron-2934152315-56p1n                 1/1     Running             0          3h    100.66.0.8   workbench   cron
db-39862959-icvq9                     1/1     Running             1          3h    100.66.0.5   workbench   db
db-migrate-052787a-mvb40              1/1     Running             0          3h    100.66.0.4   workbench   db-migrate
engine-deps-du8cx                     1/1     Running             0          3h    100.66.0.3   workbench   engine-deps
ingress-controller-3138093376-l5z46   1/1     Running             0          3h    10.0.0.5     workbench   ingress-controller
livelog-1900214889-qppq2              1/1     Running             0          3h    100.66.0.6   workbench   livelog
reconciler-459456250-wgems            1/1     Running             0          3h    100.66.0.7   workbench   reconciler
spark-port-forwarder-a31as            1/1     Running             0          3h    10.0.0.5     workbench   spark-port-forwarder
web-3826671331-7xchm                  0/1     ContainerCreating   0          3h    <none>       workbench   web
web-3826671331-h3gkd                  0/1     ContainerCreating   0          3h    <none>       workbench   web
web-3826671331-vtbdh                  0/1     ContainerCreating   0          3h    <none>       workbench   web

Cloudera Data Science Workbench is not ready yet: some application pods are not ready
Thanks!
Created 07-07-2017 01:18 AM
Hello,
The last 3 pods are not starting because of an issue mounting volumes:
Events:
  FirstSeen  LastSeen  Count  From                 SubobjectPath  Type     Reason       Message
  ---------  --------  -----  ----                 -------------  ----     ------       -------
  15h        33s       224    {kubelet workbench}                 Warning  FailedMount  MountVolume.SetUp failed for volume "kubernetes.io/nfs/bee36b58-6247-11e7-9372-000d3a29b7ab-projects-share" (spec.Name: "projects-share") pod "bee36b58-6247-11e7-9372-000d3a29b7ab" (UID: "bee36b58-6247-11e7-9372-000d3a29b7ab") with: mount failed: exit status 32
             Mounting arguments: 10.0.0.4:/var/lib/cdsw/current/projects /var/lib/kubelet/pods/bee36b58-6247-11e7-9372-000d3a29b7ab/volumes/kubernetes.io~nfs/projects-share nfs []
             Output: mount.nfs: Connection timed out
  19h        4s        502    {kubelet workbench}                 Warning  FailedMount  Unable to mount volumes for pod "web-3826671331-7xchm_default(bee36b58-6247-11e7-9372-000d3a29b7ab)": timeout expired waiting for volumes to attach/mount for pod "web-3826671331-7xchm"/"default". list of unattached/unmounted volumes=[projects-claim]
  19h        4s        502    {kubelet workbench}                 Warning  FailedSync   Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "web-3826671331-7xchm"/"default". list of unattached/unmounted volumes=[projects-claim]
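To reproduce the failure outside of Kubernetes, the same export could be checked and mounted by hand from the workbench host (just a sketch, assuming nfs-utils is installed and /mnt is free):

$ showmount -e 10.0.0.4
$ sudo mount -t nfs 10.0.0.4:/var/lib/cdsw/current/projects /mnt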
Searching Google did not really help me pinpoint what the cause could be.
Any pointers on where I should look?
Thanks!
Created 07-07-2017 01:40 AM
Found it. I had interpreted "MASTER" as being the master node of the CDH cluster 😉
Using the right IP fixed the issue.
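For reference, a sketch of the kind of change involved, assuming the standard RPM layout where the master address is set via MASTER_IP in /etc/cdsw/config/cdsw.conf (path and parameter name assumed, check your own install):

# /etc/cdsw/config/cdsw.conf (hypothetical excerpt)
# MASTER_IP must point to the CDSW master host itself,
# not to the CDH cluster's master node.
MASTER_IP="10.0.0.5"   # was mistakenly set to 10.0.0.4 (the CDH master)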
Thanks