Created on 07-06-2017 07:46 AM - edited 09-16-2022 04:53 AM
Hello,
I'm new to Cloudera and I'm deploying CDSW in Azure (on a Cloudera CentOS 7.2 template).
The installation went OK and the init started well, but eventually not all the pods started:
Cloudera Data Science Workbench Status

Service Status
docker: active
kubelet: active
nfs: active
Checking kernel parameters...

Node Status
NAME        STATUS   AGE   STATEFUL
workbench   Ready    2h    true

System Pod status
NAME                                READY   STATUS    RESTARTS   AGE   IP           NODE
dummy-2088944543-uev12              1/1     Running   0          2h    10.0.0.5     workbench
etcd-workbench                      1/1     Running   0          2h    10.0.0.5     workbench
kube-apiserver-workbench            1/1     Running   0          2h    10.0.0.5     workbench
kube-controller-manager-workbench   1/1     Running   2          2h    10.0.0.5     workbench
kube-discovery-1150918428-v7vu8     1/1     Running   0          2h    10.0.0.5     workbench
kube-dns-3873593988-vos07           3/3     Running   0          2h    100.66.0.2   workbench
kube-proxy-7qq63                    1/1     Running   0          2h    10.0.0.5     workbench
kube-scheduler-workbench            1/1     Running   2          2h    10.0.0.5     workbench
node-problem-detector-v0.1-kngbh    1/1     Running   0          2h    10.0.0.5     workbench
weave-net-clu7s                     2/2     Running   0          2h    10.0.0.5     workbench

Cloudera Data Science Workbench Pod Status
NAME                                  READY   STATUS              RESTARTS   AGE   IP           NODE        ROLE
cron-2934152315-56p1n                 1/1     Running             0          2h    100.66.0.8   workbench   cron
db-39862959-icvq9                     1/1     Running             1          2h    100.66.0.5   workbench   db
db-migrate-052787a-mvb40              0/1     ImagePullBackOff    0          2h    100.66.0.4   workbench   db-migrate
engine-deps-du8cx                     1/1     Running             0          2h    100.66.0.3   workbench   engine-deps
ingress-controller-3138093376-l5z46   1/1     Running             0          2h    10.0.0.5     workbench   ingress-controller
livelog-1900214889-qppq2              1/1     Running             0          2h    100.66.0.6   workbench   livelog
reconciler-459456250-wgems            1/1     Running             0          2h    100.66.0.7   workbench   reconciler
spark-port-forwarder-a31as            1/1     Running             0          2h    10.0.0.5     workbench   spark-port-forwarder
web-3826671331-7xchm                  0/1     ContainerCreating   0          2h    <none>       workbench   web
web-3826671331-h3gkd                  0/1     ContainerCreating   0          2h    <none>       workbench   web
web-3826671331-vtbdh                  0/1     ContainerCreating   0          2h    <none>       workbench   web

Cloudera Data Science Workbench is not ready yet: some application pods are not ready
$ sudo journalctl -u docker
Jul 06 13:42:03 workbench docker[6669]: time="2017-07-06T13:42:03.996814534Z" level=error msg="Handler for GET /images/docker.repository.cloudera.com/cdsw/1.0.1/web:052787a/json returned error: No such image: docker.repository.cloudera.com/cdsw/1.0.1/we
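To double-check on the master host that the image really isn't there, and to see the pull error from the Kubernetes side, something like this could be run (assuming kubectl is available on the master host):

$ sudo docker images | grep "cdsw/1.0.1/web"
$ kubectl describe pod db-migrate-052787a-mvb40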
Internet access is available (as most of the other pods have started).
Any ideas?
Thanks
Created on 07-06-2017 07:59 AM - edited 07-06-2017 08:00 AM
Hello,
This is weird. Could you try a manual docker pull?
docker.repository.cloudera.com/cdsw/1.0.1/web:052787a
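i.e. something along these lines, run on the CDSW master host:

$ sudo docker pull docker.repository.cloudera.com/cdsw/1.0.1/web:052787a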
Thanks,
Peter
Created 07-06-2017 09:17 AM
Hi Peter,
It actually did the trick:
$ sudo docker pull "docker.repository.cloudera.com/cdsw/1.0.1/web:052787a"
052787a: Pulling from cdsw/1.0.1/web
b6f892c0043b: Already exists
55010f332b04: Already exists
2955fb827c94: Already exists
3deef3fcbd30: Already exists
cf9722e506aa: Already exists
72923da64564: Already exists
3101e33a625d: Already exists
c03d5fa4b8e5: Already exists
35c1e4a8663c: Already exists
a1b3940356ad: Already exists
62370be47aba: Already exists
ddb5566a99f9: Already exists
8b5b82cdf853: Already exists
0c1a28ba377b: Already exists
5911a6a3d3db: Already exists
eb2b63f33d61: Already exists
3af8b8e8dc75: Already exists
19d9e7bce45d: Pull complete
396039e72b5e: Pull complete
b1fa7de66580: Pull complete
c15cd2ff85a4: Pull complete
87916a3ab13a: Pull complete
6c2fbb95a61e: Pull complete
938edf86928e: Pull complete
e0889d759edc: Extracting [==================================================>] 526.4 MB/526.4 MB
e0889d759edc: Pull complete
319dc7c60d62: Pull complete
dd1001380640: Pull complete
Digest: sha256:ecb807b8758acdfd1c6b0ff5acb1dad947cded312b47b60012c7478a0fcd9232
Status: Downloaded newer image for docker.repository.cloudera.com/cdsw/1.0.1/web:052787a
There's still a problem with 3 pods; I'll check that later.
Cloudera Data Science Workbench Status

Service Status
docker: active
kubelet: active
nfs: active
Checking kernel parameters...

Node Status
NAME        STATUS   AGE   STATEFUL
workbench   Ready    3h    true

System Pod status
NAME                                READY   STATUS    RESTARTS   AGE   IP           NODE
dummy-2088944543-uev12              1/1     Running   0          3h    10.0.0.5     workbench
etcd-workbench                      1/1     Running   0          3h    10.0.0.5     workbench
kube-apiserver-workbench            1/1     Running   0          3h    10.0.0.5     workbench
kube-controller-manager-workbench   1/1     Running   3          3h    10.0.0.5     workbench
kube-discovery-1150918428-v7vu8     1/1     Running   0          3h    10.0.0.5     workbench
kube-dns-3873593988-vos07           3/3     Running   0          3h    100.66.0.2   workbench
kube-proxy-7qq63                    1/1     Running   0          3h    10.0.0.5     workbench
kube-scheduler-workbench            1/1     Running   3          3h    10.0.0.5     workbench
node-problem-detector-v0.1-kngbh    1/1     Running   0          3h    10.0.0.5     workbench
weave-net-clu7s                     2/2     Running   0          3h    10.0.0.5     workbench

Cloudera Data Science Workbench Pod Status
NAME                                  READY   STATUS              RESTARTS   AGE   IP           NODE        ROLE
cron-2934152315-56p1n                 1/1     Running             0          3h    100.66.0.8   workbench   cron
db-39862959-icvq9                     1/1     Running             1          3h    100.66.0.5   workbench   db
db-migrate-052787a-mvb40              1/1     Running             0          3h    100.66.0.4   workbench   db-migrate
engine-deps-du8cx                     1/1     Running             0          3h    100.66.0.3   workbench   engine-deps
ingress-controller-3138093376-l5z46   1/1     Running             0          3h    10.0.0.5     workbench   ingress-controller
livelog-1900214889-qppq2              1/1     Running             0          3h    100.66.0.6   workbench   livelog
reconciler-459456250-wgems            1/1     Running             0          3h    100.66.0.7   workbench   reconciler
spark-port-forwarder-a31as            1/1     Running             0          3h    10.0.0.5     workbench   spark-port-forwarder
web-3826671331-7xchm                  0/1     ContainerCreating   0          3h    <none>       workbench   web
web-3826671331-h3gkd                  0/1     ContainerCreating   0          3h    <none>       workbench   web
web-3826671331-vtbdh                  0/1     ContainerCreating   0          3h    <none>       workbench   web

Cloudera Data Science Workbench is not ready yet: some application pods are not ready
Thanks!
Created 07-07-2017 01:18 AM
Hello,
The last 3 pods are not starting because of an issue mounting volumes:
Events:
  FirstSeen  LastSeen  Count  From                 SubobjectPath  Type     Reason       Message
  ---------  --------  -----  ----                 -------------  ----     ------       -------
  15h        33s       224    {kubelet workbench}                 Warning  FailedMount  MountVolume.SetUp failed for volume "kubernetes.io/nfs/bee36b58-6247-11e7-9372-000d3a29b7ab-projects-share" (spec.Name: "projects-share") pod "bee36b58-6247-11e7-9372-000d3a29b7ab" (UID: "bee36b58-6247-11e7-9372-000d3a29b7ab") with: mount failed: exit status 32
             Mounting arguments: 10.0.0.4:/var/lib/cdsw/current/projects /var/lib/kubelet/pods/bee36b58-6247-11e7-9372-000d3a29b7ab/volumes/kubernetes.io~nfs/projects-share nfs []
             Output: mount.nfs: Connection timed out
  19h        4s        502    {kubelet workbench}                 Warning  FailedMount  Unable to mount volumes for pod "web-3826671331-7xchm_default(bee36b58-6247-11e7-9372-000d3a29b7ab)": timeout expired waiting for volumes to attach/mount for pod "web-3826671331-7xchm"/"default". list of unattached/unmounted volumes=[projects-claim]
  19h        4s        502    {kubelet workbench}                 Warning  FailedSync   Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "web-3826671331-7xchm"/"default". list of unattached/unmounted volumes=[projects-claim]
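To reproduce the failure outside of Kubernetes, the same export could be checked and mounted by hand from the workbench host (just a sketch, assuming nfs-utils is installed and /mnt is free):

$ showmount -e 10.0.0.4
$ sudo mount -t nfs 10.0.0.4:/var/lib/cdsw/current/projects /mnt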
Searching Google did not really help me pinpoint what the cause could be.
Any pointers on where I should look?
Thanks!
Created 07-07-2017 01:40 AM
Found it. I had interpreted "MASTER" as being the master node of the CDH cluster 😉
Using the right IP fixed the issue.
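For reference, a sketch of the kind of change involved, assuming the standard RPM layout where the master address is set via MASTER_IP in /etc/cdsw/config/cdsw.conf (path and parameter name assumed, check your own install):

# /etc/cdsw/config/cdsw.conf (hypothetical excerpt)
# MASTER_IP must point to the CDSW master host itself,
# not to the CDH cluster's master node.
MASTER_IP="10.0.0.5"   # was mistakenly set to 10.0.0.4 (the CDH master)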
Thanks