Created 06-29-2017 10:45 PM
I have installed the workbench on a 5-node cluster. Everything looks good, but when I checked "cdsw status" it shows STATEFUL as <none>:
Node Status
NAME        STATUS    AGE   STATEFUL
hostname1   Ready     1h    true
hostname2   Ready     1h    <none>
hostname3   Ready     1h    <none>
hostname4   Ready     1h    <none>
hostname5   Ready     1h    <none>
And when I launch the workbench from cdsw.company.com, it hangs forever on "ContainerCreating: Creating engine container." and the input field blinks red.
Created 07-06-2017 10:15 AM
The <none> indicator is not an issue -- it simply indicates that those nodes are worker nodes and don't have stateful information stored on them.
Hanging engines on "ContainerCreating" typically means you have not run "cdsw enable <worker-ip>" on the master node for all your worker nodes. This whitelists the IP of your worker nodes for NFS mounts. If you have not done this, containers can hang waiting for the project mounts to become available when scheduled onto a worker node.
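For example, on the master node you would run something like this (the worker IPs below are only placeholders for your own):

cdsw enable 10.0.0.101
cdsw enable 10.0.0.102
cdsw enable 10.0.0.103
cdsw status    # workers will still show STATEFUL as <none>, which is expected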
Please let me know if running "cdsw enable" for each worker IP resolves this issue.
Thanks,
Tristan
Created 07-07-2017 01:29 AM
Hi,
I am also experiencing the same error.
My workbench has two workers and a master. The workers are enabled with "cdsw enable", and I can see that they are running.
Regards
Nes
Created on 07-10-2017 03:43 AM - edited 07-10-2017 03:47 AM
I have tried the following:
Test 1:
cdsw enable "worker_node_ip"
Result: Same issue
Test 2: Removed the nodes from the cluster and added them again
Result: Same issue
Test 3: Reset the master and workers, then performed "cdsw init" and "cdsw enable worker_ip" on the master and "cdsw join" on the workers (rough command sequence below)
Result: Same issue
Still getting "ContainerCreating: Creating engine container."
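For reference, this is roughly the sequence I ran for Test 3 (worker IPs are placeholders):

# on every node
cdsw reset

# on the master
cdsw init
cdsw enable 10.0.0.101
cdsw enable 10.0.0.102

# on each worker
cdsw join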
Admin --> Site Administration --> Overview
Created 07-10-2017 11:00 AM
Could you please give the output for:
kubectl get events
kubectl logs <stuck-pod-id> engine
Tristan
Created 07-10-2017 09:37 PM
Hi Tristan,
"kubectl get events" didn't gave any ouput.
Below is the "cdsw status" output
[root@hostname ~]# cdsw status
Cloudera Data Science Workbench Status

Service Status
docker: active
kubelet: active
nfs: active
Checking kernel parameters...

Node Status
NAME      STATUS    AGE   STATEFUL
master    Ready     17h   true
worker1   Ready     17h   <none>
worker2   Ready     17h   <none>
worker3   Ready     17h   <none>
worker3   Ready     17h   <none>

System Pod status
NAME                               READY   STATUS    RESTARTS   AGE   IP           NODE
dummy-2088944543-c4pbg             1/1     Running   0          17h   10.x.x.x     master
etcd-master                        1/1     Running   0          17h   10.x.x.x     master
kube-apiserver-master              1/1     Running   0          17h   10.x.x.x     master
kube-controller-manager-master     1/1     Running   0          17h   10.x.x.x     master
kube-discovery-1150918428-se35m    1/1     Running   0          17h   10.x.x.x     master
kube-dns-3873593988-olmcy          3/3     Running   0          17h   100.66.0.2   master
kube-proxy-cr019                   1/1     Running   0          17h   10.x.x.x     master
kube-proxy-o316l                   1/1     Running   0          17h   10.x.x.x     worker3
kube-proxy-txbph                   1/1     Running   0          17h   10.x.x.x     worker2
kube-proxy-u0riv                   1/1     Running   0          17h   10.x.x.x     worker3
kube-proxy-xf6ta                   1/1     Running   0          17h   10.x.x.x     worker1
kube-scheduler-master              1/1     Running   0          17h   10.x.x.x     master
node-problem-detector-v0.1-7zp8i   1/1     Running   0          17h   10.x.x.x     worker1
node-problem-detector-v0.1-be2cf   1/1     Running   0          17h   10.x.x.x     worker3
node-problem-detector-v0.1-ej7yx   1/1     Running   0          17h   10.x.x.x     worker2
node-problem-detector-v0.1-maik6   1/1     Running   0          17h   10.x.x.x     master
node-problem-detector-v0.1-xf9o0   1/1     Running   0          17h   10.x.x.x     worker3
weave-net-31402                    2/2     Running   0          17h   10.x.x.x     worker1
weave-net-71t9s                    2/2     Running   0          17h   10.x.x.x     worker3
weave-net-8p26z                    2/2     Running   0          17h   10.x.x.x     worker3
weave-net-m4e8x                    2/2     Running   0          17h   10.x.x.x     worker2
weave-net-wfd35                    2/2     Running   0          17h   10.x.x.x     master

Cloudera Data Science Workbench Pod Status
NAME                                  READY   STATUS      RESTARTS   AGE   IP             NODE      ROLE
cron-2934152315-ymbxu                 1/1     Running     0          17h   100.66.0.8     master    cron
db-39862959-s2ic8                     1/1     Running     0          17h   100.66.0.4     master    db
db-migrate-052787a-170ff              0/1     Completed   0          17h   100.66.0.5     master    db-migrate
engine-deps-3uqcr                     1/1     Running     0          17h   100.66.0.3     master    engine-deps
engine-deps-6npbb                     1/1     Running     0          17h   100.66.192.1   worker1   engine-deps
engine-deps-m2385                     1/1     Running     0          17h   100.66.64.1    worker2   engine-deps
engine-deps-qgcwy                     1/1     Running     0          17h   100.66.128.1   worker3   engine-deps
engine-deps-zblkz                     1/1     Running     0          17h   100.66.160.1   worker3   engine-deps
ingress-controller-3138093376-nx1wi   1/1     Running     0          17h   10.x.x.x       master    ingress-controller
livelog-1900214889-bqhf7              1/1     Running     0          17h   100.66.0.7     master    livelog
reconciler-459456250-ma02c            1/1     Running     0          17h   100.66.0.6     master    reconciler
spark-port-forwarder-0yxno            1/1     Running     0          17h   10.x.x.x       worker2   spark-port-forwarder
spark-port-forwarder-86dv2            1/1     Running     0          17h   10.x.x.x       worker3   spark-port-forwarder
spark-port-forwarder-l2u4k            1/1     Running     0          17h   10.x.x.x       worker1   spark-port-forwarder
spark-port-forwarder-lpwms            1/1     Running     0          17h   10.x.x.x       master    spark-port-forwarder
spark-port-forwarder-rsx25            1/1     Running     0          17h   10.x.x.x       worker3   spark-port-forwarder
web-3826671331-0n92g                  1/1     Running     0          17h   100.66.0.10    master    web
web-3826671331-my2vs                  1/1     Running     0          17h   100.66.0.9     master    web
web-3826671331-zva8n                  1/1     Running     0          17h   100.66.0.5     master    web

Cloudera Data Science Workbench is ready!
"kubectl logs <stuck-pod-id> engine" no pod in stuck mod, its stuck while launching te container in WebUI, Kindly check below screenshot.
Thanks
Krishna
Created 07-11-2017 01:01 AM
Hi Krishna,
When you start a new session in the workbench you should see a new pod in the list:
gy17uw2d8p5gpuh1 3/3 Running 0 14m 100.66.0.18 hostname console
We would like to see the logs for this pod.
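For example, something like this (the pod ID below is only a placeholder):

kubectl get pods                       # the session pod appears with a random ID once the session starts
kubectl logs <stuck-pod-id> engine     # logs from the engine container of that pod
kubectl describe pod <stuck-pod-id>    # the Events section often shows why a pod is stuck in ContainerCreating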
Regards,
Peter
Created 07-11-2017 02:12 AM
Below is the output of "kubectl logs <pod_id> engine":
[root@hostname ~]# kubectl logs ubgujdi5b9b6mmwr engine
2017-07-11 09:06:10.097 9 INFO Engine Waiting one second for Spark config... data = {"id":"ubgujdi5b9b6mmwr"}
2017-07-11 09:06:11.186 15 INFO Engine Waiting one second for Spark config... data = {"id":"ubgujdi5b9b6mmwr"}
/var/lib/cdsw/config/startup.sh: line 31: undefined: command not found
2017/07/11 09:06:12 Loading config file at: /var/lib/cdsw/deps/terminal-conf
2017/07/11 09:06:12 Permitting clients to write input to the PTY.
2017/07/11 09:06:12 Server is starting with command: /bin/bash
2017/07/11 09:06:12 URL: http://0.0.0.0:8000/xmfwh64etire9k0l/
2017/07/11 09:06:13 100.66.0.1:51408 301 GET /xmfwh64etire9k0l
2017-07-11 09:06:13.636 7 INFO Engine ubgujdi5b9b6mmwr Start Authenticating to livelog data = {"secondsSinceStartup":0.85}
Livelog Open
2017-07-11 09:06:13.679 7 INFO Engine ubgujdi5b9b6mmwr Finish Authenticating to livelog: success data = {"secondsSinceStartup":0.898}
2017-07-11 09:06:13.680 7 INFO Engine ubgujdi5b9b6mmwr Start Searching for engine module data = {"secondsSinceStartup":0.9}
2017-07-11 09:06:14.410 7 INFO Engine ubgujdi5b9b6mmwr Finish Searching for engine module: success data = {"engineModule_path":"/usr/local/lib/node_modules/python2-engine"}
2017-07-11 09:06:14.410 7 INFO Engine ubgujdi5b9b6mmwr Start Creating engine data = {"secondsSinceStartup":1.63}
PID of parser IPython process is 59
PID of main IPython process is 62
2017-07-11 09:06:14.661 7 INFO Engine ubgujdi5b9b6mmwr Finish Creating engine data = {"secondsSinceStartup":1.88}
2017-07-11 09:06:22.492 7 INFO Engine ubgujdi5b9b6mmwr Start Registering running status data = {"useHttps":false,"host":"100.77.0.130","path":"/api/v1/projects/Krishna/test1/dashboards/ubgujdi5b9b6mmwr/register-status","senseDomain":"cdsw.adobe.com"}
2017-07-11 09:06:22.505 7 INFO Engine ubgujdi5b9b6mmwr Finish Registering running status: success
2017-07-11 09:06:22.506 7 INFO Engine ubgujdi5b9b6mmwr Pod is ready data = {"secondsSinceStartup":9.726,"engineModuleShare":8.096}
Created 07-16-2017 09:14 PM
Any update, @tristanzajonc @peter.ableda?
Created 07-24-2017 05:28 AM
Open your browser's developer tools and go to the Console; I guess you will find an error there.
I had the same problem, and the error there pointed me to a wildcard DNS problem.
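If you want to check the wildcard DNS side of it, something like this should help (using the cdsw.company.com domain from the first post; with a correct wildcard record, any subdomain resolves to the CDSW master's IP):

nslookup cdsw.company.com
nslookup anysubdomain.cdsw.company.com    # should return the same master IP if the wildcard record is set up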
Created 12-14-2017 09:38 AM
@tristanzajonc wrote: The <none> indicator is not an issue -- it simply indicates that those nodes are worker nodes and don't have stateful information stored on them.
Hanging engines on "ContainerCreating" typically means you have not run "cdsw enable <worker-ip>" on the master node for all your worker nodes. This whitelists the IP of your worker nodes for NFS mounts. If you have not done this, containers can hang waiting for the project mounts to become available when scheduled onto a worker node.
Please let me know if running "cdsw enable" for each worker IP resolves this issue.
Thanks,
Tristan
FYI, I was having the same issue, and this resolved it for me.
Thanks