Worker node is not responding properly

Contributor

I have successfully added my worker nodes to the master using "cdsw join". Some of the nodes are working fine, but some are not responding properly. When I run "cdsw status" on them, I get "Cloudera Data Science Workbench is not ready yet: cannot curl localhost".
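For context, the commands involved look roughly like this (a minimal sketch, assuming the worker's cdsw.conf already points at the master):

# On each worker host, as root, join it to the existing CDSW master:
cdsw join

# Then check the application state on that host:
cdsw status
# On the failing workers this prints:
# Cloudera Data Science Workbench is not ready yet: cannot curl localhost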

 

Thanks

Krishna

1 ACCEPTED SOLUTION

Contributor

I fixed the issue "Cloudera Data Science Workbench is not ready yet: cannot curl localhost" by starting the httpd service. Now every worker node is showing "Cloudera Data Science Workbench is ready!"
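For anyone hitting the same message, the fix described above would look roughly like this on a systemd-based host (a sketch only; which service was actually missing may differ in your environment):

# On each affected worker host, start the httpd service and make it persistent:
sudo systemctl start httpd
sudo systemctl enable httpd

# Re-check the workbench state:
cdsw status
# Expected: Cloudera Data Science Workbench is ready!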


8 REPLIES

Super Collaborator

Hi Krishna,

 

We have a known issue with running the 'cdsw status' command on a worker host; we are planning to fix this in the next release. In the meantime, please run the 'cdsw status' command on your master host to see the status of your workbench.

 

Could you share the status command output from the master host with us? It might give us some context about what the issue is.

Are you sure you followed the install steps consistently on the worker hosts, e.g. ran 'cdsw enable <IPv4_address_of_worker>' for every worker?
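For clarity, both checks would look roughly like this (a sketch only; the IPv4 addresses below are placeholders for your actual worker addresses):

# On the master host, check the overall workbench state:
cdsw status

# Make sure the enable step was run for every worker, one call per worker IPv4 address:
cdsw enable 10.0.0.11
cdsw enable 10.0.0.12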

 

Thanks,

Scotty

Contributor

On the master node, "cdsw status" is showing "Cloudera Data Science Workbench is ready!". Below is the output of the "cdsw status" command:

 

 

Cloudera Data Science Workbench Status

Service Status
docker: active
kubelet: active
nfs: active
Checking kernel parameters...

Node Status
NAME                             STATUS    AGE       STATEFUL
abcd.abc.com   Ready     30d       true
xxxx.corp.abc   Ready     5d        <none>
xxxx.corp.abc   Ready     1d        <none>
xxxx.corp.abc   Ready     1d        <none>
xxxx.corp.abc   Ready     1d        <none>

System Pod status
NAME                                                     READY     STATUS              RESTARTS   AGE
dummy-2088944543-iu5em                                   1/1       Running             5          30d
etcd-abcd.corp.abc                      1/1       Running             5          30d
kube-apiserver-abcd.corp.abc            1/1       Running             6          30d
kube-controller-manager-abcd.corp.abc   1/1       Running             5          30d
kube-discovery-1150918428-s7m6e                          0/1       MatchNodeSelector   0          30d
kube-discovery-1150918428-tfch1                          1/1       Running             3          7d
kube-dns-3873593988-g83ve                                3/3       Running             15         30d
kube-proxy-2rfvu                                         1/1       Running             0          1d
kube-proxy-mq6z1                                         1/1       Running             0          5d
kube-proxy-orp04                                         1/1       Running             0          1d
kube-proxy-pd3kl                                         1/1       Running             0          1d
kube-proxy-wlaqj                                         1/1       Running             5          30d
kube-scheduler-abcd.corp.abc            1/1       Running             5          30d
node-problem-detector-v0.1-bte1v                         1/1       Running             0          1d
node-problem-detector-v0.1-cvwav                         1/1       Running             0          5d
node-problem-detector-v0.1-extu9                         1/1       Running             0          1d
node-problem-detector-v0.1-qlz7s                         1/1       Running             5          30d
node-problem-detector-v0.1-vftvo                         1/1       Running             0          1d
weave-net-38alm                                          2/2       Running             11         30d
weave-net-4mg1p                                          2/2       Running             0          1d
weave-net-e99uh                                          2/2       Running             0          1d
weave-net-eyern                                          2/2       Running             1          5d
weave-net-i428d                                          2/2       Running             0          1d

Cloudera Data Science Workbench Pod Status
NAME                                  READY     STATUS              RESTARTS   AGE       ROLE
cron-3971587342-670nl                 1/1       Running             5          30d       cron
db-4066525870-0xmz5                   1/1       Running             3          7d        db
db-4066525870-g493s                   0/1       MatchNodeSelector   0          30d       db
db-migrate-abec968-2pbnb              0/1       Completed           0          30d       db-migrate
engine-deps-gz18i                     1/1       Running             5          30d       engine-deps
engine-deps-jeu62                     1/1       Running             0          1d        engine-deps
engine-deps-pop4r                     1/1       Running             0          1d        engine-deps
engine-deps-t7a9m                     1/1       Running             0          5d        engine-deps
engine-deps-v16a5                     1/1       Running             0          1d        engine-deps
ingress-controller-2976678207-lrdp8   0/1       MatchNodeSelector   0          30d       ingress-controller
ingress-controller-2976678207-qrz8x   1/1       Running             3          7d        ingress-controller
livelog-2494298876-22gbi              1/1       Running             3          7d        livelog
livelog-2494298876-rhtg5              0/1       MatchNodeSelector   0          30d       livelog
reconciler-577027981-r4vni            1/1       Running             5          30d       reconciler
spark-port-forwarder-cmcso            1/1       Running             0          1d        spark-port-forwarder
spark-port-forwarder-e6a30            1/1       Running             0          1d        spark-port-forwarder
spark-port-forwarder-o50lr            1/1       Running             0          1d        spark-port-forwarder
spark-port-forwarder-salu1            1/1       Running             0          5d        spark-port-forwarder
spark-port-forwarder-tjhaa            1/1       Running             5          30d       spark-port-forwarder
web-1304125449-5qb5e                  1/1       Running             5          30d       web
web-1304125449-na1av                  1/1       Running             5          30d       web
web-1304125449-qhs08                  1/1       Running             5          30d       web

Cloudera Data Science Workbench is ready!

Note: abcd.abc.com is the master

 

Contributor

@peter_ableda any update?

Super Collaborator

Hi Krishna,

 

One issue we see is that the master node appears to be missing the stateful true flag. Something might have gone wrong with your installation.
The code that puts the stateful tag on the node uses the `hostname` command output. Could you check whether the `hostname` output matches the node name you see in the `cdsw status` output?
I would start by running `cdsw stop` and `cdsw start` on the master node and checking whether the stateful flag appears. If that does not work, you will probably need to do a `cdsw reset` and `cdsw init` for the master and the workers as well.
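Roughly, that sequence would look like this (a sketch only, run on the master host; after a reset the workers also have to be re-joined):

# Check that the OS hostname matches the node name shown by cdsw status:
hostname
cdsw status

# Restart the application on the master and see whether the STATEFUL flag appears:
cdsw stop
cdsw start
cdsw status

# Last resort: reinitialize the deployment on the master...
cdsw reset
cdsw init
# ...and reset/re-join each worker afterwards:
# cdsw reset
# cdsw join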

 

Regards,

Peter

Super Collaborator

Hi Krishna,

 

The true flag is actually there under the STATEFUL column; it is just not rendered nicely, so please disregard my previous post. Could you try restarting your CDSW application and upload the `cdsw status` command output again? We shouldn't see multiple database pods there.

 

Thanks,

Peter

Contributor
Hi Peter,

I started building a new cluster, and while doing that I ran into the issue below.

http://community.cloudera.com/t5/Cloudera-Data-Science-Workbench/cdsw-init-failed/m-p/56646#M84

Contributor

I fixed the issue "Cloudera Data Science Workbench is not ready yet: cannot curl localhost" by starting the httpd service. Now every worker node is showing "Cloudera Data Science Workbench is ready!"

New Contributor

Can you please provide more details on which service you restarted?

 

Thanks,

MK