Created on 11-29-2018 01:19 PM - edited 09-16-2022 06:56 AM
I am attempting to get CDSW up and running on our cluster. I recently upgraded the service by following the steps outlined here:
https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_install.html
But when restarting CDSW, several of the Kubernetes pods are failing. The "cdsw status" command shows which pods are failing:
-----------------------------------------------------------------------------------------------------------------------------------------------------------
| NAME                                | READY | STATUS           | RESTARTS | CREATED-AT                | POD-IP      | HOST-IP    | ROLE                 |
-----------------------------------------------------------------------------------------------------------------------------------------------------------
| cron-744b9cd84-8htcf                | 1/1   | Running          | 0        | 2018-11-29 21:00:52+00:00 | 100.66.0.3  | 10.142.0.7 | cron                 |
| db-74df8c56d9-cvdkj                 | 1/1   | Running          | 0        | 2018-11-29 21:00:52+00:00 | 100.66.0.7  | 10.142.0.7 | db                   |
| db-migrate-286f701-6ncf4            | 0/1   | Failed           | 0        | 2018-11-29 21:00:52+00:00 | 100.66.0.6  | 10.142.0.7 | db-migrate           |
| db-migrate-286f701-d5plv            | 0/1   | Failed           | 0        | 2018-11-29 21:02:16+00:00 | 100.66.0.6  | 10.142.0.7 | db-migrate           |
| db-migrate-286f701-fk6jf            | 0/1   | Failed           | 0        | 2018-11-29 21:06:10+00:00 | 100.66.0.6  | 10.142.0.7 | db-migrate           |
| db-migrate-286f701-h5lz4            | 0/1   | Failed           | 0        | 2018-11-29 21:03:05+00:00 | 100.66.0.6  | 10.142.0.7 | db-migrate           |
| db-migrate-286f701-hgz7x            | 0/1   | Failed           | 0        | 2018-11-29 21:04:50+00:00 | 100.66.0.6  | 10.142.0.7 | db-migrate           |
| db-migrate-286f701-xcdp2            | 0/1   | Failed           | 0        | 2018-11-29 21:04:10+00:00 | 100.66.0.6  | 10.142.0.7 | db-migrate           |
| ds-cdh-client-74b579758f-47msc      | 1/1   | Running          | 0        | 2018-11-29 21:00:55+00:00 | 100.66.0.16 | 10.142.0.7 | ds-cdh-client        |
| ds-operator-56f9769b59-sj4vp        | 1/2   | CrashLoopBackOff | 5        | 2018-11-29 21:00:55+00:00 | 100.66.0.21 | 10.142.0.7 | ds-operator          |
| ds-vfs-57c7544b87-mdx4w             | 1/1   | Running          | 0        | 2018-11-29 21:00:55+00:00 | 100.66.0.24 | 10.142.0.7 | ds-vfs               |
| ingress-controller-698578dd5f-lzcb2 | 1/1   | Running          | 0        | 2018-11-29 21:00:52+00:00 | 10.142.0.7  | 10.142.0.7 | ingress-controller   |
| livelog-66d657b4bd-8lpzm            | 1/1   | Running          | 0        | 2018-11-29 21:00:52+00:00 | 100.66.0.10 | 10.142.0.7 | livelog              |
| s2i-builder-75658678dd-jdf57        | 1/1   | Running          | 0        | 2018-11-29 21:00:55+00:00 | 100.66.0.12 | 10.142.0.7 | s2i-builder          |
| s2i-builder-75658678dd-l5cck        | 1/1   | Running          | 0        | 2018-11-29 21:00:55+00:00 | 100.66.0.15 | 10.142.0.7 | s2i-builder          |
| s2i-builder-75658678dd-t7m5q        | 1/1   | Running          | 0        | 2018-11-29 21:00:55+00:00 | 100.66.0.14 | 10.142.0.7 | s2i-builder          |
| s2i-client-bfc8dd49b-4s9q6          | 1/1   | Running          | 0        | 2018-11-29 21:00:55+00:00 | 100.66.0.22 | 10.142.0.7 | s2i-client           |
| s2i-git-server-85768bf7d4-zz5zx     | 1/1   | Running          | 0        | 2018-11-29 21:00:52+00:00 | 100.66.0.8  | 10.142.0.7 | s2i-git-server       |
| s2i-queue-5968f8d774-z8h5r          | 1/1   | Running          | 0        | 2018-11-29 21:00:52+00:00 | 100.66.0.9  | 10.142.0.7 | s2i-queue            |
| s2i-registry-7586cb8b89-wrsjh       | 1/1   | Running          | 0        | 2018-11-29 21:00:52+00:00 | 100.66.0.19 | 10.142.0.7 | s2i-registry         |
| s2i-registry-auth-ff494b5d7-vd2qn   | 1/1   | Running          | 0        | 2018-11-29 21:00:52+00:00 | 100.66.0.18 | 10.142.0.7 | s2i-registry-auth    |
| s2i-server-845b559b88-mnzsz         | 1/1   | Running          | 0        | 2018-11-29 21:00:53+00:00 | 100.66.0.13 | 10.142.0.7 | s2i-server           |
| secret-generator-6ff758b4df-584dr   | 1/1   | Running          | 0        | 2018-11-29 21:00:53+00:00 | 100.66.0.11 | 10.142.0.7 | secret-generator     |
| spark-port-forwarder-d2bzh          | 1/1   | Running          | 0        | 2018-11-29 21:00:55+00:00 | 10.142.0.7  | 10.142.0.7 | spark-port-forwarder |
| web-99696d576-5c4cn                 | 0/1   | Running          | 1        | 2018-11-29 21:00:53+00:00 | 100.66.0.20 | 10.142.0.7 | web                  |
| web-99696d576-8sghg                 | 0/1   | Running          | 1        | 2018-11-29 21:00:53+00:00 | 100.66.0.17 | 10.142.0.7 | web                  |
| web-99696d576-w6q5d                 | 0/1   | Running          | 1        | 2018-11-29 21:00:53+00:00 | 100.66.0.23 | 10.142.0.7 | web                  |
-----------------------------------------------------------------------------------------------------------------------------------------------------------
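With this many rows it is easy to miss a pod that reports "Running" but is not actually ready (the web pods above show 0/1). As a rough sketch, the table can be filtered with a few lines of Python. This assumes the pod table has been saved as text with one pipe-delimited row per line; the function names and the SAMPLE text are made up for illustration and are not part of CDSW.

```python
# Sketch: list pods that are not fully ready in a pipe-delimited pod table.
# Assumption: one table row per line, with a header row naming the columns.

SAMPLE = """
| NAME | READY | STATUS | RESTARTS |
| cron-744b9cd84-8htcf | 1/1 | Running | 0 |
| db-migrate-286f701-6ncf4 | 0/1 | Failed | 0 |
| ds-operator-56f9769b59-sj4vp | 1/2 | CrashLoopBackOff | 5 |
| web-99696d576-5c4cn | 0/1 | Running | 1 |
"""


def parse_pod_rows(table_text):
    """Parse pipe-delimited rows into dicts keyed by the header names."""
    rows = []
    header = None
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skips blank lines and dashed separator lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first pipe-delimited line is the header
            continue
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def unhealthy_pods(rows):
    """Return (name, ready, status) for pods that are not Running or not fully ready."""
    bad = []
    for row in rows:
        ready, total = row["READY"].split("/")
        if row["STATUS"] != "Running" or int(ready) < int(total):
            bad.append((row["NAME"], row["READY"], row["STATUS"]))
    return bad


result = unhealthy_pods(parse_pod_rows(SAMPLE))
for name, ready, status in result:
    print(name, ready, status)
```

Applied to the full table above, this would single out the db-migrate, ds-operator and web pods.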
As well, when running "cdsw validate", I am getting the following error:
Validating pre-init steps...
JAVA_HOME is not set
ERROR:: CDSW Host validation failed: 1
However, on that host, the JAVA_HOME is appropriately set.
[root@edge-8b9e7b9c-f5b5-4dbd-9f6d-287690b4502c ~]# echo $JAVA_HOME
/usr/java/jdk1.8.0_151
As well, JAVA_HOME is appropriately set in CDSW to the same JDK. Any assistance in troubleshooting why we are encountering these issues would be hugely appreciated.
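One thing worth ruling out: "echo $JAVA_HOME" in an interactive shell also prints variables that are set but not exported, so a child process may still see nothing. Whether that is what the validation step runs into here is only an assumption; the thread does not confirm it. A minimal sketch of a check run as a separate process (the function name check_java_home is made up for illustration):

```python
# Sketch: check JAVA_HOME as seen by a child process, not by the login shell.
# Assumption: a usable JDK has an executable at $JAVA_HOME/bin/java.
import os


def check_java_home(env=None):
    """Return a short message describing the JAVA_HOME found in env."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set in this process environment"
    java_bin = os.path.join(java_home, "bin", "java")
    if not os.path.isfile(java_bin):
        return f"JAVA_HOME is set to {java_home} but {java_bin} does not exist"
    return f"JAVA_HOME looks usable: {java_home}"


if __name__ == "__main__":
    print(check_java_home())
```

If this reports the variable as unset when run from the same root shell, the value is local to the shell rather than exported.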
Created 12-08-2018 10:00 PM
Hello @mtrepanier
Thanks for posting your query.
Do you see any suspicious messages in "/var/log/cdsw/cdsw_health.log"? If so, can you copy them here?
Created 02-01-2019 10:03 AM
@satz thanks for the reply (and sorry for the delayed response).
There ended up being an issue with our CDH install as a whole, likely stemming from an upgrade from 5.12 to 5.14. After several hours working with a rep, tearing down and rebuilding the cluster was the only solution. We provided the logs etc. under that ticket.