DB-Migrate Kubernetes Pods Failing to Start

Explorer

I am attempting to get CDSW up and running on our cluster. I recently upgraded the service by following the steps outlined here:

https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_install.html

But after restarting CDSW, several of the Kubernetes pods are failing. The "cdsw status" command reports the pods below; the db-migrate, ds-operator, and web pods are the ones failing:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                   NAME                  |   READY   |        STATUS        |   RESTARTS   |           CREATED-AT          |      POD-IP     |    HOST-IP     |           ROLE           |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|           cron-744b9cd84-8htcf          |    1/1    |       Running        |      0       |   2018-11-29 21:00:52+00:00   |    100.66.0.3   |   10.142.0.7   |           cron           |
|           db-74df8c56d9-cvdkj           |    1/1    |       Running        |      0       |   2018-11-29 21:00:52+00:00   |    100.66.0.7   |   10.142.0.7   |            db            |
|         db-migrate-286f701-6ncf4        |    0/1    |        Failed        |      0       |   2018-11-29 21:00:52+00:00   |    100.66.0.6   |   10.142.0.7   |        db-migrate        |
|         db-migrate-286f701-d5plv        |    0/1    |        Failed        |      0       |   2018-11-29 21:02:16+00:00   |    100.66.0.6   |   10.142.0.7   |        db-migrate        |
|         db-migrate-286f701-fk6jf        |    0/1    |        Failed        |      0       |   2018-11-29 21:06:10+00:00   |    100.66.0.6   |   10.142.0.7   |        db-migrate        |
|         db-migrate-286f701-h5lz4        |    0/1    |        Failed        |      0       |   2018-11-29 21:03:05+00:00   |    100.66.0.6   |   10.142.0.7   |        db-migrate        |
|         db-migrate-286f701-hgz7x        |    0/1    |        Failed        |      0       |   2018-11-29 21:04:50+00:00   |    100.66.0.6   |   10.142.0.7   |        db-migrate        |
|         db-migrate-286f701-xcdp2        |    0/1    |        Failed        |      0       |   2018-11-29 21:04:10+00:00   |    100.66.0.6   |   10.142.0.7   |        db-migrate        |
|      ds-cdh-client-74b579758f-47msc     |    1/1    |       Running        |      0       |   2018-11-29 21:00:55+00:00   |   100.66.0.16   |   10.142.0.7   |      ds-cdh-client       |
|       ds-operator-56f9769b59-sj4vp      |    1/2    |   CrashLoopBackOff   |      5       |   2018-11-29 21:00:55+00:00   |   100.66.0.21   |   10.142.0.7   |       ds-operator        |
|         ds-vfs-57c7544b87-mdx4w         |    1/1    |       Running        |      0       |   2018-11-29 21:00:55+00:00   |   100.66.0.24   |   10.142.0.7   |          ds-vfs          |
|   ingress-controller-698578dd5f-lzcb2   |    1/1    |       Running        |      0       |   2018-11-29 21:00:52+00:00   |    10.142.0.7   |   10.142.0.7   |    ingress-controller    |
|         livelog-66d657b4bd-8lpzm        |    1/1    |       Running        |      0       |   2018-11-29 21:00:52+00:00   |   100.66.0.10   |   10.142.0.7   |         livelog          |
|       s2i-builder-75658678dd-jdf57      |    1/1    |       Running        |      0       |   2018-11-29 21:00:55+00:00   |   100.66.0.12   |   10.142.0.7   |       s2i-builder        |
|       s2i-builder-75658678dd-l5cck      |    1/1    |       Running        |      0       |   2018-11-29 21:00:55+00:00   |   100.66.0.15   |   10.142.0.7   |       s2i-builder        |
|       s2i-builder-75658678dd-t7m5q      |    1/1    |       Running        |      0       |   2018-11-29 21:00:55+00:00   |   100.66.0.14   |   10.142.0.7   |       s2i-builder        |
|        s2i-client-bfc8dd49b-4s9q6       |    1/1    |       Running        |      0       |   2018-11-29 21:00:55+00:00   |   100.66.0.22   |   10.142.0.7   |        s2i-client        |
|     s2i-git-server-85768bf7d4-zz5zx     |    1/1    |       Running        |      0       |   2018-11-29 21:00:52+00:00   |    100.66.0.8   |   10.142.0.7   |      s2i-git-server      |
|        s2i-queue-5968f8d774-z8h5r       |    1/1    |       Running        |      0       |   2018-11-29 21:00:52+00:00   |    100.66.0.9   |   10.142.0.7   |        s2i-queue         |
|      s2i-registry-7586cb8b89-wrsjh      |    1/1    |       Running        |      0       |   2018-11-29 21:00:52+00:00   |   100.66.0.19   |   10.142.0.7   |       s2i-registry       |
|    s2i-registry-auth-ff494b5d7-vd2qn    |    1/1    |       Running        |      0       |   2018-11-29 21:00:52+00:00   |   100.66.0.18   |   10.142.0.7   |    s2i-registry-auth     |
|       s2i-server-845b559b88-mnzsz       |    1/1    |       Running        |      0       |   2018-11-29 21:00:53+00:00   |   100.66.0.13   |   10.142.0.7   |        s2i-server        |
|    secret-generator-6ff758b4df-584dr    |    1/1    |       Running        |      0       |   2018-11-29 21:00:53+00:00   |   100.66.0.11   |   10.142.0.7   |     secret-generator     |
|        spark-port-forwarder-d2bzh       |    1/1    |       Running        |      0       |   2018-11-29 21:00:55+00:00   |    10.142.0.7   |   10.142.0.7   |   spark-port-forwarder   |
|           web-99696d576-5c4cn           |    0/1    |       Running        |      1       |   2018-11-29 21:00:53+00:00   |   100.66.0.20   |   10.142.0.7   |           web            |
|           web-99696d576-8sghg           |    0/1    |       Running        |      1       |   2018-11-29 21:00:53+00:00   |   100.66.0.17   |   10.142.0.7   |           web            |
|           web-99696d576-w6q5d           |    0/1    |       Running        |      1       |   2018-11-29 21:00:53+00:00   |   100.66.0.23   |   10.142.0.7   |           web            |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
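
For triage on a larger cluster, a small Python sketch like the one below can pull the unhealthy rows out of that table. It is not a CDSW tool: the function name unhealthy_pods is hypothetical, and it assumes the pipe-delimited layout shown above (NAME, READY, STATUS as the first three columns).

```python
from typing import List, Tuple

def unhealthy_pods(status_table: str) -> List[Tuple[str, str, str]]:
    """Return (name, ready, status) for every pod that is not fully ready and Running."""
    problems = []
    for line in status_table.splitlines():
        # Data and header rows are pipe-delimited; the dashed separator rows are skipped.
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 3 or cells[0] == "NAME":
            continue
        name, ready, status = cells[0], cells[1], cells[2]
        # READY is "<ready containers>/<total containers>", e.g. "1/2".
        ready_now, _, ready_total = ready.partition("/")
        if status != "Running" or ready_now != ready_total:
            problems.append((name, ready, status))
    return problems

# A few rows copied from the "cdsw status" output above (trailing columns omitted).
sample = """\
|           cron-744b9cd84-8htcf          |    1/1    |       Running        |      0       |
|         db-migrate-286f701-6ncf4        |    0/1    |        Failed        |      0       |
|       ds-operator-56f9769b59-sj4vp      |    1/2    |   CrashLoopBackOff   |      5       |
|           web-99696d576-5c4cn           |    0/1    |       Running        |      1       |
"""

for pod in unhealthy_pods(sample):
    print(pod)
```

On this sample it flags the db-migrate, ds-operator, and web pods and skips cron. Checking READY as well as STATUS matters here: the web pods report STATUS Running but READY 0/1, meaning the container is up but not passing its readiness check.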

Also, when running "cdsw validate", I get the following error:

Validating pre-init steps...
JAVA_HOME is not set

ERROR:: CDSW Host validation failed: 1

However, JAVA_HOME is set correctly on that host:

[root@edge-8b9e7b9c-f5b5-4dbd-9f6d-287690b4502c ~]# echo $JAVA_HOME
/usr/java/jdk1.8.0_151

JAVA_HOME is also set in CDSW to the same JDK. Any help troubleshooting why we are hitting these issues would be hugely appreciated.
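
Note that "echo $JAVA_HOME" in an interactive root shell only shows that shell's environment, which is usually populated from profile files; a script started another way may not inherit it. The Python sketch below compares the current process environment with a shell-style KEY="value" config file. The helper names are hypothetical, and treating /etc/cdsw/config/cdsw.conf as the file that holds JAVA_HOME is an assumption to check against the documentation for your CDSW version and deployment type.

```python
import os
import shlex
from typing import Dict, Optional

def parse_shell_config(text: str) -> Dict[str, str]:
    """Parse simple KEY=value / export KEY="value" lines from a shell-style config."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        # shlex.split removes surrounding quotes; an empty value becomes "".
        parts = shlex.split(value)
        values[key] = parts[0] if parts else ""
    return values

def check_java_home(java_home: Optional[str]) -> str:
    """Report whether java_home is set and contains an executable bin/java."""
    if not java_home:
        return "JAVA_HOME is not set"
    java_bin = os.path.join(java_home, "bin", "java")
    if not os.access(java_bin, os.X_OK):
        return f"{java_bin} is missing or not executable"
    return f"OK: {java_bin}"

def diagnose(config_path: str) -> Dict[str, str]:
    """Compare JAVA_HOME in this process's environment with the one in config_path."""
    result = {"environment": check_java_home(os.environ.get("JAVA_HOME"))}
    if os.path.exists(config_path):
        with open(config_path) as fh:
            config = parse_shell_config(fh.read())
        result["config"] = check_java_home(config.get("JAVA_HOME"))
    else:
        result["config"] = f"{config_path} not found"
    return result
```

Running diagnose("/etc/cdsw/config/cdsw.conf") both from the interactive shell and from a clean environment (for example via "env -i") would show whether the validation failure comes from the environment or from the config file.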

Expert Contributor

Hello @mtrepanier

Thanks for posting your query.

Do you see any suspicious messages in "/var/log/cdsw/cdsw_health.log"? If so, can you copy them here?
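
When that log is long, a quick keyword filter can narrow it down before posting. The Python sketch below is hypothetical: the keyword list is a guess at what usually signals trouble, not something derived from the cdsw_health.log format, and only the log path comes from this thread.

```python
from typing import Iterable, List, Tuple

# Assumed keywords; adjust after looking at a few lines of the real log.
KEYWORDS = ("error", "fail", "fatal", "exception", "not set", "crashloopbackoff")

def suspicious_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, text) for lines containing any keyword, case-insensitively."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in KEYWORDS):
            hits.append((lineno, line.rstrip("\n")))
    return hits

def scan_log(path: str = "/var/log/cdsw/cdsw_health.log", limit: int = 50) -> List[Tuple[int, str]]:
    """Return the last `limit` suspicious lines of the log at `path`."""
    with open(path, errors="replace") as fh:
        return suspicious_lines(fh)[-limit:]
```

Keeping line numbers in the output makes it easy to go back to the full log for the surrounding context.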

Thanks,
Satz

Explorer

@satz thanks for the reply (and sorry for the delayed response).

The problem turned out to be with our CDH installation as a whole, likely stemming from the upgrade from 5.12 to 5.14. After several hours working with a rep, tearing down and rebuilding the cluster was the only solution. We provided the logs and other details under that ticket.