Support Questions

Find answers, ask questions, and share your expertise

cdsw containers crashing

avatar
New Contributor

I have installed cdsw trail version before purchasing the product but stuck with errors.

500 gig block device  and using default /var/lib/ for application. 

CDSW disks provisioned via ESX/VMware. 

CDSW starts fine in CM but failed to launch containers. Here is the status. 

 

Failed events are below for web-5dddb59597-pclf2  (redacted host name for security reasons).

Not sure why tcp-ingress-controller-7d65df6d8f-qqxzn  is starting , i am not using TLS for CDSW

Any help is highly appreciated. 

 

NAME READY STATUS RESTARTS AGE
cdsw-compute-pod-evaluator-566f7664df-rqm2j 1/1 Running 0 9m38s
cron-6dd99c7b78-kh4rw 1/1 Running 0 9m40s
db-86bbb69b54-nxz4z 1/1 Running 0 9m40s
db-migrate-46715e4-nf5dh 0/1 Completed 0 9m40s
ds-cdh-client-578fc69b9-hcqv7 1/1 Running 0 9m38s
ds-operator-66ff9c4b78-r6bk7 1/1 Running 0 9m38s
ds-reconciler-64bccdd574-vhhpz 0/1 CrashLoopBackOff 4 9m38s
ds-vfs-648f9bddf-7n6hg 1/1 Running 0 9m38s
image-puller-xwbcs 1/1 Running 0 9m39s
ingress-controller-7c8bb7898b-2k4zk 1/1 Running 0 9m40s
livelog-65f9846677-5j8js 1/1 Running 0 9m40s
livelog-publisher-8c89k 1/1 Running 2 9m39s
s2i-builder-b775b49c4-62t56 1/1 Running 0 9m37s
s2i-builder-b775b49c4-vzmwb 1/1 Running 0 9m37s
s2i-builder-b775b49c4-w7hck 1/1 Running 0 9m37s
s2i-client-5658cd7456-8shrb 1/1 Running 0 9m37s
s2i-git-server-6d7d8ccdd8-k4srm 1/1 Running 0 9m40s
s2i-queue-6c699c5f9b-csxns 1/1 Running 0 9m40s
s2i-registry-5db6db6859-rjlkz 1/1 Running 0 9m40s
s2i-registry-auth-76cb9bf597-gmbxp 1/1 Running 0 9m40s
s2i-server-58db56cf6-7t5fb 1/1 Running 0 9m40s
secret-generator-7ffbcd8fcb-wn58c 1/1 Running 0 9m39s
spark-port-forwarder-bfvdm 1/1 Running 0 9m39s
tcp-ingress-controller-7d65df6d8f-qqxzn 0/1 CrashLoopBackOff 5 9m39s
web-5dddb59597-b7sp5 1/1 Running 1 9m39s
web-5dddb59597-m88qn 1/1 Running 1 9m39s
web-5dddb59597-pclf2 0/1 Error 6 9m39s

 

Events:
Type Reason Age From Message
---- ------ ---- ---- -------

Normal Scheduled 12m default-scheduler Successfully assigned default/web-5dddb59597-pclf2 to xyz.copration

Warning FailedMount 11m (x3 over 12m) kubelet, xyz.copration MountVolume.SetUp failed for volume "ds-operator-crt" : secrets "ds-operator-tls" not found

Warning FailedMount 11m (x3 over 12m) kubelet, xyz.copration MountVolume.SetUp failed for volume "tcp-ingress-controller-crt" : secrets "tcp-ingress-controller-tls" not found

Warning FailedMount 11m (x4 over 12m) kubelet, xyz.copration MountVolume.SetUp failed for volume "s2i-server-crt" : secrets "s2i-server-tls" not found

Warning FailedMount 11m (x4 over 12m) kubelet, xyz.copration MountVolume.SetUp failed for volume "s2i-client-crt" : secrets "s2i-client-tls" not found

Warning FailedMount 11m (x4 over 12m) kubelet, xyz.copration MountVolume.SetUp failed for volume "ds-vfs-crt" : secrets "ds-vfs-tls" not found

Warning FailedMount 11m (x3 over 12m) kubelet, xyz.copration MountVolume.SetUp failed for volume "web-tls" : secrets "web-tls" not found

Warning FailedMount 11m (x4 over 12m) kubelet, xyz.copration MountVolume.SetUp failed for volume "ds-cdh-client-crt" : secrets "ds-cdh-client-tls" not found

Normal Pulled 7m19s (x5 over 9m24s) kubelet, xyz.copration Container image "docker-registry.infra.cloudera.com/cdsw/web:46715e4" already present on machine

Warning BackOff 2m23s (x33 over 9m2s) kubelet, xyz.copration Back-off restarting failed container

1 ACCEPTED SOLUTION

avatar
Master Collaborator

@SrJay it looks like you are running CDSW 1.6 on VMWare host which has ipv6 disabled. Can you please confirm by reviewing the dmesg for words cmdline and segfaults?  If you see segmentation faults for node process and if the Cmdline shows ipv6.disabled=1, then you are likely hitting a known issue which is seen with a combination of node.js version 10.x, grpc, and ipv6  The workaround for this is to enable ipv6 on all the hosts running CDSW using the following RedHat article https://access.redhat.com/solutions/8709#rhel7enable 

View solution in original post

1 REPLY 1

avatar
Master Collaborator

@SrJay it looks like you are running CDSW 1.6 on VMWare host which has ipv6 disabled. Can you please confirm by reviewing the dmesg for words cmdline and segfaults?  If you see segmentation faults for node process and if the Cmdline shows ipv6.disabled=1, then you are likely hitting a known issue which is seen with a combination of node.js version 10.x, grpc, and ipv6  The workaround for this is to enable ipv6 on all the hosts running CDSW using the following RedHat article https://access.redhat.com/solutions/8709#rhel7enable