Member since: 10-18-2023
Posts: 26
Kudos Received: 14
Solutions: 2
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 56 | 04-11-2024 09:23 PM |
| 144 | 02-19-2024 04:12 PM |
04-23-2024
08:51 PM
@upadhyayk04 The same issue occurred on all of them (1.5.2, 1.5.3, 1.5.3-h1). I suspect one of two causes. The first is certificate-related: I think it happens when the certificates are not right (for example, the server certificate does not match the CA certificate). In that case the [Update Ingress Controller Certificate] action fails. The second is DNS: it seems to happen when more than one nameserver is listed in /etc/resolv.conf. That case might also be related to the certificate.
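The DNS suspicion can be checked quickly. Below is a minimal sketch, assuming the usual resolv.conf syntax in which each resolver is declared on a line beginning with `nameserver`. The function name `list_nameservers` and the sample addresses are illustrative and do not come from the affected hosts.

```python
from typing import List


def list_nameservers(resolv_conf_text: str) -> List[str]:
    """Return the addresses found on `nameserver` lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Simplification: treat anything after '#' or ';' as a comment.
        stripped = line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Hypothetical sample content; on a node, read /etc/resolv.conf instead.
sample = """\
search hadoop.com
nameserver 192.0.2.10
nameserver 192.0.2.11
"""

servers = list_nameservers(sample)
if len(servers) > 1:
    print(f"{len(servers)} nameservers listed: {servers}")
```

Whether more than one nameserver actually triggers the failure is still only a guess; the sketch just makes the condition easy to spot on each node.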
04-23-2024
05:50 AM
1 Kudo
@upadhyayk04 I found the cause: it was an iSCSI problem. I think it does not happen in normal cases, because no one re-installs after a successful installation. Anyway, to solve the problem I had to delete everything related to iSCSI, including the iSCSI packages, before re-installing. Thank you.
04-22-2024
11:16 PM
1 Kudo
Thank you for your answer. I was asking whether 1.5.3 is supported on RHEL 8.4.
04-22-2024
08:06 PM
Provisioning a CML workspace fails. It succeeded the first time I tried. However, provisioning failed when I re-provisioned after deleting the workspace on my test bed. It also happened at a customer site during a POC. I tried versions 1.5.2, 1.5.3, and 1.5.3-h1 on RHEL 8.4.
Cloudera Manager version: Cloudera Manager 7.11.3 (#50275000 built by jenkins on 20240213-1404 git: 14e82e253ab970bfd576e4f80d297769a527df18)
Data Services versions tried: 1.5.2-b886 / 1.5.3-b279 / 1.5.3-h1-b2
Version in the screenshot: 1.5.2-b886
I have to update Data Services at the customer site. Please give me any advice.
Labels:
- Cloudera Machine Learning (CML)
04-22-2024
07:48 PM
@upadhyayk04 I uninstalled and redeployed ECS because of the POC due date. I tried the installation many times, and while doing so I noticed that the problem happened when the 'Longhorn' directory was on the root volume. It did not happen after I selected another partition. (I do not actually know the real cause.) Thank you for your help. BTW, there is no instance directory on node 2.
[root@cdppvc2:/var/log]#find . | grep instance
./pods/longhorn-system_instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3_afe1e067-12fe-4241-8183-2d019131630a
./pods/longhorn-system_instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3_afe1e067-12fe-4241-8183-2d019131630a/engine-manager
./pods/longhorn-system_instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3_afe1e067-12fe-4241-8183-2d019131630a/engine-manager/0.log
./pods/longhorn-system_instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3_9f587b54-ae07-4d1f-bdd1-6abc326c0146
./pods/longhorn-system_instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3_9f587b54-ae07-4d1f-bdd1-6abc326c0146/replica-manager
./pods/longhorn-system_instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3_9f587b54-ae07-4d1f-bdd1-6abc326c0146/replica-manager/0.log
./containers/instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3_longhorn-system_replica-manager-5f3407d236e8ac55a16ddbd819df4f32b2465cd14a627370cd3343efb868fe8b.log
./containers/instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3_longhorn-system_engine-manager-4245b135f65651890f7a26edef834fef65b1c8d2f108f1d0bfe9c3b109a85b06.log
04-13-2024
05:22 PM
1 Kudo
@upadhyayk04
logs -f -n longhorn-system longhorn-csi-plugin-cxglq
Defaulted container "node-driver-registrar" out of: node-driver-registrar, longhorn-liveness-probe, longhorn-csi-plugin
I0413 09:50:20.091344 290593 main.go:166] Version: v2.5.0
I0413 09:50:20.091369 290593 main.go:167] Running node-driver-registrar in mode=registration
I0413 09:50:20.092527 290593 main.go:191] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0413 09:50:21.093286 290593 main.go:198] Calling CSI driver to discover driver name
I0413 09:50:21.094471 290593 main.go:208] CSI driver name: "driver.longhorn.io"
I0413 09:50:21.094497 290593 node_register.go:53] Starting Registration Server at: /registration/driver.longhorn.io-reg.sock
I0413 09:50:21.094656 290593 node_register.go:62] Registration Server started at: /registration/driver.longhorn.io-reg.sock
I0413 09:50:21.094779 290593 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I0413 09:50:21.466617 290593 main.go:102] Received GetInfo call: &InfoRequest{}
I0413 09:50:21.466820 290593 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/driver.longhorn.io/registration"
I0413 09:50:23.205994 290593 main.go:120] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
04-12-2024
06:30 PM
@upadhyayk04 The vault-0 pod keeps cycling between Terminating and ContainerCreating status because the volume attach fails:
Warning FailedAttachVolume 108s attachdetach-controller AttachVolume.Attach failed for volume "pvc-33f9624d-4d90-48fa-8469-02a104df1d10" : rpc error: code = DeadlineExceeded desc = volume pvc-33f9624d-4d90-48fa-8469-02a104df1d10 failed to attach to node cdppvc2.hadoop.com with
04-12-2024
12:30 AM
1 Kudo
@upadhyayk04 Look, all pods are fine.
[root@cdppvc1 ~]# k get ns
NAME STATUS AGE
default Active 4h56m
ecs-webhooks Active 4h55m
kube-node-lease Active 4h56m
kube-public Active 4h56m
kube-system Active 4h56m
local-path-storage Active 4h55m
longhorn-system Active 4h55m
vault-system Active 116s
[root@cdppvc1 ~]# k get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
ecs-webhooks ecs-tolerations-webhook-77d857599d-b8hsh 1/1 Running 0 39m
ecs-webhooks ecs-tolerations-webhook-77d857599d-h6qxk 1/1 Running 0 39m
kube-system etcd-cdppvc1.hadoop.com 1/1 Running 1 4h54m
kube-system helm-install-rke2-ingress-nginx-mk845 0/1 Completed 0 10m
kube-system kube-apiserver-cdppvc1.hadoop.com 1/1 Running 1 4h54m
kube-system kube-controller-manager-cdppvc1.hadoop.com 1/1 Running 3 (89m ago) 4h54m
kube-system kube-proxy-cdppvc1.hadoop.com 1/1 Running 0 86m
kube-system kube-proxy-cdppvc2.hadoop.com 1/1 Running 0 4h53m
kube-system kube-scheduler-cdppvc1.hadoop.com 1/1 Running 1 (90m ago) 4h54m
kube-system rke2-canal-9h5hh 2/2 Running 0 4h53m
kube-system rke2-canal-qk2wg 2/2 Running 2 (90m ago) 4h53m
kube-system rke2-coredns-rke2-coredns-565dfc7d75-djp4t 1/1 Running 0 38m
kube-system rke2-coredns-rke2-coredns-565dfc7d75-gvxcj 1/1 Running 0 153m
kube-system rke2-coredns-rke2-coredns-autoscaler-6c48c95bf9-7ln92 1/1 Running 0 39m
kube-system rke2-ingress-nginx-controller-869fc5f494-xcz6x 1/1 Running 0 39m
kube-system rke2-metrics-server-c9c78bd66-blrwg 1/1 Running 0 156m
kube-system rke2-snapshot-controller-6f7bbb497d-wk5mg 1/1 Running 0 39m
kube-system rke2-snapshot-validation-webhook-65b5675d5c-7fst2 1/1 Running 0 39m
local-path-storage local-path-provisioner-6b8fcdf4f9-fqqnw 1/1 Running 0 155m
longhorn-system csi-attacher-5f79c59664-gsfc4 1/1 Running 0 156m
longhorn-system csi-attacher-5f79c59664-rppmd 1/1 Running 0 156m
longhorn-system csi-attacher-5f79c59664-spmmt 1/1 Running 1 (93m ago) 156m
longhorn-system csi-provisioner-7f9fff657d-mvmb6 1/1 Running 0 156m
longhorn-system csi-provisioner-7f9fff657d-r76kv 1/1 Running 1 (93m ago) 156m
longhorn-system csi-provisioner-7f9fff657d-wm77w 1/1 Running 0 156m
longhorn-system csi-resizer-7667995d7-fgkbd 1/1 Running 0 156m
longhorn-system csi-resizer-7667995d7-rn5ts 1/1 Running 1 (93m ago) 156m
longhorn-system csi-resizer-7667995d7-zx94l 1/1 Running 0 156m
longhorn-system csi-snapshotter-56954ddc99-b44ds 1/1 Running 0 156m
longhorn-system csi-snapshotter-56954ddc99-fmw8x 1/1 Running 1 (93m ago) 156m
longhorn-system csi-snapshotter-56954ddc99-jkwhv 1/1 Running 0 156m
longhorn-system engine-image-ei-6b4330bf-nnwmm 1/1 Running 0 4h52m
longhorn-system engine-image-ei-6b4330bf-npf9k 1/1 Running 1 (90m ago) 4h52m
longhorn-system instance-manager-12ec73857d1e3aea875a32230969da75 1/1 Running 0 38m
longhorn-system instance-manager-ad30a9ee514d3e836de7c5077cfe5ca6 1/1 Running 0 153m
longhorn-system longhorn-csi-plugin-j5xw4 3/3 Running 0 4h51m
longhorn-system longhorn-csi-plugin-v7bdh 3/3 Running 6 (86m ago) 4h51m
longhorn-system longhorn-driver-deployer-75c7cb9999-v8xgb 1/1 Running 0 156m
longhorn-system longhorn-manager-d495r 1/1 Running 1 (90m ago) 4h52m
longhorn-system longhorn-manager-nvgk7 1/1 Running 0 4h52m
longhorn-system longhorn-ui-64c4bfff54-d6c7n 1/1 Running 0 156m
longhorn-system longhorn-ui-64c4bfff54-vrx4q 1/1 Running 0 156m
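Every pod in the listing above reports Running or Completed, but several have restarted, which is easy to miss by eye. Here is a minimal sketch for flagging such rows, assuming the whitespace-separated column layout shown in the `k get pod -A` output. The names `PodRow`, `parse_pod_rows`, and `needs_attention` are illustrative helpers and not part of any Kubernetes tooling.

```python
from typing import List, NamedTuple


class PodRow(NamedTuple):
    namespace: str
    name: str
    ready: str
    status: str
    restarts: int


def parse_pod_rows(output: str) -> List[PodRow]:
    """Parse the text printed by `kubectl get pod -A` into rows."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 6 or fields[0] == "NAMESPACE":
            continue
        # RESTARTS may be followed by "(93m ago)"; only the leading count is used.
        rows.append(PodRow(fields[0], fields[1], fields[2], fields[3], int(fields[4])))
    return rows


def needs_attention(row: PodRow) -> bool:
    """True for pods that are not fully ready or that have restarted."""
    ready, total = row.ready.split("/")
    healthy = row.status == "Completed" or (row.status == "Running" and ready == total)
    return not healthy or row.restarts > 0


# Three rows copied from the listing above.
sample = """\
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system helm-install-rke2-ingress-nginx-mk845 0/1 Completed 0 10m
longhorn-system longhorn-csi-plugin-j5xw4 3/3 Running 0 4h51m
longhorn-system longhorn-csi-plugin-v7bdh 3/3 Running 6 (86m ago) 4h51m
"""

for row in parse_pod_rows(sample):
    if needs_attention(row):
        print(row.namespace, row.name, row.status, row.restarts)
```

On these rows only longhorn-csi-plugin-v7bdh is flagged, because of its six restarts.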
04-11-2024
11:29 PM
1 Kudo
@upadhyayk04 Pod list
[root@cdppvc1 ~]# k get pod -n longhorn-system
NAME READY STATUS RESTARTS AGE
csi-attacher-5f79c59664-gsfc4 1/1 Running 0 96m
csi-attacher-5f79c59664-rppmd 1/1 Running 0 96m
csi-attacher-5f79c59664-spmmt 1/1 Running 1 (34m ago) 96m
csi-provisioner-7f9fff657d-mvmb6 1/1 Running 0 96m
csi-provisioner-7f9fff657d-r76kv 1/1 Running 1 (34m ago) 96m
csi-provisioner-7f9fff657d-wm77w 1/1 Running 0 96m
csi-resizer-7667995d7-fgkbd 1/1 Running 0 97m
csi-resizer-7667995d7-rn5ts 1/1 Running 1 (34m ago) 97m
csi-resizer-7667995d7-zx94l 1/1 Running 0 97m
csi-snapshotter-56954ddc99-b44ds 1/1 Running 0 97m
csi-snapshotter-56954ddc99-fmw8x 1/1 Running 1 (34m ago) 97m
csi-snapshotter-56954ddc99-jkwhv 1/1 Running 0 97m
engine-image-ei-6b4330bf-nnwmm 1/1 Running 0 3h52m
engine-image-ei-6b4330bf-npf9k 1/1 Running 1 (30m ago) 3h52m
instance-manager-12ec73857d1e3aea875a32230969da75 1/1 Running 0 34m
instance-manager-ad30a9ee514d3e836de7c5077cfe5ca6 1/1 Running 0 94m
longhorn-csi-plugin-j5xw4 3/3 Running 0 3h52m
longhorn-csi-plugin-v7bdh 3/3 Running 6 (26m ago) 3h52m
longhorn-driver-deployer-75c7cb9999-v8xgb 1/1 Running 0 96m
longhorn-manager-d495r 1/1 Running 1 (30m ago) 3h52m
longhorn-manager-nvgk7 1/1 Running 0 3h52m
longhorn-ui-64c4bfff54-d6c7n 1/1 Running 0 97m
longhorn-ui-64c4bfff54-vrx4q 1/1 Running 0 97m

describe pod csi-plugin
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 28m (x5 over 30m) kubelet Pod sandbox changed, it will be killed and re-created.
Normal Started 28m kubelet Started container longhorn-liveness-probe
Normal Created 28m kubelet Created container node-driver-registrar
Normal Started 28m kubelet Started container node-driver-registrar
Normal Pulled 28m kubelet Container image "registry.ecs.internal/cloudera_thirdparty/longhornio/livenessprobe:v2.12.0" already present on machine
Normal Created 28m kubelet Created container longhorn-liveness-probe
Normal Pulled 28m kubelet Container image "registry.ecs.internal/cloudera_thirdparty/longhornio/csi-node-driver-registrar:v2.9.2" already present on machine
Warning BackOff 28m (x2 over 28m) kubelet Back-off restarting failed container longhorn-csi-plugin in pod longhorn-csi-plugin-v7bdh_longhorn-system(4fe460af-df96-4006-a631-dcc21bd46a07)
Normal Pulled 28m (x2 over 28m) kubelet Container image "registry.ecs.internal/cloudera_thirdparty/longhornio/longhorn-manager:v1.5.4" already present on machine
Normal Created 28m (x2 over 28m) kubelet Created container longhorn-csi-plugin
Normal Started 28m (x2 over 28m) kubelet Started container longhorn-csi-plugin
Warning Unhealthy 27m (x3 over 28m) kubelet Liveness probe failed: Get "http://10.42.0.6:9808/healthz": dial tcp 10.42.0.6:9808: connect: connection refused
Normal Killing 27m kubelet Container longhorn-csi-plugin failed liveness probe, will be restarted
Warning BackOff 27m (x2 over 28m) kubelet Back-off restarting failed container node-driver-registrar in pod longhorn-csi-plugin-v7bdh_longhorn-system(4fe460af-df96-4006-a631-dcc21bd46a07)

Log of the csi-plugin pod
[root@cdppvc1 ~]# k logs -f longhorn-csi-plugin-v7bdh -n longhorn-system
Defaulted container "node-driver-registrar" out of: node-driver-registrar, longhorn-liveness-probe, longhorn-csi-plugin
I0412 06:02:45.498503 12176 main.go:135] Version: v2.9.2
I0412 06:02:45.498547 12176 main.go:136] Running node-driver-registrar in mode=
I0412 06:02:45.498553 12176 main.go:157] Attempting to open a gRPC connection with: "/csi/csi.sock"
W0412 06:02:55.498699 12176 connection.go:232] Still connecting to unix:///csi/csi.sock
I0412 06:03:00.414873 12176 main.go:164] Calling CSI driver to discover driver name
I0412 06:03:00.417352 12176 main.go:173] CSI driver name: "driver.longhorn.io"
I0412 06:03:00.417373 12176 node_register.go:55] Starting Registration Server at: /registration/driver.longhorn.io-reg.sock
I0412 06:03:00.417530 12176 node_register.go:64] Registration Server started at: /registration/driver.longhorn.io-reg.sock
I0412 06:03:00.417667 12176 node_register.go:88] Skipping HTTP server because endpoint is set to: ""
I0412 06:03:01.396603 12176 main.go:90] Received GetInfo call: &InfoRequest{}
I0412 06:03:01.402598 12176 main.go:101] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
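The registrar log above uses the klog line header: a severity letter (I, W, E, or F), the date as mmdd, a timestamp, a thread id, and `file:line]` before the message. Below is a minimal sketch for pulling out the non-informational lines, assuming that header layout. The names `parse_klog_line` and `non_info_lines` are illustrative helpers, not an existing API.

```python
import re
from typing import Dict, List, Optional

# Header layout assumed: Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg
KLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread_id>\d+) (?P<source>[^\s\]]+:\d+)\] (?P<message>.*)$"
)


def parse_klog_line(line: str) -> Optional[Dict[str, str]]:
    """Split one klog line into its header fields, or return None if it does not match."""
    match = KLOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def non_info_lines(log_text: str) -> List[Dict[str, str]]:
    """Return the parsed lines whose severity is warning, error, or fatal."""
    parsed = (parse_klog_line(line) for line in log_text.splitlines())
    return [entry for entry in parsed if entry and entry["severity"] in "WEF"]


# Three lines copied from the log above.
sample = """\
I0412 06:02:45.498553 12176 main.go:157] Attempting to open a gRPC connection with: "/csi/csi.sock"
W0412 06:02:55.498699 12176 connection.go:232] Still connecting to unix:///csi/csi.sock
I0412 06:03:00.417352 12176 main.go:173] CSI driver name: "driver.longhorn.io"
"""

for entry in non_info_lines(sample):
    print(entry["time"], entry["source"], entry["message"])
```

In this excerpt the only warning is the ten-second wait on /csi/csi.sock, which lines up with the liveness-probe failures in the events.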