Created on 04-11-2024 09:19 PM - edited 04-11-2024 10:07 PM
I faced a failed "Initialize embedded Vault" step while installing Data Services.
Once it failed, it failed every time, even when I restarted the installation from the beginning.
I tried the install on the system below:
- Red Hat Enterprise Linux release 8.4 (Ootpa)
- Cloudera Manager 7.11.3 (#50275000 built by jenkins on 20240213-1404 git: 14e82e253ab970bfd576e4f80d297769a527df18)
- ECS 1.5.2-b886-ecs-1.5.2-b886.p0.46792599 and 1.5.3-b297-ecs-1.5.3-b297.p0.50802651 (I tried both)
stdout
Fri Apr 12 11:36:52 KST 2024
Running on: cdppvc1.hostname.com (192.168.10.10)
JAVA_HOME=/usr/lib/jvm/java-openjdk
using /usr/lib/jvm/java-openjdk as JAVA_HOME
namespace/vault-system created
helmchart.helm.cattle.io/vault created
certificatesigningrequest.certificates.k8s.io/vault-csr created
certificatesigningrequest.certificates.k8s.io/vault-csr approved
secret/vault-server-tls created
secret/ingress-cert created
helmchart.helm.cattle.io/vault unchanged
Wait 30 seconds for startup
...
Timed out waiting for vault to come up
stderr
++ kubectl exec vault-0 -n vault-system -- vault operator init -tls-skip-verify -key-shares=1 -key-threshold=1 -format=json
error: unable to upgrade connection: container not found ("vault")
++ '[' 600 -gt 600 ']'
++ echo ...
++ sleep 10
++ time_elapsed=610
++ kubectl exec vault-0 -n vault-system -- vault operator init -tls-skip-verify -key-shares=1 -key-threshold=1 -format=json
error: unable to upgrade connection: container not found ("vault")
++ '[' 610 -gt 600 ']'
++ echo 'Timed out waiting for vault to come up'
++ exit 1
describe pod
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 108s default-scheduler Successfully assigned vault-system/vault-0 to cdppvc2.hostname.com
Warning FailedAttachVolume 108s attachdetach-controller AttachVolume.Attach failed for volume "pvc-33f9624d-4d90-48fa-8469-02a104df1d10" : rpc error: code = DeadlineExceeded desc = volume pvc-33f9624d-4d90-48fa-8469-02a104df1d10 failed to attach to node cdppvc2.hadoop.com with attachmentID csi-b57965889e8c6c2de7ffd7d045d52175b3415fa69c5e09d1cadc9c7ac1e5a467
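For anyone hitting the same event, generic checks of the stuck attachment look like this (a hedged sketch; the volume ID is copied from the event above and the Longhorn namespace is assumed to be longhorn-system):

# Inspect the CSI attachment and the Longhorn components for this volume
kubectl get volumeattachment | grep pvc-33f9624d-4d90-48fa-8469-02a104df1d10
kubectl -n longhorn-system get pods -o wide
kubectl -n longhorn-system get volumes.longhorn.io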
Created 04-22-2024 10:01 AM
Hello @Hae
Apologies for the delay as I was unavailable for some time.
Let's check the volume logs on the cdppvc2 node at the location below:
# /var/log/instances/pvc-33f9624d-4d90-48fa-8469-02a104df1d10.log
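For example, a quick scan of that log for attach errors (a generic grep, assuming the file exists on that node):

grep -iE 'error|fail|timeout' /var/log/instances/pvc-33f9624d-4d90-48fa-8469-02a104df1d10.log | tail -n 50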
Created on 04-22-2024 07:48 PM - edited 04-22-2024 07:50 PM
I have uninstalled and redeployed ECS because of the POC due date.
I tried the install many times, and the failure happened whenever the 'Longhorn' directory was on the root volume.
However, it does not happen after selecting another partition.
(Actually, I do not know the real reason.)
Thank you for your help.
By the way, there is no instances directory on node 2:
[root@cdppvc2:/var/log]#find . | grep instance
./pods/longhorn-system_instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3_afe1e067-12fe-4241-8183-2d019131630a
./pods/longhorn-system_instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3_afe1e067-12fe-4241-8183-2d019131630a/engine-manager
./pods/longhorn-system_instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3_afe1e067-12fe-4241-8183-2d019131630a/engine-manager/0.log
./pods/longhorn-system_instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3_9f587b54-ae07-4d1f-bdd1-6abc326c0146
./pods/longhorn-system_instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3_9f587b54-ae07-4d1f-bdd1-6abc326c0146/replica-manager
./pods/longhorn-system_instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3_9f587b54-ae07-4d1f-bdd1-6abc326c0146/replica-manager/0.log
./containers/instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3_longhorn-system_replica-manager-5f3407d236e8ac55a16ddbd819df4f32b2465cd14a627370cd3343efb868fe8b.log
./containers/instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3_longhorn-system_engine-manager-4245b135f65651890f7a26edef834fef65b1c8d2f108f1d0bfe9c3b109a85b06.log
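Those are the Longhorn instance-manager pod logs; the same content can be read via kubectl instead of the files on disk (a hedged example; the pod and container names are taken from the paths above):

kubectl -n longhorn-system logs instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3 -c engine-manager --tail=100
kubectl -n longhorn-system logs instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3 -c replica-manager --tail=100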
Created 04-22-2024 09:33 PM
Hello @Hae
Glad to know that the issue is fixed.
For me, the log file is present at the location below:
# pwd
/var/lib/kubelet/pods/c2fa4324-b324-40c5-97a6-e55bd7fa1a65/volumes/kubernetes.io~csi/pvc-697a0f20-499f-4896-b6e9-a5e87435db9b/mount
[CDP-DS Tue Apr 23 04:32:08 UTC root@pvc-ds-readiness05.novalocal [/var/lib/kubelet/pods/c2fa4324-b324-40c5-97a6-e55bd7fa1a65/volumes/kubernetes.io~csi/pvc-697a0f20-499f-4896-b6e9-a5e87435db9b/mount]
# ls -lrth
total 99M
drwxrws---. 2 root 28536 16K Mar 12 09:08 lost+found
-rw-rw-r--. 1 28536 28536 0 Mar 12 09:08 LOCK
-rw-rw-r--. 1 28536 28536 37 Mar 12 09:08 IDENTITY
-rw-rw-r--. 1 28536 28536 11M Apr 3 10:38 LOG.old.1712143774120001
-rw-rw-r--. 1 28536 28536 482K Apr 4 07:59 LOG.old.1712218718445033
-rw-rw-r--. 1 28536 28536 5.9M Apr 15 06:18 LOG.old.1713163409204237
-rw-rw-r--. 1 28536 28536 40K Apr 15 07:43 LOG.old.1713167051095602
-rw-rw-r--. 1 28536 28536 4.7K Apr 15 07:44 OPTIONS-000017
-rw-rw-r--. 1 28536 28536 2.4M Apr 15 07:44 000018.sst
-rw-rw-r--. 1 28536 28536 559K Apr 16 05:44 LOG.old.1713246769612940
-rw-rw-r--. 1 28536 28536 4.8K Apr 16 05:52 000020.sst
-rw-rw-r--. 1 28536 28536 185 Apr 16 05:52 MANIFEST-000021
-rw-rw-r--. 1 28536 28536 16 Apr 16 05:52 CURRENT
-rw-rw-r--. 1 28536 28536 4.7K Apr 16 05:52 OPTIONS-000024
-rw-rw-r--. 1 28536 28536 2.0K Apr 16 07:20 000022.log
-rw-rw-r--. 1 28536 28536 4.1M Apr 23 04:22 LOG
Created 04-23-2024 05:50 AM
I found out the reason: it was an iSCSI problem.
I think it will not happen in normal cases, because nobody re-installs once the first installation has succeeded.
Anyway, to solve this problem I had to delete all related iSCSI state and the iSCSI packages before re-installing, roughly as sketched below.
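A hedged sketch of that cleanup on RHEL 8 (not my exact history; active sessions and node records differ per cluster, so review each step before running):

iscsiadm -m session                  # list active sessions first
iscsiadm -m node --logoutall=all     # log out of every recorded target
iscsiadm -m node -o delete           # delete all recorded node entries
systemctl stop iscsid iscsid.socket
yum remove -y iscsi-initiator-utils  # remove the iSCSI packages before re-install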
Thank you.