Created on 04-11-2024 09:19 PM - edited 04-11-2024 10:07 PM
I faced a failed "Initialize embedded Vault" step while installing Data Services.
Once it failed, it failed every time, even when I restarted the installation from the beginning.
I tried the install on the system below:
- Red Hat Enterprise Linux release 8.4 (Ootpa)
- Cloudera Manager 7.11.3 (#50275000 built by jenkins on 20240213-1404 git: 14e82e253ab970bfd576e4f80d297769a527df18)
- ECS 1.5.2-b886-ecs-1.5.2-b886.p0.46792599 and 1.5.3-b297-ecs-1.5.3-b297.p0.50802651 (I tried both)
stdout
Fri Apr 12 11:36:52 KST 2024
Running on: cdppvc1.hostname.com (192.168.10.10)
JAVA_HOME=/usr/lib/jvm/java-openjdk
using /usr/lib/jvm/java-openjdk as JAVA_HOME
namespace/vault-system created
helmchart.helm.cattle.io/vault created
certificatesigningrequest.certificates.k8s.io/vault-csr created
certificatesigningrequest.certificates.k8s.io/vault-csr approved
secret/vault-server-tls created
secret/ingress-cert created
helmchart.helm.cattle.io/vault unchanged
Wait 30 seconds for startup
...
Timed out waiting for vault to come up
stderr
++ kubectl exec vault-0 -n vault-system -- vault operator init -tls-skip-verify -key-shares=1 -key-threshold=1 -format=json
error: unable to upgrade connection: container not found ("vault")
++ '[' 600 -gt 600 ']'
++ echo ...
++ sleep 10
++ time_elapsed=610
++ kubectl exec vault-0 -n vault-system -- vault operator init -tls-skip-verify -key-shares=1 -key-threshold=1 -format=json
error: unable to upgrade connection: container not found ("vault")
++ '[' 610 -gt 600 ']'
++ echo 'Timed out waiting for vault to come up'
++ exit 1
describe pod
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 108s default-scheduler Successfully assigned vault-system/vault-0 to cdppvc2.hostname.com
Warning FailedAttachVolume 108s attachdetach-controller AttachVolume.Attach failed for volume "pvc-33f9624d-4d90-48fa-8469-02a104df1d10" : rpc error: code = DeadlineExceeded desc = volume pvc-33f9624d-4d90-48fa-8469-02a104df1d10 failed to attach to node cdppvc2.hadoop.com with attachmentID csi-b57965889e8c6c2de7ffd7d045d52175b3415fa69c5e09d1cadc9c7ac1e5a467
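For anyone hitting the same event, generic checks of the stuck attachment look like this (a hedged sketch; the volume ID is copied from the event above and the Longhorn namespace is assumed to be longhorn-system):

# Inspect the CSI attachment and the Longhorn components for this volume
kubectl get volumeattachment | grep pvc-33f9624d-4d90-48fa-8469-02a104df1d10
kubectl -n longhorn-system get pods -o wide
kubectl -n longhorn-system get volumes.longhorn.io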
Created 04-22-2024 10:01 AM
Hello @Hae
Apologies for the delay as I was unavailable for some time.
Let's check the volume logs on the cdppvc2 node at the location below:
# /var/log/instances/pvc-33f9624d-4d90-48fa-8469-02a104df1d10.log
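For example, a quick scan of that log for attach errors (a generic grep, assuming the file exists on that node):

grep -iE 'error|fail|timeout' /var/log/instances/pvc-33f9624d-4d90-48fa-8469-02a104df1d10.log | tail -n 50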
Created on 04-22-2024 07:48 PM - edited 04-22-2024 07:50 PM
I have uninstalled and redeployed ECS because of the POC due date.
I tried the install many times, and the failure happened whenever the 'Longhorn' directory was on the root volume.
However, it does not happen after selecting another partition.
(Actually, I do not know the real reason.)
Thank you for your help.
By the way, there is no instances directory on node 2:
[root@cdppvc2:/var/log]#find . | grep instance
./pods/longhorn-system_instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3_afe1e067-12fe-4241-8183-2d019131630a
./pods/longhorn-system_instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3_afe1e067-12fe-4241-8183-2d019131630a/engine-manager
./pods/longhorn-system_instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3_afe1e067-12fe-4241-8183-2d019131630a/engine-manager/0.log
./pods/longhorn-system_instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3_9f587b54-ae07-4d1f-bdd1-6abc326c0146
./pods/longhorn-system_instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3_9f587b54-ae07-4d1f-bdd1-6abc326c0146/replica-manager
./pods/longhorn-system_instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3_9f587b54-ae07-4d1f-bdd1-6abc326c0146/replica-manager/0.log
./containers/instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3_longhorn-system_replica-manager-5f3407d236e8ac55a16ddbd819df4f32b2465cd14a627370cd3343efb868fe8b.log
./containers/instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3_longhorn-system_engine-manager-4245b135f65651890f7a26edef834fef65b1c8d2f108f1d0bfe9c3b109a85b06.log
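Those are the Longhorn instance-manager pod logs; the same content can be read via kubectl instead of the files on disk (a hedged example; the pod and container names are taken from the paths above):

kubectl -n longhorn-system logs instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3 -c engine-manager --tail=100
kubectl -n longhorn-system logs instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3 -c replica-manager --tail=100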
Created 04-22-2024 09:33 PM
Hello @Hae
Glad to know that the issue is fixed.
For me, the log file is present at the location below:
# pwd
/var/lib/kubelet/pods/c2fa4324-b324-40c5-97a6-e55bd7fa1a65/volumes/kubernetes.io~csi/pvc-697a0f20-499f-4896-b6e9-a5e87435db9b/mount
[CDP-DS Tue Apr 23 04:32:08 UTC root@pvc-ds-readiness05.novalocal [/var/lib/kubelet/pods/c2fa4324-b324-40c5-97a6-e55bd7fa1a65/volumes/kubernetes.io~csi/pvc-697a0f20-499f-4896-b6e9-a5e87435db9b/mount]
# ls -lrth
total 99M
drwxrws---. 2 root 28536 16K Mar 12 09:08 lost+found
-rw-rw-r--. 1 28536 28536 0 Mar 12 09:08 LOCK
-rw-rw-r--. 1 28536 28536 37 Mar 12 09:08 IDENTITY
-rw-rw-r--. 1 28536 28536 11M Apr 3 10:38 LOG.old.1712143774120001
-rw-rw-r--. 1 28536 28536 482K Apr 4 07:59 LOG.old.1712218718445033
-rw-rw-r--. 1 28536 28536 5.9M Apr 15 06:18 LOG.old.1713163409204237
-rw-rw-r--. 1 28536 28536 40K Apr 15 07:43 LOG.old.1713167051095602
-rw-rw-r--. 1 28536 28536 4.7K Apr 15 07:44 OPTIONS-000017
-rw-rw-r--. 1 28536 28536 2.4M Apr 15 07:44 000018.sst
-rw-rw-r--. 1 28536 28536 559K Apr 16 05:44 LOG.old.1713246769612940
-rw-rw-r--. 1 28536 28536 4.8K Apr 16 05:52 000020.sst
-rw-rw-r--. 1 28536 28536 185 Apr 16 05:52 MANIFEST-000021
-rw-rw-r--. 1 28536 28536 16 Apr 16 05:52 CURRENT
-rw-rw-r--. 1 28536 28536 4.7K Apr 16 05:52 OPTIONS-000024
-rw-rw-r--. 1 28536 28536 2.0K Apr 16 07:20 000022.log
-rw-rw-r--. 1 28536 28536 4.1M Apr 23 04:22 LOG
Created 04-23-2024 05:50 AM
I found out the reason: it was an iSCSI problem.
I think it will not happen in normal cases, because nobody re-installs once the first installation has succeeded.
Anyway, to solve this problem I had to delete all related iSCSI state and the iSCSI packages before re-installing, roughly as sketched below.
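A hedged sketch of that cleanup on RHEL 8 (not my exact history; active sessions and node records differ per cluster, so review each step before running):

iscsiadm -m session                  # list active sessions first
iscsiadm -m node --logoutall=all     # log out of every recorded target
iscsiadm -m node -o delete           # delete all recorded node entries
systemctl stop iscsid iscsid.socket
yum remove -y iscsi-initiator-utils  # remove the iSCSI packages before re-install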
Thank you.