Member since: 10-18-2023
Posts: 30
Kudos Received: 19
Solutions: 2
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 788 | 04-11-2024 09:23 PM |
 | 408 | 02-19-2024 04:12 PM |
07-14-2024
07:13 PM
1 Kudo
Which configuration should I change to bind HDFS DataNode data replication to a specific NIC? I want to separate service traffic from replication traffic.
Labels: HDFS
05-13-2024
06:34 PM
1 Kudo
I found the cause: a certificate problem in the runtime manager pod.

fetch image metadata failed data = {"err":"parseImageSourceFailed : error pinging docker registry harbor.hadoop.com: Get \"https://harbor.hadoop.com/v2/\": x509: certificate signed by unknown authority","url":"harbor.hadoop.com/cloudera/cdsw/ml-runtime-jupyterlab-python3.11-standard:2024.02.1-b4"}

What I have done so far:
- The CA is already registered on the system.
- I created the secret "regcred".
- I put the file "k8s-secret-regcred.yaml" in /etc/cdsw/patches.
- I entered the CA in the Root CA Configuration under [Site-administration] - [Security].

How can I solve this issue?
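One way to narrow down an "x509: certificate signed by unknown authority" error is to repeat the registry ping from the error message with an explicit CA file. The sketch below assumes `curl` is installed; the function names and the CA path are placeholders, not part of CML. It checks trust from the host where it runs, so a pass here does not prove that the runtime manager pod trusts the same CA.

```shell
# Hypothetical helpers for reproducing the registry ping from the error message.

# Build the ping URL the error refers to: https://<registry-host>/v2/
registry_ping_url() {
  printf 'https://%s/v2/\n' "$1"
}

# Exit 0 if the TLS handshake verifies against the given CA file.
# An HTTP 401 from the registry still counts as success here, because
# only certificate verification is being tested.
check_registry_tls() {
  curl --silent --show-error --output /dev/null \
       --cacert "$2" "$(registry_ping_url "$1")"
}

# Example (the CA path is a placeholder):
# check_registry_tls harbor.hadoop.com /path/to/ca.crt && echo "CA is trusted"
```

If the check passes on the host but the pod still reports the error, the CA is probably missing from whatever trust store the pod uses, which is a separate question from the host configuration.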
05-09-2024
06:51 PM
1 Kudo
I tried the steps below.

1. Installed Harbor (aaa.bbb.ccc).

2. Created a new image using the Dockerfile and commands below.

    FROM aaa.bbb.ccc/cloudera/cdsw/ml-runtime-jupyterlab-python3.10-standard:2023.08.2-b8
    RUN apt-get update && apt-get upgrade -y && apt-get clean && rm -rf /var/lib/apt/lists/
    # Cloudera metadata for CML
    ENV ML_RUNTIME_EDITION="EDITED RUNTIME" \
        ML_RUNTIME_SHORT_VERSION="0.1" \
        ML_RUNTIME_MAINTENANCE_VERSION=1 \
        ML_RUNTIME_DESCRIPTION="EDITED RUNTIME to test"
    ENV ML_RUNTIME_FULL_VERSION="${ML_RUNTIME_SHORT_VERSION}.${ML_RUNTIME_MAINTENANCE_VERSION}"
    LABEL com.cloudera.ml.runtime.edition=$ML_RUNTIME_EDITION \
          com.cloudera.ml.runtime.full-version=$ML_RUNTIME_FULL_VERSION \
          com.cloudera.ml.runtime.short-version=$ML_RUNTIME_SHORT_VERSION \
          com.cloudera.ml.runtime.maintenance-version=$ML_RUNTIME_MAINTENANCE_VERSION \
          com.cloudera.ml.runtime.description=$ML_RUNTIME_DESCRIPTION

    docker build -t aaa.bbb.ccc/cloudera/cdsw/ml-runtime-jupyterlab-python3.10-standard:2023.08.2-b8-20240509 .
    docker push aaa.bbb.ccc/cloudera/cdsw/ml-runtime-jupyterlab-python3.10-standard:2023.08.2-b8-20240509

3. Created the secret.

    kubectl create secret docker-registry regcred --docker-server=aaa.bbb.ccc --docker-username=admin --docker-password=PASSWORD -n default

4. Tried adding the runtime image. In [Runtime Catalog] - [+Add Runtime], I entered "aaa.bbb.ccc/cloudera/cdsw/ml-runtime-jupyterlab-python3.10-standard:2023.08.2-b8-20240509" and clicked [validate]. Then I got the message "Could not fetch the image metadata."

Which step did I miss? Also, how can I push to a private registry without using something like Harbor?
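As a small worked example of the values used in these steps, the sketch below mirrors how the Dockerfile's `ENV` line composes the full version and how the registry host relates to the image reference. The helper names are hypothetical and exist only for illustration; they are not part of CML, Docker, or kubectl.

```shell
# Hypothetical helpers; names are illustrative only.

# Mirror the Dockerfile's ENV interpolation: "<short>.<maintenance>".
runtime_full_version() {
  printf '%s.%s\n' "$1" "$2"
}

# Extract the registry host (everything before the first "/")
# from an image reference.
registry_host() {
  printf '%s\n' "${1%%/*}"
}

runtime_full_version "0.1" "1"   # prints 0.1.1
registry_host "aaa.bbb.ccc/cloudera/cdsw/ml-runtime-jupyterlab-python3.10-standard:2023.08.2-b8-20240509"   # prints aaa.bbb.ccc
```

The second value is the one that has to agree with `--docker-server` in the `kubectl create secret` command, as it does in step 3.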
04-28-2024
04:58 PM
2 Kudos
Yes, I created custom certs based on a custom CA cert. At least on RHEL 8.4, provisioning succeeded when the certs were correct. Thank you. @upadhyayk04
04-23-2024
08:51 PM
@upadhyayk04 The same issue happened on all of them (1.5.2, 1.5.3, 1.5.3-h1). I suspect one of two causes. The first is related to certificates: I think it happens when the certificates are not right (for example, the server cert does not match the CA cert). The [Update Ingress Controller Certificate] action failed in this case. The second is DNS: it looks like it happens when more than one nameserver is listed in /etc/resolv.conf. This case might also be related to certificates.
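Both suspected causes can be checked quickly from a shell. This is a minimal sketch, assuming `openssl` and `grep` are available; the function names and file paths are placeholders, and the post only suspects these causes, so a clean result does not rule out other problems.

```shell
# Hypothetical helpers; names and paths are illustrative only.

# Exit 0 if the server certificate verifies against the CA certificate.
# Usage: cert_chains_to_ca /path/to/ca.crt /path/to/server.crt
cert_chains_to_ca() {
  openssl verify -CAfile "$1" "$2"
}

# Print the number of active "nameserver" lines in a resolv.conf-style file.
# Commented-out lines are not counted.
count_nameservers() {
  file="${1:-/etc/resolv.conf}"
  grep -cE '^[[:space:]]*nameserver[[:space:]]+' "$file" || true
}

# Example:
# count_nameservers /etc/resolv.conf
```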
04-23-2024
05:50 AM
1 Kudo
@upadhyayk04 I found the cause: an iSCSI problem. I think it will not happen in normal cases, because no one tries to re-install after a successful installation. Anyway, to solve this problem, I had to delete everything iSCSI-related, including the iSCSI packages, before re-installing. Thank you.
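Before removing anything, it can help to see what iSCSI state is left on a node. The sketch below only lists; it removes nothing. It assumes an RPM-based host such as RHEL with `rpm` and `iscsiadm` present, and the function name is hypothetical.

```shell
# Hypothetical helper; the name is illustrative only.

# Filter a package list (one name per line on stdin) down to
# iSCSI-related entries. Prints nothing if there are none.
filter_iscsi_packages() {
  grep -i 'iscsi' || true
}

# Examples (read-only):
# rpm -qa | filter_iscsi_packages   # installed iSCSI-related packages
# iscsiadm -m session               # active iSCSI sessions, if any
```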
04-22-2024
11:16 PM
1 Kudo
Thank you for your answer. I asked about support for 1.5.3 on RHEL 8.4.
04-22-2024
08:06 PM
Provisioning failed while provisioning a CML workspace. It succeeded the first time I tried. However, provisioning failed when I re-provisioned after deleting the workspace on my test bed. It also happened at a customer site during a POC. I tried versions 1.5.2, 1.5.3, and 1.5.3-h1 on RHEL 8.4.

Version of Cloudera Manager: Cloudera Manager 7.11.3 (#50275000 built by jenkins on 20240213-1404 git: 14e82e253ab970bfd576e4f80d297769a527df18)
Data Services versions tried: 1.5.2-b886 / 1.5.3-b279 / 1.5.3-h1-b2
Version in the screenshot: 1.5.2-b886

I have to update Data Services at the customer site. Please give me any advice.
Labels: Cloudera Machine Learning (CML)
04-22-2024
07:48 PM
@upadhyayk04 I uninstalled and redeployed ECS because of the POC due date. I tried the installation many times, and I noticed that the problem happened when the 'Longhorn' directory was on the root volume. However, it did not happen after I selected another partition. (Actually, I do not know the real cause.) Thank you for your help. BTW, there is no instance directory on node 2.

    [root@cdppvc2:/var/log]#find . | grep instance
    ./pods/longhorn-system_instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3_afe1e067-12fe-4241-8183-2d019131630a
    ./pods/longhorn-system_instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3_afe1e067-12fe-4241-8183-2d019131630a/engine-manager
    ./pods/longhorn-system_instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3_afe1e067-12fe-4241-8183-2d019131630a/engine-manager/0.log
    ./pods/longhorn-system_instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3_9f587b54-ae07-4d1f-bdd1-6abc326c0146
    ./pods/longhorn-system_instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3_9f587b54-ae07-4d1f-bdd1-6abc326c0146/replica-manager
    ./pods/longhorn-system_instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3_9f587b54-ae07-4d1f-bdd1-6abc326c0146/replica-manager/0.log
    ./containers/instance-manager-r-c4c5839e9e06ae5acde59690c843b7b3_longhorn-system_replica-manager-5f3407d236e8ac55a16ddbd819df4f32b2465cd14a627370cd3343efb868fe8b.log
    ./containers/instance-manager-e-c4c5839e9e06ae5acde59690c843b7b3_longhorn-system_engine-manager-4245b135f65651890f7a26edef834fef65b1c8d2f108f1d0bfe9c3b109a85b06.log
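To confirm whether a given directory sits on the root volume before installing, a check along these lines may help. It is a sketch, assuming POSIX `df` and `awk` and a mount point without spaces; the function names are hypothetical, and the directory in the example is a placeholder for whatever Longhorn path was chosen during the ECS install.

```shell
# Hypothetical helpers; names are illustrative only.

# Print the mount point of the filesystem that holds the given path.
mount_point_of() {
  df -P "$1" | awk 'NR==2 {print $6}'
}

# Exit 0 if the path lives on the root filesystem.
is_on_root_volume() {
  [ "$(mount_point_of "$1")" = "/" ]
}

# Example (the directory is a placeholder):
# is_on_root_volume /path/to/longhorn && echo "Longhorn directory is on the root volume"
```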