Member since: 02-08-2023
Posts: 3
Kudos Received: 0
Solutions: 0
02-23-2023 07:35 PM
I've raised a support case instead and will not be following up on this thread.
02-22-2023 10:17 PM
Just an update: in ECS, I see that the pods are being assigned to a particular node, where they are "stuck" with the message: "failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "index.docker.io/rancher/pause:3.6": failed to pull image "index.docker.io/rancher/pause:3.6": failed to pull and unpack image "docker.io/rancher/pause:3.6"..."
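For anyone who wants to check which pods are hitting this error, here is a minimal sketch that filters Kubernetes events for sandbox creation failures. It assumes the events were exported with "kubectl get events -n <namespace> -o json" and that the kubelet reports the failure under the event reason "FailedCreatePodSandBox". The function name find_sandbox_failures is hypothetical and not part of CML or ECS.

```python
import json

# Event reason the kubelet uses when it cannot create a pod sandbox
# (assumption: this is the reason attached to the message quoted above).
SANDBOX_FAILURE_REASON = "FailedCreatePodSandBox"


def find_sandbox_failures(events_json):
    """Return one record per event that reports a pod sandbox failure.

    events_json is the text produced by `kubectl get events -o json`.
    """
    payload = json.loads(events_json)
    failures = []
    for event in payload.get("items", []):
        if event.get("reason") != SANDBOX_FAILURE_REASON:
            continue
        involved = event.get("involvedObject", {})
        failures.append(
            {
                "namespace": involved.get("namespace", ""),
                "pod": involved.get("name", ""),
                "message": event.get("message", ""),
            }
        )
    return failures
```

To use it, save the kubectl output to a file, read the file's contents, and pass the text to find_sandbox_failures. Each returned record names the affected pod and carries the full error message, which should show the image the node failed to pull.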
02-20-2023 06:33 AM
Hi, I am unable to create a support case for this, as my support license does not appear to cover ML Experiments. I am evaluating Cloudera CML on my company's CDP instance.
I created several Experiments in a project to test the feature (a simple script generating some random numbers as metrics), and the experiments were created and ran fine. However, when I tried to create the same experiment later, the UI just timed out. Using the browser's developer tools, I saw that the "runs" request failed with a 504 timeout error. I am unable to troubleshoot the issue further in the UI, and I am not sure whether ECS would have any useful information. The experiment ran successfully when I tried again a few hours later. To my knowledge, nobody touched the CDP cluster in the meantime.
While I could not create experiments, I looked at my Kubernetes namespace <CML workspace name - userXX> in ECS and did not see a pod starting. No events were being created either.
We would like to understand the product better before productionizing it. Could anyone advise where I might look to debug the issue? Thank you. We are using CML 1.4 on CDP, on an intranet network.
Labels: Cloudera Machine Learning (CML)