
Unable to create CML Experiment - how to debug

Explorer

Hi,

I am unable to create a support case for this, as my support license does not appear to cover ML Experiments.

I am evaluating Cloudera CML on my company's CDP instance. I created several Experiments in a project to test out the feature (a simple script generating some random numbers for metrics), and the experiments were created and ran fine. However, when I tried to create the same experiment later, the UI just timed out. Using the browser's developer tools to inspect the request, I saw the "runs" call fail with a 504 timeout error. I am unable to troubleshoot the issue further in the UI, and am uncertain whether ECS would have any useful information.

I could get it to run successfully a few hours later, and to my knowledge nobody touched the CDP cluster in between. During the period when I could not create experiments, I looked at my Kubernetes namespace <CML workspace name - userXX> inside ECS and did not see a pod starting; no events were being created either.

We would like to understand the product better before productionizing it. Could anyone advise where I might look to debug the issue? Thank you.

 

The version we're using is CML 1.4 on CDP, on an intranet network.
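For reference, this is roughly how I have been checking the namespace in ECS (a sketch; the namespace name below is just an example, substitute your own CML workspace namespace):

# List the pods in the CML workspace namespace (example name)
kubectl get pods -n workspace-userXX

# Show recent events in the namespace, newest last
kubectl get events -n workspace-userXX --sort-by=.lastTimestamp

# Inspect a specific pod for scheduling or image-pull errors
kubectl describe pod <pod-name> -n workspace-userXX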


Explorer

Just an update: In ECS, I see that the pods are being assigned to a particular node and they're "stuck" with a message:

"failed to create pod sandbox: rpc error:code= unknown desc = failed to get sandbox image "index.docker.io/rancher/pause:3.6" failed to pull image "index.docker.rancherpause:3.6": failed to pull and unpack image "docker.io/rancher/pause:3.6".....

 

Explorer

I've raised a support case instead, so I won't be following up on this thread.

New Contributor

Hello,
I have the same issue. Have you got a solution?

Explorer

We're not certain of the cause, but we used kubectl to cordon off the node and rebooted it. The issue happens sporadically. We're also running CML 1.5 now.
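For reference, this is roughly what we ran (the node name is just an example):

# Mark the affected node unschedulable so no new pods land on it
kubectl cordon <node-name>

# Reboot the node out-of-band (SSH/console), then allow scheduling again
kubectl uncordon <node-name>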