Member since: 02-08-2023
Posts: 5
Kudos Received: 0
Solutions: 0
07-17-2024
11:25 PM
Has anyone encountered error 255 before, i.e. "engine exited with error 255"?
07-25-2023
06:49 PM
We're not certain of the cause, but we cordoned off the node with kubectl cordon and rebooted it. The issue happens sporadically. We are also running CML 1.5 now.
02-23-2023
07:35 PM
I've raised a support case instead and will not be following up on this thread.
02-22-2023
10:17 PM
Just an update: in ECS, I see that the pods are being assigned to a particular node and are "stuck" with the message: "failed to create pod sandbox: rpc error: code = unknown desc = failed to get sandbox image "index.docker.io/rancher/pause:3.6": failed to pull image "index.docker.io/rancher/pause:3.6": failed to pull and unpack image "docker.io/rancher/pause:3.6"....."
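For anyone hitting the same message, here is a rough sketch of how the affected pods and nodes could be listed from the namespace's events. It assumes the kubelet reports these failures with the event reason FailedCreatePodSandBox and fills in source.host with the node name; the helper names and the namespace argument are placeholders of mine, not anything from CML or ECS.

```python
import json
import subprocess

# Assumption: the kubelet tags sandbox-creation failures with this event reason.
SANDBOX_REASON = "FailedCreatePodSandBox"


def sandbox_failures(event_list):
    """Return (pod, node, message) for each sandbox-creation failure.

    `event_list` is the parsed JSON of `kubectl get events -o json`.
    """
    failures = []
    for ev in event_list.get("items", []):
        if ev.get("reason") != SANDBOX_REASON:
            continue
        pod = ev.get("involvedObject", {}).get("name", "<unknown>")
        # source.host may be missing on some clusters; fall back to a marker.
        node = ev.get("source", {}).get("host", "<unknown>")
        failures.append((pod, node, ev.get("message", "")))
    return failures


def fetch_events(namespace):
    """Fetch the events of one namespace as parsed JSON (needs kubectl access)."""
    result = subprocess.run(
        ["kubectl", "get", "events", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling sandbox_failures(fetch_events("<your namespace>")) on a host with kubectl access should show whether the failures are confined to a single node, which is what the symptoms above suggest.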
02-20-2023
06:33 AM
Hi, I am unable to create a support case for this, as my support license does not appear to cover ML Experiments. I am evaluating Cloudera CML on my company's CDP instance.

I created several Experiments in a project to test the feature (a simple script generating some random numbers as metrics), and the experiments were created and ran fine. However, when I tried to create the same experiment later, the UI timed out. Using the browser's developer tools, I saw that the "runs" request failed with a 504 timeout error. I am unable to troubleshoot the issue further in the UI, and am uncertain whether ECS would have any useful information. A few hours later, I could run it successfully again. To my knowledge, nobody touched the CDP cluster in between.

While I could not create experiments, I looked at my Kubernetes namespace <CML workspace name - userXX> in ECS and did not see a pod starting. No events were being created either.

We would like to better understand the product before productionizing it. Could anyone advise where I might look to debug the issue? Thank you. We are using CML 1.4 on CDP, in an intranet network.
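In case it helps others reproduce the namespace check described above, this is a minimal sketch that counts pods per phase from kubectl's JSON output. An empty result corresponds to the "no pod starting" observation. The function names and the namespace argument are my own placeholders; only the standard pod fields (items, status.phase) are relied on.

```python
import json
import subprocess
from collections import Counter


def pods_by_phase(pod_list):
    """Count pods per phase (Pending, Running, ...) in parsed kubectl JSON."""
    return Counter(
        pod.get("status", {}).get("phase", "Unknown")
        for pod in pod_list.get("items", [])
    )


def fetch_pods(namespace):
    """Fetch the pods of one namespace as parsed JSON (needs kubectl access)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Running pods_by_phase(fetch_pods("<your namespace>")) right after a failed attempt should return an empty counter if the request never reached the point of scheduling a pod, which would point at the layers in front of Kubernetes rather than at the cluster itself.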
Labels: Cloudera Machine Learning (CML)