Member since
08-22-2018
79
Posts
11
Kudos Received
4
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 306 | 01-27-2025 07:01 AM |
| | 818 | 06-27-2024 02:58 AM |
| | 916 | 01-08-2024 02:22 AM |
| | 1718 | 06-19-2023 02:41 AM |
01-27-2025
07:01 AM
1 Kudo
I tested this on my test workspace. Steps followed:
1. Suspended the workspace.
2. In the AWS console, under EC2 Auto Scaling groups, set the min and desired count for the liftie infra node group to 0.
3. After the update completed, checked the workspace details in the CDP management console; the Platform infra count had gone to 0.
4. In the EC2 Auto Scaling group, set the desired and min count for the liftie infra node group back to 2 and, once the update completed, validated the same in the workspace details.
5. Resumed the workspace and validated that a session launched successfully.

Observations: I was indeed able to scale all the AWS instances down to 0.

Suggestions: I would not suggest this on PROD or any critical environment [not sure what the effect could be on workspaces that run at production scale]. For non-prod, test it thoroughly and "implement at your own risk".

Alternatives: Have you considered backup and restore operations? When a workspace is not required, instead of suspending it and manually scaling down the nodes, you could back up the workspace and restore the backup when required. Since you are planning a complete shutdown, it is presumably not a prod workspace. Backup/restore could be an alternative for a non-prod workspace, and it is also a cost-effective option.
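The console steps above could also be scripted. The following is a minimal sketch, not a tested or supported procedure: it assumes boto3's Auto Scaling client and its `update_auto_scaling_group` call, and the group name in the usage comment is a hypothetical placeholder. The same caution applies: non-prod only, at your own risk.

```python
from typing import Any, Dict


def scale_node_group(client: Any, group_name: str, count: int) -> Dict[str, Any]:
    """Set both the min size and desired capacity of one Auto Scaling group.

    `client` is expected to behave like boto3.client("autoscaling").
    Returns the parameters that were sent, for logging or review.
    """
    if count < 0:
        raise ValueError("count must be >= 0")
    params = {
        "AutoScalingGroupName": group_name,
        "MinSize": count,
        "DesiredCapacity": count,
    }
    client.update_auto_scaling_group(**params)
    return params


# Usage (assumption: AWS credentials are configured and the workspace is
# already suspended; the group name below is a made-up placeholder):
#
#   import boto3
#   asg = boto3.client("autoscaling")
#   scale_node_group(asg, "<liftie-infra-node-group-name>", 0)  # scale down
#   scale_node_group(asg, "<liftie-infra-node-group-name>", 2)  # scale back up
```

Passing the client in as an argument keeps the function easy to dry-run with a stub before pointing it at a real account.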
01-27-2025
12:00 AM
1 Kudo
In general, running any operations directly on the cloud provider console is not recommended, as it could cause the cloud instances and the CDP control plane to go out of sync. In the case of CML, a liftie/Platform infra instance group with 2 m5.large instances and an m5.2xlarge in the CAI Infra instance group keep running even when the workspace is in suspended status. Stopping those nodes manually is not tested and could result in unexpected behavior.
01-16-2025
12:35 AM
"cml could not fetch the image metadata" indicates that the runtime manager could not fetch the image details. Check:
#1 whether the regcred secret is in place and valid [1]
#2 whether the network route to the registry is open

For "http: server gave HTTP response to HTTPS client": do the protocols of the CML workspace [TLS/non-TLS] and the registry match?

Checking the runtime manager pods may give more insight.

[1] https://docs.cloudera.com/machine-learning/cloud/runtimes/topics/ml-add-docker-registry-credentials-runtimes.html
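As a rough aid for check #1, here is a small sketch that inspects the contents of a registry secret. It assumes regcred is a standard Kubernetes secret of type `kubernetes.io/dockerconfigjson` and that you have already copied its base64-encoded `.dockerconfigjson` value out of the cluster; the function name is illustrative only. It only confirms that credentials are present for a registry host, not that the registry accepts them.

```python
import base64
import json
from typing import List


def registries_with_credentials(encoded_dockerconfig: str) -> List[str]:
    """List registry hosts that carry non-empty credentials.

    `encoded_dockerconfig` is the base64 string stored under the
    `.dockerconfigjson` key of the secret.
    """
    config = json.loads(base64.b64decode(encoded_dockerconfig))
    usable = []
    for host, entry in config.get("auths", {}).items():
        has_token = bool(entry.get("auth"))
        has_user_pass = bool(entry.get("username")) and bool(entry.get("password"))
        if has_token or has_user_pass:
            usable.append(host)
    return sorted(usable)
```

If the registry host used by the runtime image is missing from the returned list, the secret is a likely culprit.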
01-14-2025
07:48 PM
@MID_ACN There is no straightforward way to do it.

Option#1: Forking [through the UI]
https://community.cloudera.com/t5/Internal/How-to-change-the-owner-of-the-project-in-CML/ta-p/360168

Option#2: Projects table update [requires CLI access to the underlying pod]

step1: connect to the postgres DB
kubectl exec -it $(kubectl get pods -l role=db -o jsonpath='{.items[*].metadata.name}') -- psql -P pager=off --expanded -U sense

step2: get the user IDs and the project ID
select id from users where username='<old_owner>';
select id from users where username='<new_owner>';
select id,creator_id,user_id from projects where name='<project_name>';

step3: update the projects table
select id,name,user_id,creator_id from projects where id='<project_id>'; -- validate before update
update projects set user_id='<new_owner_id>', creator_id='<new_owner_id>' where id='<project_id>'; -- update
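The validate-then-update pattern in Option#2 can be illustrated in isolation. This is only a sketch: it uses Python's built-in sqlite3 as a stand-in for the workspace's PostgreSQL database, and the table holds just the columns named in the queries above (`id`, `name`, `user_id`, `creator_id`), so the real schema is an assumption. The function names are hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the projects table (assumed columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT, "
        "user_id INTEGER, creator_id INTEGER)"
    )
    return conn


def reassign_project(conn: sqlite3.Connection, project_id: int,
                     old_owner_id: int, new_owner_id: int) -> None:
    """Change the owner only if the project still belongs to old_owner_id."""
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT user_id FROM projects WHERE id = ?", (project_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"no project with id {project_id}")
        if row[0] != old_owner_id:
            raise ValueError("project is not owned by the expected user")
        conn.execute(
            "UPDATE projects SET user_id = ?, creator_id = ? WHERE id = ?",
            (new_owner_id, new_owner_id, project_id),
        )
```

Checking the current owner inside the same transaction guards against updating the wrong row when a project ID is mistyped.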
01-03-2025
05:17 AM
1 Kudo
You may consider building a custom PBJ runtime image that meets your requirements.
01-03-2025
05:15 AM
1 Kudo
1) When is V1 getting deprecated?
It already is: https://docs.cloudera.com/machine-learning/cloud/jobs-pipelines/topics/ml-rest-apis.html
-- The Jobs API is now deprecated. See Cloudera AI API v2 and API v2 usage for the successor API. --
Is it recommended to use the V1 API now? No.

2) Which API method is to be used in the Version 2 API call to start the job?
createJobRun: https://docs.cloudera.com/machine-learning/1.5.4/rest-api-reference/index.html#api-CMLService-createJobRun
createJob and createJobRun may be easy to confuse: createJob creates the job definition, while createJobRun starts a run of an existing job.
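For orientation, a createJobRun call might be assembled as below. This is a hedged sketch only: the `/api/v2/projects/{project_id}/jobs/{job_id}/runs` path, the bearer-token header, and the empty request body are assumptions to verify against the REST API reference linked above, and all argument values are placeholders. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_job_run_request(base_url: str, project_id: str, job_id: str,
                          api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that would start a job run.

    The URL layout and auth header are assumptions; check them against the
    API v2 reference for your workspace version.
    """
    url = f"{base_url.rstrip('/')}/api/v2/projects/{project_id}/jobs/{job_id}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),  # empty body: no per-run overrides
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending it would then be a matter of passing the request to `urllib.request.urlopen` from a host that can reach the workspace.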
11-12-2024
06:35 PM
@paulfg An application can get stuck in "starting" status when the underlying script execution has not completed, or when the process flow does not consider it to have completed what it was supposed to. Please check/share the "Application logs" pane for any errors or concerning events.
08-18-2024
10:11 PM
PySpark 3.5.2 works with Python >= 3.8 and <= 3.11. Ref: https://pypi.org/project/pyspark/3.5.2/
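A quick way to check an interpreter against that range is sketched below. The function name is made up for illustration, and the bounds are simply the ones stated above.

```python
import sys
from typing import Optional, Sequence


def python_ok_for_pyspark_352(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if the (major, minor) version falls within 3.8 to 3.11.

    Defaults to the running interpreter when no version is given.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return (3, 8) <= major_minor <= (3, 11)
```

Running `python_ok_for_pyspark_352()` inside a session tells you whether that session's runtime falls in the supported range.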
07-12-2024
01:09 AM
1 Kudo
Exit code -1 could be due to any corner case during the runtime. Also, the terms being used [CDSW app/service/job] are confusing. What is the workload? [Is it a CDSW job or a CDSW application?]
06-27-2024
02:58 AM
1 Kudo
@littlecong The files need to be uploaded to each individual project. As of now, there is no documented provision to share content between projects.