Member since: 12-21-2017
Posts: 141
Kudos Received: 6
Solutions: 6

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 681 | 09-18-2024 11:52 AM
 | 801 | 09-12-2024 04:54 PM
 | 1128 | 07-30-2024 11:49 AM
 | 1178 | 02-06-2023 09:56 AM
 | 4238 | 02-01-2023 09:19 AM
03-20-2023 11:55 AM
Hi, the Docker repository is publicly available. You can test it by running:

docker pull docker.repository.cloudera.com/cdsw/engine:13

Does that work? Can you paste the actual authentication error you are seeing? Is it an x509 error? Can you also paste the contents of /etc/docker/daemon.json?
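For what it's worth, this is roughly how I would check the host side - the /v2/ path here is just the standard Docker registry API base, so the point is only to see whether the TLS handshake and proxy path are clean:

```bash
# Show the daemon config - look for proxies, registry mirrors, or
# insecure-registries entries that could be rewriting or blocking the pull.
cat /etc/docker/daemon.json

# Test TLS to the registry directly; an x509 error here points at a
# certificate/proxy problem on the host rather than at CDSW itself.
curl -v https://docker.repository.cloudera.com/v2/
```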
02-06-2023 09:56 AM
1 Kudo
I think you will have to write the output you want to share to a file and share just that attachment. When you set up the job, you can tell it not to send the console output.
02-06-2023 09:27 AM
Hi. When you go to share a session, there is a box to "Hide code and text" - does this work for you? I don't really see a way to do this inside of a scheduled job. I think your best bet is to write whatever important data you want to share to a file, and then, when you set up your job, uncheck the option to share the console output and include the file as an attachment. Can you try that?
02-01-2023 11:30 AM
Hi, the user you configure under User Settings -> Hadoop Authentication is the one CDSW uses when accessing the rest of the cluster. Typically your CDSW username/password is your SSO login, but you can change the Hadoop Authentication settings if you want. Let me know if that works for you!
02-01-2023 11:13 AM
Hi, the 501 is not an error - this feature is available only in the CML product line and is not in CDSW. In fact, there has never been an API for Experiments in CDSW, which is why there is no documentation for it.

However, anything the UI does can be done on the terminal or in scripts if you reverse engineer the API calls, for instance using the Network tab in your browser's Developer Tools. If you navigate to the Experiments page, you will see that it makes the following request:

http://cdsw-domain.com/api/altus-ds-1/ds/listruns

If you copy the request as cURL, it will be huge and contain a bunch of random headers, but I was able to follow basically the same steps as described in the Jobs API v1 documentation page: https://docs.cloudera.com/cdsw/1.10.2/jobs-pipelines/topics/cdsw-starting-a-job-run-using-the-api.html

By adding your legacy API key to the cURL request, you can get a list of all of the experiments with a cURL like this:

curl --user "<Legacy API Key>:" -H "Content-Type: application/json" -X POST <http:// CDSW SERVER>.com/api/altus-ds-1/ds/listruns

You can parse through this list for experiments. By following a similar procedure you can probably execute experiments... I'm not sure, I didn't really try that.

This method is not supported by Cloudera, and the official response would be to upgrade to CML and use API v2. If you try this and have problems, we can't really help on a support ticket, but you can respond here and I might be able to help. Cheers!
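For reference, here is a rough sketch of how that call could be scripted. The server URL and API key are placeholders, and piping through jq is just my own assumption about how you might inspect the response - adjust once you see what your instance actually returns:

```bash
# Placeholders - replace with your CDSW server URL and your Legacy API key
# (the Jobs API v1 doc linked above describes where to generate the key).
CDSW_SERVER="http://cdsw-domain.com"
API_KEY="<Legacy API Key>"

# List experiment runs via the (unsupported) internal v1 endpoint,
# authenticating with the legacy API key as the basic-auth username.
curl --silent --user "${API_KEY}:" \
     -H "Content-Type: application/json" \
     -X POST "${CDSW_SERVER}/api/altus-ds-1/ds/listruns" \
  | jq '.'   # pretty-print; pick out the fields you need from the run list
```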
02-01-2023 09:19 AM
I agree with Smarak; that error code typically means there were not enough resources available to start the job. You could use the Grafana dashboard (available from the Admin page) to look at cluster resources and load around the time you had this issue. Is it happening consistently? For Jobs, I usually see this at the start or end of the month, when a lot of people schedule periodic jobs and they all trigger at the same time.
03-30-2022 03:33 PM
When this is happening, are you able to start Sessions as well? Do you have access to the Admin -> Usage page or kubectl access? You should look to see if there are enough resources available for the engine that you have chosen to use when running the job.
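If you do have kubectl access, a quick check is something like the sketch below - these are standard kubectl commands, nothing CDSW-specific, and the grep patterns are just my assumption about what to look for in the output:

```bash
# Per-node capacity vs. what is already requested by running pods.
kubectl describe nodes | grep -A 5 "Allocated resources"

# Pods stuck in Pending usually mean the scheduler could not find a node
# with enough free CPU/memory for the requested engine size.
kubectl get pods --all-namespaces | grep Pending
```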
11-02-2021 08:53 AM
Sure, you're welcome. It is definitely an interesting topic, but it's pretty hard to get actual data; so much depends on the type of workloads you want to run, the size of your nodes, etc. Good luck!
11-02-2021 08:28 AM
Hi, there is no hard limit on the number of CDSW worker nodes you can have, but there are practical limits - if you have, say, thirty nodes, there starts to be a lot more overhead in terms of network traffic and latency. For instance, each worker node requires about 3 CPUs and 5 GB of RAM just for the kubelet and the internal CDSW pods, so with 30 worker nodes you would be losing 90 CPUs and 150 GB of RAM, which might not pay off. On larger clusters there is a delicate balance between how big your worker nodes are and how many of them you have - I can't really give much guidance here other than that it takes some trial and error to get right. If you have an account with Cloudera, you should reach out to your account team for more detailed information. A rough guideline would be to keep workers between 32 and 64 vCPUs and have fewer than 20 of them... but your mileage may vary. Hope this helps.
04-01-2021 06:41 AM
It looks like you have CDSW configured to use /dev/sdb for your Docker block device:

ERROR:: Error in pvcreate for [/dev/sdb]: 5

But somehow you also have /dev/sde set:

ERROR:: Entries in DOCKER_BLOCK_DEVICES must only be block devices: [/dev/sde]: 1

Check your configuration for DOCKER_BLOCK_DEVICE in CM; it is possible that you have multiple disks set for some reason. (People often do this when they have multiple nodes and the disks are different on each node. You need to use the role group feature to do that instead of listing all of the disks.) A Docker block device will be created on the master AND the worker, so you should make sure that both machines have a /dev/sde (or /dev/sdb) that is free (unmounted and unformatted), OR use role groups to set the master to /dev/sdb and the worker to /dev/sde.

> Also, please let me know if setting up DNS wildcard is adding the *domain name on /etc/hosts or any other DNS file of the linux server

No. You need to go into the DNS server itself and add a wildcard DNS entry for your domain. It should be *.[your-cdsw-url]. You cannot use /etc/hosts entries; CDSW ignores /etc/hosts in containers. I have never set up a DNS server, so I can't really help with that part.
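If it helps, these are the kinds of checks I would run on the master and worker hosts. The device names come from your error messages, and the hostname is just a placeholder for your actual CDSW domain:

```bash
# Confirm the candidate devices exist and carry no filesystem signature or
# mountpoint (pvcreate fails on devices that are already formatted or in use).
lsblk -f /dev/sdb /dev/sde

# Verify the wildcard DNS entry: any random subdomain of the CDSW domain
# should resolve to the CDSW master's IP. Replace with your actual domain.
nslookup anything.cdsw.example.com
```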