Member since: 12-21-2017
Posts: 125
Kudos Received: 2
Solutions: 3

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 331 | 02-06-2023 09:56 AM
 | 1301 | 02-01-2023 09:19 AM
 | 589 | 11-02-2021 08:28 AM
06-01-2023
02:39 PM
1 Kudo
Hi, the documentation suggests using sparklyr to read from Impala in R: https://docs.cloudera.com/cdsw/1.10.3/import-data/topics/cdsw-running-queries-on-impala-tables.html#pnavId4 I think you could also use the RJDBC library to set up a JDBC connection to Impala: https://cran.r-project.org/web/packages/RJDBC/index.html
05-30-2023
06:39 AM
Hi, do you have access to the actual CDSW host environment on the CDSW master host? If so, you will be able to use sftp - the project files are located in /var/lib/cdsw/current/projects/projects/[number]/[number], where the first [number] is usually 0 and the second is the ID of your project. You can find this ID by running something like:

kubectl exec -it $(kubectl get pods -l role=db -o jsonpath='{.items[*].metadata.name}') -- psql -P pager=off --expanded -U sense -c "select id from projects where name='my_test_project'"

Another way to transfer files from the CDSW project is to simply open the project, start a session, open the Terminal, and run:

python -m http.server 8080 --bind 127.0.0.1

This starts a simple HTTP server. Leave it running and go back to the project; next to the little nine-dot menu there will be a new option to view the directory listing. You can click this, or manually go to the generated URL (e.g. https://public-9hcxj43aavpxk7gt.cdsw.company.com/) in your browser (the public- prefix is required, and the next chunk identifies the user session), or use wget or something similar to download the files.

There are probably a lot of other ways to pull files directly from CDSW, but these two methods should get you started on whatever you are trying to do.
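The HTTP-server trick above can also be scripted end to end. Below is a minimal, self-contained Python sketch of the same idea: serve a directory over HTTP and download a file from it, the way you would wget files out of a running session. The directory, file name, and contents here are made up for the demo; in a real session you would serve the project directory and fetch via the public session URL.

```python
import functools
import http.server
import os
import socketserver
import tempfile
import threading
import urllib.request

# Stand-in for a project directory, with one "project file" in it.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "results.txt"), "w") as f:
    f.write("hello from the project\n")

# Serve it on localhost, equivalent to `python -m http.server --bind 127.0.0.1`.
handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=workdir)
server = socketserver.TCPServer(("127.0.0.1", 0), handler)  # port 0 = pick a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Download the file, as wget/curl would against the session URL.
data = urllib.request.urlopen(f"http://127.0.0.1:{port}/results.txt").read()
print(data.decode())

server.shutdown()
```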
03-20-2023
11:55 AM
Hi, the Docker repository is available publicly. You can test by running:

docker pull docker.repository.cloudera.com/cdsw/engine:13

Does that work? Can you paste the actual authentication error you are seeing? Is it an x509 error? Can you also paste the contents of /etc/docker/daemon.json?
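One quick sanity check on the daemon config is that /etc/docker/daemon.json parses as valid JSON and to see which registry-related keys it sets. A small sketch (the sample config below is invented for illustration; in practice you would read the real file):

```python
import json

# Sample daemon.json contents; in practice, read /etc/docker/daemon.json instead.
sample = """
{
  "insecure-registries": ["my-internal-registry:5000"],
  "registry-mirrors": []
}
"""

# json.loads raises an error if the file is malformed, which itself
# can cause confusing docker pull failures.
config = json.loads(sample)
print("insecure-registries:", config.get("insecure-registries", []))
print("registry-mirrors:", config.get("registry-mirrors", []))
```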
02-06-2023
09:56 AM
1 Kudo
I think you will have to write the output you want to share to a file and share only that attachment. When you set up a job, you can tell it not to send the console output.
02-06-2023
09:27 AM
Hi. When you go to share a session, there is a box to "Hide code and text" - does this work for you? I don't see a way to do this inside a scheduled job. I think your best bet is to write whatever data you want to share to a file, then, when you set up the job, uncheck the option to share the console output and include the file as an attachment. Can you try that?
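The write-to-a-file-and-attach pattern is only a few lines in the job script itself. A minimal sketch (the file name and contents are arbitrary placeholders, not CDSW conventions):

```python
import json
import os
import tempfile

# Whatever you want to share; keep it out of stdout so it never
# shows up in the job's console output.
results = {"rows_processed": 1234, "status": "ok"}

# Write it to a file. When configuring the job, uncheck the console
# output and attach this file instead.
path = os.path.join(tempfile.gettempdir(), "job_results.json")
with open(path, "w") as f:
    json.dump(results, f, indent=2)

# Verify the round trip.
with open(path) as f:
    loaded = json.load(f)
print(loaded["status"])
```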
02-01-2023
11:30 AM
Hi, whichever user you are logged in as under User Settings -> Hadoop Authentication is what CDSW uses when accessing the rest of the cluster. Typically your CDSW username/password is your SSO login, but you can change the Hadoop Authentication settings if you want. Let me know if that works for you!
02-01-2023
11:13 AM
Hi, the 501 is not an error - this feature is only available in the CML product line, not in CDSW. In fact, there has never been an API for Experiments in CDSW, which is why there is no documentation for it. However, anything the UI does can be done on the terminal or in scripts if you reverse engineer the API calls, for instance using the Network tab in your browser's Developer Tools. If you navigate to the Experiments page, you will see that it is making the following request:

http://cdsw-domain.com/api/altus-ds-1/ds/listruns

If you copy the request as cURL, it will be huge and contain a bunch of random headers, but I was able to basically follow the same steps as described in the Jobs API v1 documentation page: https://docs.cloudera.com/cdsw/1.10.2/jobs-pipelines/topics/cdsw-starting-a-job-run-using-the-api.html By adding your legacy API key to the cURL request, you can get a list of all of the experiments with a cURL like this:

curl --user "<Legacy API Key>:" -H "Content-Type: application/json" -X POST http://<CDSW server>/api/altus-ds-1/ds/listruns

You can parse through this list for experiments. By following a similar procedure you can probably execute experiments too... I'm not sure, I didn't try that. This method is not supported by Cloudera, and the official response would be to upgrade to CML and use API v2. If you try this and have problems, we can't really help on a support ticket, but you can respond here and I might be able to help. Cheers!
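If you'd rather script this than shell out to curl, the same request can be built in Python. The sketch below only constructs the authenticated request (the host and key are placeholders I made up); actually sending it of course needs a live CDSW host. The one non-obvious bit it demonstrates is the auth scheme: `curl --user "<key>:"` sends the legacy API key as HTTP basic auth with an empty password.

```python
import base64
import urllib.request

# Placeholders -- substitute your CDSW host and legacy API key.
cdsw_host = "http://cdsw.example.com"
api_key = "MY_LEGACY_API_KEY"

# Legacy API keys go over HTTP basic auth with an empty password,
# i.e. base64("<key>:"), which is what `curl --user "<key>:"` does.
token = base64.b64encode(f"{api_key}:".encode()).decode()

req = urllib.request.Request(
    f"{cdsw_host}/api/altus-ds-1/ds/listruns",
    data=b"{}",  # empty JSON body for the POST
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
    },
    method="POST",
)

print(req.get_method(), req.full_url)
# resp = urllib.request.urlopen(req)  # would return the run list as JSON
```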
02-01-2023
09:19 AM
I agree with Smarak; the error code typically means that there were not enough resources available to start the job. You could use the Grafana dashboard (available on the Admin page) to look at the cluster resources and load around the time you had this issue. Is it happening consistently? For jobs, I usually see this at the start or end of the month, when a lot of people schedule periodic jobs and they all trigger at the same time.
03-30-2022
03:33 PM
When this is happening, are you able to start Sessions? Do you have access to the Admin -> Usage page, or kubectl access? You should check whether there are enough resources available for the engine you have chosen to run the job with.
11-02-2021
08:53 AM
Sure, you're welcome. It is definitely an interesting topic, but it's pretty hard to get actual data; so much depends on the type of workloads you want to run, the size of your nodes, etc. Good luck!