Member since
12-21-2017
150
Posts
6
Kudos Received
7
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1839 | 06-10-2025 04:35 PM |
| | 3213 | 09-18-2024 11:52 AM |
| | 3333 | 09-12-2024 04:54 PM |
| | 2747 | 07-30-2024 11:49 AM |
| | 2537 | 02-06-2023 09:56 AM |
06-01-2023
02:39 PM
1 Kudo
Hi, the documentation suggests using sparklyr to read from Impala in R: https://docs.cloudera.com/cdsw/1.10.3/import-data/topics/cdsw-running-queries-on-impala-tables.html#pnavId4 I think you could also use the RJDBC library to set up a JDBC connection to Impala: https://cran.r-project.org/web/packages/RJDBC/index.html
05-30-2023
06:39 AM
Hi, do you have access to the actual CDSW host environment on the CDSW master host? If so, you will be able to use sftp: the project files are located in /var/lib/cdsw/current/projects/projects/[number]/[number], where the first [number] is usually 0 and the second is the ID of your project. You can find this ID by running something like:

```
kubectl exec -it $(kubectl get pods -l role=db -o jsonpath='{.items[*].metadata.name}') -- psql -P pager=off --expanded -U sense -c "select id from projects where name='my_test_project'"
```

Another way to transfer files out of a CDSW project is to open the project, start a session, open the Terminal, and run:

```
python -m http.server 8080 --bind 127.0.0.1
```

This starts a simple HTTP server. Leave it running and go back to the project. In the little nine-dot menu there will be a new option to view the directory listing. You can click this, or manually go to the generated URL in your browser (https://public-9hcxj43aavpxk7gt.cdsw.company.com/, where the public- prefix is required and the next chunk identifies the user session), or use wget or similar to download files.

There are probably a lot of other ways to pull files directly from CDSW, but these two methods should get you started on whatever you are trying to do.
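If you want to see how the HTTP-server trick works end to end, here is a minimal local sketch. The directory path, port, and file name are purely illustrative (not CDSW defaults), and in a real CDSW session you would fetch through the public- session URL instead of localhost:

```shell
# Create a throwaway "project" directory with one file to serve.
mkdir -p /tmp/demo_project
echo "hello from my project" > /tmp/demo_project/results.txt

# Serve it on localhost, as you would in the CDSW Terminal.
cd /tmp/demo_project
python3 -m http.server 8080 --bind 127.0.0.1 &
SERVER_PID=$!
sleep 1

# Download a file from the listing (in CDSW, this would be the public- URL).
curl -s http://127.0.0.1:8080/results.txt

# Clean up the background server.
kill $SERVER_PID
```

The same loop works with wget, or with `wget -r` if you want to mirror the whole directory listing at once.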
03-20-2023
11:55 AM
Hi, the Docker repository is publicly available. You can test by running:

```
docker pull docker.repository.cloudera.com/cdsw/engine:13
```

Does that work? Can you paste the actual authentication error you are seeing? Is it an x509 error? Can you also paste the contents of /etc/docker/daemon.json?
02-06-2023
09:56 AM
1 Kudo
I think you will have to write the output you want to share to an attachment and share only the attachment. When you set up a job, you can tell it not to send the console output.
02-06-2023
09:27 AM
Hi. When you go to share a session, there is a box to "Hide code and text" - does this work for you? I don't really see a way to do this inside a scheduled job. I think your best bet is to write whatever important data you want to share to a file; then, when you set up your job, uncheck the option to share the console and include the file as an attachment. Can you try that?
02-01-2023
09:19 AM
I agree with Smarak; the error code typically means that there were not enough resources available to start the job. You could use the Grafana dashboard (available on the Admin page) to look at cluster resources and load around the time you had this issue. Is it happening consistently? For Jobs, I usually see this at the start or end of the month, when a lot of people's scheduled periodic jobs all trigger at the same time.
03-30-2022
03:33 PM
When this is happening, are you able to start Sessions as well? Do you have access to the Admin -> Usage page or kubectl access? You should look to see if there are enough resources available for the engine that you have chosen to use when running the job.
11-02-2021
08:53 AM
Sure, you are welcome. It is definitely an interesting topic, but it's pretty hard to get actual data; so much depends on the type of workloads you want to run, the size of your nodes, and so on. Good luck!
11-02-2021
08:28 AM
Hi, there is no hard limit on the number of CDSW worker nodes you can have; however, there are practical limits. If you have, say, thirty nodes, there starts to be a lot more overhead in terms of network traffic and latency. Each worker node requires about 3 CPU and 5 GB of RAM just for the kubelet and internal CDSW pods, so with 30 worker nodes you would be losing 90 CPU and 150 GB of RAM, which might not pay off. On larger clusters there is a delicate balance between how big your worker nodes are and how many you choose to have. I can't really give much guidance here other than that it takes some trial and error to get right. If you have an account with Cloudera, you should reach out to that team to get more detailed information. A rough guideline would be to keep workers between 32 and 64 vCPU and have fewer than 20 of them... but your mileage may vary. Hope this helps.
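The overhead arithmetic above can be sketched quickly; the 3 CPU / 5 GB per-node figures are the rough estimates from this post, not exact measurements, so treat the result as a ballpark:

```shell
# Rough per-node overhead for kubelet + internal CDSW pods (estimates only).
NODES=30
CPU_PER_NODE=3
RAM_GB_PER_NODE=5

echo "CPU lost to overhead: $((NODES * CPU_PER_NODE)) vCPU"
echo "RAM lost to overhead: $((NODES * RAM_GB_PER_NODE)) GB"
# Prints:
# CPU lost to overhead: 90 vCPU
# RAM lost to overhead: 150 GB
```

Plugging in your own node count and per-node estimates is a quick way to see when fewer, larger workers start to pay off.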
03-16-2021
06:27 AM
This is a pretty interesting question. At first I was going to suggest just using a COPY command in your Dockerfile to copy this file over; however, I'm not totally positive that the .jupyter directory exists until you start a CDSW session with a Jupyter notebook. Can you show me what you are adding in the config file to time out the Jupyter notebooks?

You are correct that Jupyter notebooks do not time out from the IDLE_MAXIMUM_MINUTES environment variable. RStudio sessions do not either, and this has been a long-running and difficult issue, since Cloudera doesn't write or control this code. It looks like a lot of this is fixed in CDSW 1.9, though. If you just want to time out Jupyter notebooks, you could try editing the Jupyter Notebook command and adding this:

```
NOTEBOOK_TIMEOUT_SECONDS=$(python3 -c "print(${IDLE_MAXIMUM_MINUTES}*60)") /usr/local/bin/jupyter notebook --no-browser --ip=127.0.0.1 --port=${CDSW_APP_PORT} --NotebookApp.token= --NotebookApp.allow_remote_access=True --NotebookApp.quit_button=False --log-level=ERROR --NotebookApp.shutdown_no_activity_timeout=300 --MappingKernelManager.cull_idle_timeout=${NOTEBOOK_TIMEOUT_SECONDS} --TerminalManager.cull_inactive_timeout=${NOTEBOOK_TIMEOUT_SECONDS} --MappingKernelManager.cull_interval=60 --TerminalManager.cull_interval=60 --MappingKernelManager.cull_connected=True
```

This will kill Jupyter notebooks that have had more than IDLE_MAXIMUM_MINUTES of inactivity (which defaults to 60 minutes). There are a few caveats, the main one being that this still won't kill Jupyter terminals, due to the version of Jupyter Notebook that CDSW 1.7 uses. Also, users will not get a warning; their notebook and corresponding CDSW session will just get killed. You can try this and let me know if it works. I'm also curious about the config file that you want to add into .jupyter.
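As a quick sanity check of the minutes-to-seconds conversion that command relies on, you can run the substitution on its own (IDLE_MAXIMUM_MINUTES is assumed to be set by CDSW; the 60 below is just a fallback for testing outside a session):

```shell
# Convert the idle-timeout minutes into the seconds value Jupyter's
# cull_idle_timeout option expects.
IDLE_MAXIMUM_MINUTES=${IDLE_MAXIMUM_MINUTES:-60}
NOTEBOOK_TIMEOUT_SECONDS=$(python3 -c "print(${IDLE_MAXIMUM_MINUTES}*60)")
echo "Culling idle kernels after ${NOTEBOOK_TIMEOUT_SECONDS} seconds"
```

With the 60-minute default, this prints a 3600-second timeout, which is what the --MappingKernelManager.cull_idle_timeout flag receives.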