Member since: 12-21-2017
Posts: 149
Kudos Received: 6
Solutions: 7
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 646 | 06-10-2025 04:35 PM |
| | 1950 | 09-18-2024 11:52 AM |
| | 2070 | 09-12-2024 04:54 PM |
| | 2015 | 07-30-2024 11:49 AM |
| | 1890 | 02-06-2023 09:56 AM |
06-10-2025
10:52 PM
Dear Mike,

Thank you for your support. I would like to highlight that I do not have internet access within CDSW, as our Cloudera environment is fully air-gapped. From your previous message, I understand that you are suggesting I create a custom Docker image and run it accordingly. I have already created a custom Docker image; however, I would appreciate it if you could review it and provide your feedback. Additionally, could you clarify your expectations regarding the base operating system and the base image for the container? For your reference, we are using CDSW version 1.10.5.

Best wishes,
Salim

### **Description of the Docker Image**

This Docker image is a **customized environment** tailored for Cloudera Data Science Workbench (CDSW) in an **air-gapped (offline)** setting. It includes the following components:

1. **Base Image**: starts from `docker.repository.cloudera.com/cdsw/engine:8`, ensuring compatibility with CDSW.
2. **Operating System**: **Ubuntu 20.04 LTS**, a lightweight and stable Linux distribution.
3. **MySQL 8.0.4**: a milestone release of MySQL, installed from pre-downloaded `.deb` packages (no internet required).
4. **Python 3.8.18**: compiled from source to ensure version compatibility.
5. **Node.js 22.16.0**: installed from a pre-extracted binary archive.
6. **Grafana Enterprise 11.6.0**: installed from a `.deb` package for enterprise-grade monitoring.
7. **Ollama**: a pre-downloaded binary for running large language models locally.
8. **Python Packages**: installed **offline** from a pre-downloaded `requirements.txt` file and local wheels (`.whl` or `.tar.gz`).
9. **Exposed Ports**: MySQL (3306), Python apps (8000), Grafana (3000), and Ollama (11434).

---

### **Requirements to Build the Image**

To build this image in an **air-gapped environment**, you must pre-download and include the following:

1. **Pre-downloaded Dependencies**:
   - **MySQL 8.0.4 `.deb` packages** (from Cloudera or the MySQL archives).
   - **Python 3.8.18 source tarball** (from [python.org](https://www.python.org/ftp/python/3.8.18/)).
   - **Node.js 22.16.0 Linux x64 binary** (from [nodejs.org](https://nodejs.org/dist/v22.16.0/)).
   - **Grafana Enterprise 11.6.0 `.deb` package** (from Grafana's enterprise download page).
   - **Ollama binary** (from [ollama.ai/download](https://ollama.ai/download)).
2. **Offline Python Packages**: run `pip download -r requirements.txt -d python_packages/` in an online environment to fetch all dependencies locally.
3. **Directory Structure**: ensure the following files/directories exist in the build context:

```bash
.
├── Dockerfile
├── requirements.txt
├── ollama                              # Pre-downloaded Ollama binary
└── dependencies/
    ├── mysql-8.0.4/                    # MySQL .deb packages
    ├── python/                         # Python 3.8.18 source
    ├── node-v22.16.0-linux-x64.tar.xz
    ├── grafana-enterprise-11.6.0.deb
    └── python_packages/                # Pre-downloaded Python wheels
```

4. **Build Command**:

```bash
docker build -t custom-cdsw:latest .
```

5. **Push to Private Registry (Optional)**: for CDSW integration, push the image to a registry accessible by CDSW (a sketch for carrying the image across the air gap follows after this post):

```bash
docker tag custom-cdsw:latest <your-registry>/custom-cdsw:latest
docker push <your-registry>/custom-cdsw:latest
```

---

### **Key References**

- Docker images contain application code, libraries, tools, and dependencies.
- Use `docker inspect` to view details about the image.
- Dockerfiles are often shared in repositories for transparency.

Let me know if you need further clarification!
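Since the build happens on an internet-connected staging host, the finished image also has to be carried across the air gap. A minimal sketch of that transfer, assuming `registry.internal` is a placeholder for your private registry and that the staging host runs Docker:

```bash
# On the internet-connected staging host: fetch wheels that match the
# container's interpreter (Python 3.8 on x86_64 Linux), not the host's.
pip download -r requirements.txt -d python_packages/ \
    --platform manylinux2014_x86_64 --python-version 38 --only-binary=:all:

# Build, then export the image to a tarball that can be carried across
# the air gap on removable media.
docker build -t custom-cdsw:latest .
docker save -o custom-cdsw.tar custom-cdsw:latest

# Inside the air-gapped network: load the tarball and push it to the
# private registry that CDSW can reach.
docker load -i custom-cdsw.tar
docker tag custom-cdsw:latest registry.internal/custom-cdsw:latest
docker push registry.internal/custom-cdsw:latest
```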
10-16-2024
06:33 AM
FYI, you can use this site to generate a bcrypt hash if you want to set the password to something else: https://www.browserling.com/tools/bcrypt
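If you would rather not paste a password into an external website, you can generate the bcrypt hash locally instead. A minimal sketch using `htpasswd`, assuming the apache2-utils (Debian/Ubuntu) or httpd-tools (RHEL) package is installed; the password shown is a placeholder:

```bash
# -B selects bcrypt, -C 12 sets the cost factor, -n prints to stdout
# instead of writing a file, -b takes the password on the command line.
# The empty "" username leaves a leading colon, which tr strips off.
htpasswd -bnBC 12 "" "MyNewPassword" | tr -d ':\n'
# Output looks like: $2y$12$...
```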
08-01-2024
02:19 AM
1 Kudo
Thanks Mike - I thought this was the case.
04-25-2024
10:00 PM
2 Kudos
@akshay0103 Could you please check with your account team about getting new Docker credentials or having the existing ones activated, or create an administrative case?
11-02-2023
08:37 AM
Hmm, that makes sense I suppose. That was my next guess 😉 Glad you figured it out, and thanks for posting the answer here, I appreciate it!
10-20-2023
11:53 AM
Hi Krishna,

We have done this by pushing commands out to the shell after setting up a trusted SSH connection between CDSW and the Unix server (the key setup is sketched after this post). This is the Python function we use:

```python
import os
import subprocess

user_name = "username"
unix_server = "my.unix.host"
unix_path = "/some/path"
file_to_transfer = "my_csv_file.csv"

def scp_file_to_sas(local_path, file_name, user_name, unix_server, unix_path):
    # Launch scp; -v prints verbose transfer details for debugging.
    p = subprocess.Popen(
        [
            "scp",
            "-v",
            local_path + file_name,
            # unix_path is already absolute, so join with ":" rather than ":/".
            user_name + "@" + unix_server + ":" + unix_path + "/" + file_name,
        ]
    )
    # Block until scp finishes and collect its exit status.
    sts = os.waitpid(p.pid, 0)
    return sts

# Example call from a CDSW session (project files live under /home/cdsw/).
scp_file_to_sas("/home/cdsw/", file_to_transfer, user_name, unix_server, unix_path)
```
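The "trusted SSH connection" above is just key-based authentication from the CDSW session to the Unix host. A minimal setup sketch, reusing the placeholder names from the function (`username`, `my.unix.host`, `/some/path`):

```bash
# Generate a key pair in the CDSW project (no passphrase, so scheduled
# jobs can run unattended), then install the public key on the target host.
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N ""
ssh-copy-id -i ~/.ssh/id_ed25519.pub username@my.unix.host

# Verify that scp now works without a password prompt.
scp /home/cdsw/my_csv_file.csv username@my.unix.host:/some/path/my_csv_file.csv
```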
06-05-2023
06:32 AM
@asandovala21 Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
02-06-2023
09:56 AM
1 Kudo
I think you will have to write the output you want to share to a file, attach that file, and share only the attachment. When you set up a job, you can tell it not to send the console output.
02-01-2023
11:30 AM
Hi, whichever user you log in as under User Settings -> Hadoop Authentication is what CDSW uses when accessing the rest of the cluster. Typically your CDSW username/password is your SSO login, but you can change the Hadoop Authentication settings if you want. Let me know if that works for you!
02-01-2023
11:13 AM
Hi, the 501 is not an error; this feature is available only in the CML product line and is not in CDSW. In fact, there has never been an API for Experiments in CDSW, which is why there is no documentation for it.

However, anything the UI does can be done in the terminal or in scripts if you reverse engineer the API calls. You can do this using, for instance, the Network tab in your browser's Developer Tools. If you navigate to the Experiments page, you will see that it makes the following request:

http://cdsw-domain.com/api/altus-ds-1/ds/listruns

If you copy the request as cURL, it will be huge and contain a bunch of random headers, but I was able to basically follow the same steps as described in the Jobs API v1 documentation page: https://docs.cloudera.com/cdsw/1.10.2/jobs-pipelines/topics/cdsw-starting-a-job-run-using-the-api.html

By adding your legacy API key to the cURL request, you can get a list of all of the experiments with a cURL like this:

curl --user "<Legacy API Key>:" -H "Content-Type: application/json" -X POST http://<your CDSW server>/api/altus-ds-1/ds/listruns

You can parse through this list for experiments (see the parsing sketch after this post). By following a similar procedure you can probably execute experiments; I'm not sure, I didn't really try that.

This method is not supported by Cloudera, and the official response would be to upgrade to CML and use API v2. If you try this and have problems, we can't really help on a support ticket, but you can respond here and I might be able to help. Cheers!
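If you want to pull specific fields out of the `listruns` response, you can pipe the cURL output through a JSON tool. A minimal sketch; the `id` and `status` field names are assumptions about the undocumented response shape, so inspect the raw JSON first:

```bash
# Dump the raw response pretty-printed to see what fields actually exist.
curl -s --user "<Legacy API Key>:" \
     -H "Content-Type: application/json" \
     -X POST "http://<your CDSW server>/api/altus-ds-1/ds/listruns" \
  | python3 -m json.tool

# Once you know the shape, extract fields with jq ("id" and "status"
# are assumed names, not documented anywhere).
curl -s --user "<Legacy API Key>:" \
     -H "Content-Type: application/json" \
     -X POST "http://<your CDSW server>/api/altus-ds-1/ds/listruns" \
  | jq '.[] | {id, status}'
```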