A recent update to Cloudera Machine Learning (CML) brings the ability to use custom code editors with ML Runtimes. This article walks through creating an ML Runtime that uses a different editor and adding it to CML. First, you will create a Docker image that is configured to use a custom editor, specifically RStudio, and then add it to your workspace.

Step 1: Create and upload the Docker Image

Note: If you only want to use RStudio, you can skip this step and use an image that has already been uploaded:

ghcr.io/fletchjeff/cml_rstudio_1.4:2021.09.3

You need Docker installed and running for this step. First, clone the repository and change into the RStudio 1.4 directory by running the following commands in a terminal window.

$ git clone https://github.com/cloudera/community-ml-runtimes
$ cd community-ml-runtimes/rstudio_1.4

Now build the Docker image. Replace the ghcr.io/fletchjeff/ part of the tag with the details of the container registry you will use. The cml_rstudio_1.4:2021.09.3 part of the tag is up to you: a descriptive name such as cml_rstudio_1.4 makes the image easy to identify, and 2021.09.3 follows the naming convention that we use for community images. The Dockerfile contains comments that explain its structure and can help you customize it for your own requirements.

$ docker build -t ghcr.io/fletchjeff/cml_rstudio_1.4:2021.09.3 . -f Dockerfile

The next step is to push the image to your container registry.

$ docker push ghcr.io/fletchjeff/cml_rstudio_1.4:2021.09.3

Assuming the image push worked, you are good for the next step.
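
The build and push commands in this step can also be parameterized so that the registry, image name, and version are set in one place. The following is a minimal sketch, not part of the repository: the registry value registry.example.com/my-team and the helper name build_and_push are hypothetical placeholders, and depending on your registry you may need to authenticate with docker login first.

```shell
# Hypothetical placeholders: replace with your own registry details.
REGISTRY="registry.example.com/my-team"
IMAGE_NAME="cml_rstudio_1.4"
IMAGE_VERSION="2021.09.3"

# Full image reference in the form <registry>/<name>:<version>.
IMAGE_TAG="${REGISTRY}/${IMAGE_NAME}:${IMAGE_VERSION}"

# Build from the Dockerfile in the current directory, then push.
# The push only runs if the build succeeds.
build_and_push() {
  docker build -t "$IMAGE_TAG" . -f Dockerfile && docker push "$IMAGE_TAG"
}

echo "Image reference: ${IMAGE_TAG}"
# Run the build and push when ready:
# build_and_push
```

Run the snippet from the rstudio_1.4 directory and uncomment the last line to start the build. The printed image reference is the value to paste into CML in Step 2.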

Step 2: Add the Runtime image to CML

Note: Adding a custom runtime requires the CML Public Cloud - August 31 release or newer. If you don't have the Runtime Catalog navigation item, or the Runtime Catalog page doesn't have the Add Runtime button, you might not have the right version or the right permissions. Please check with whoever manages your CDP environment.

Navigate to the Runtime Catalog from the CML start page, and click Add Runtime.

In the next step, paste in the link to the image you pushed in Step 1 and click Validate. Your CML instance must be able to reach this container registry to pull the image. In a restricted or air-gapped installation, public container registries might not be reachable, and you will need a private container registry deployed in an accessible network location. The validation process confirms that the image has the correct labels and can be imported. Click Add to Catalog when you are ready.
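
If validation fails, it can help to look at the labels that are actually set on the image. The following is a minimal sketch using docker inspect; the helper name show_image_labels is hypothetical, and it assumes the image is available locally (built in Step 1 or pulled with docker pull). It only prints the labels and does not reproduce the checks CML performs.

```shell
# Hypothetical helper: print the labels baked into an image as JSON.
# Assumes the image already exists in the local Docker image store.
show_image_labels() {
  docker inspect --format '{{json .Config.Labels}}' "$1"
}

# Example call, using the pre-built image mentioned in Step 1:
# show_image_labels ghcr.io/fletchjeff/cml_rstudio_1.4:2021.09.3
```

Comparing the output with the label definitions in the Dockerfile is a quick way to confirm that the image you pushed is the one you intended to build.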

Step 3: Use RStudio

Assuming all went well in the last step, you should now be able to use RStudio as an editor when starting a new session. On the New Session page, select RStudio as the editor.

Because only one RStudio image should be available, the remaining options are filled in for you, and there is nothing else to choose apart from whether to enable Spark for the session.

Once the session launches, you will see a familiar view of RStudio embedded into the CML UI, in the same way JupyterLab is embedded.

While this process is specific to RStudio, it should work for any web-based editor that can be configured to run on a specific port.
