Running custom applications in CDSW/CML

With the recent addition of Analytical Applications, data scientists and data engineers can now deploy their own custom applications and share them with other users.

 

Whilst simple applications may have all the necessary code in a single script file, more complex applications may require custom launchers, run flags set at execution time, and instance-specific parameters. To run these applications, we can use the Python subprocess module to execute the same commands we would otherwise type into a terminal.
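To illustrate the translation from terminal to subprocess, here is a small sketch using Python's standard shlex module. The command string and the port 8080 are illustrative values only; in CDSW/CML the port comes from the CDSW_APP_PORT environment variable.

```python
import shlex

# A Tensorboard launch exactly as it might be typed into a terminal.
# The port 8080 is illustrative only; in CDSW/CML it comes from CDSW_APP_PORT.
terminal_command = "tensorboard --logdir=logs/fit --host 127.0.0.1 --port 8080"

# shlex.split tokenizes the string with shell-like rules, producing the list
# form that subprocess.call expects: one element per argument, no shell involved.
args = shlex.split(terminal_command)
print(args)
# → ['tensorboard', '--logdir=logs/fit', '--host', '127.0.0.1', '--port', '8080']

# shlex.quote goes the other way, rendering each element safely for a log line.
display = " ".join(shlex.quote(arg) for arg in args)
print(display)
```

Because each list element is handed to the program verbatim, stray whitespace inside an element (for example " --logdir") becomes part of that argument instead of being stripped as a shell would.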

Getting Started

For this demonstration, I will show how to run Tensorboard and Hiplot, both of which visualize parameters from multiple runs of deep learning models.

 

Both applications are started as standalone servers by their own custom commands: tensorboard for Tensorboard and hiplot for Hiplot.
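Since both apps are launched through commands that pip installs alongside the packages, a run script can fail with a confusing error if a package is missing. A minimal pre-flight check might look like the sketch below; missing_commands is a hypothetical helper built on the standard shutil.which function.

```python
import shutil


def missing_commands(names):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Both apps are started through their own console commands; an empty list
# means both are installed and ready to launch from a run script.
print("Missing commands:", missing_commands(["hiplot", "tensorboard"]))
```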

Requirements

  • CDSW 1.7.1 or later versions
  • A Python 3 engine with internet access (for installing libraries)

Setup

  1. Click Create Project and clone the running-custom-applications repository from GitHub using the Git option.
  2. Click Open Workbench.
  3. Launch a new Python session.
  4. First, ensure that all the required libraries are installed. From the CML/CDSW IDE, run:
    !pip3 install -r requirements.txt
  5. Packages that are installed in a session will be preserved for use by the project across all sessions.
  6. Here, I have created two run scripts to start the apps:
    For Hiplot:
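    # run-hiplot.py: start the Hiplot server as a CDSW/CML application
    import os
    import subprocess

    # Bind to localhost on the port CDSW/CML assigns to this application;
    # subprocess.call blocks, so the script keeps running while Hiplot serves.
    subprocess.call(["hiplot", "--host", "127.0.0.1", "--port", os.environ["CDSW_APP_PORT"]])
    I save this as the run-hiplot.py script. The expression os.environ["CDSW_APP_PORT"] reads the environment variable CDSW_APP_PORT, which specifies the port the application must listen on in order to run successfully.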
    For Tensorboard: 
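    # run-tensorboard.py: start Tensorboard as a CDSW/CML application
    import os
    import subprocess

    # Serve the training logs in logs/fit on localhost and the assigned port.
    # Every flag and value is its own list element, with no leading spaces.
    subprocess.call(["tensorboard", "--logdir=logs/fit", "--host", "127.0.0.1", "--port", os.environ["CDSW_APP_PORT"]])
    I save this as the run-tensorboard.py script.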
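  7. Adding a flag is as simple as adding the flag and its value to the comma-separated list passed to subprocess.call([ ... ]). A flag and its value can be separate elements, as with "--host", "127.0.0.1", or a single element, as with "--logdir=logs/fit". Each element is passed to the program verbatim, so it must not contain stray spaces.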
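If you launch several applications this way, the pattern shared by run-hiplot.py and run-tensorboard.py can be factored into a small helper. The sketch below is hypothetical (build_command and launch are not part of CDSW/CML, Hiplot or Tensorboard): it turns a dict of flags into the list form, appends the host and port arguments, and runs the result. It assumes Python 3.7+ so that the dict keeps flags in insertion order.

```python
import os
import subprocess


def build_command(executable, flags, port):
    """Assemble an argument list for subprocess (hypothetical helper).

    `flags` maps flag names to values; a value of None emits a bare flag.
    Each flag and each value becomes its own list element.
    """
    args = [executable]
    for flag, value in flags.items():
        args.append(flag)
        if value is not None:
            args.append(str(value))
    # Bind to the loopback interface and the port assigned to the application.
    args += ["--host", "127.0.0.1", "--port", str(port)]
    return args


def launch(executable, flags=None):
    """Run the app in the foreground and return its exit code when it stops."""
    port = os.environ["CDSW_APP_PORT"]  # provided by CDSW/CML for applications
    return subprocess.call(build_command(executable, flags or {}, port))


# Example usage inside a run script (commented out so this file is importable):
# launch("tensorboard", {"--logdir": "logs/fit"})
# launch("hiplot")
```

subprocess.call waits for the command to exit, so the run script stays alive for as long as the server does; with a non-blocking subprocess.Popen the script would finish immediately, which would presumably end the application as well.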

For this demonstration, I will first generate some training runs so that Tensorboard has data to display.

 

  • Run test_runs_tensorflow.py in the CDSW/CML session: open the file, then click the play arrow in the coding window.
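Tensorboard will still start if logs/fit is empty, but it will have nothing to show. Assuming the data-generation script writes its summaries with TensorFlow's standard summary writers (whose output files are named events.out.tfevents.*), a quick check like the hypothetical has_event_files helper below can confirm the logs are in place before you create the application.

```python
import glob
import os


def has_event_files(logdir):
    """Return True if `logdir` (searched recursively) holds TensorBoard event files."""
    # With recursive=True, "**" matches zero or more directory levels,
    # so event files directly inside logdir are found too.
    pattern = os.path.join(logdir, "**", "events.out.tfevents.*")
    return bool(glob.glob(pattern, recursive=True))


print("logs/fit populated:", has_event_files("logs/fit"))
```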

Running applications that require flags

Now that we have the run scripts, we can launch each one as an application:

  1. Go to the Applications screen.
  2. Click New Application.
  3. Fill in the form, choosing run-hiplot.py or run-tensorboard.py as the script to run.
    Note: The Set Environment Variables option sits just before the Create Application button. If the run script reads its settings through os.environ[...], you can change run flags by setting environment variables on the New Application screen, without editing the run script.
  4. Click Create.
  5. To access the application, click the box with the arrow.
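As a sketch of that note, run-tensorboard.py could take its log directory from an environment variable set on the New Application screen. TB_LOGDIR is a hypothetical variable name chosen for this example (it is not defined by CDSW/CML or Tensorboard), and the script falls back to logs/fit when it is unset.

```python
import os
import subprocess


def tensorboard_command(env):
    """Build the Tensorboard command from an environment mapping.

    TB_LOGDIR is a hypothetical variable set under Set Environment Variables;
    CDSW_APP_PORT is provided by CDSW/CML for every application.
    """
    logdir = env.get("TB_LOGDIR", "logs/fit")
    return ["tensorboard", "--logdir=" + logdir,
            "--host", "127.0.0.1", "--port", env["CDSW_APP_PORT"]]


# Only launch when running as an application (where CDSW_APP_PORT exists).
if __name__ == "__main__" and "CDSW_APP_PORT" in os.environ:
    subprocess.call(tensorboard_command(os.environ))
```

Keeping the default in the script means the application still starts when the variable is left unset, while a different log directory needs only a new environment variable value.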

Conclusion

The Analytical Applications feature, introduced in CDSW 1.7.x and also available in CML on the public cloud, enables the deployment of third-party and custom applications on Cloudera Machine Learning infrastructure.

 

Using the Python subprocess module, a run script can execute arbitrary commands and set runtime flags for the applications it launches.

 

Happy Coding!
