I started working with Cloudera Public Cloud on Azure and couldn't bear the thought of paying for resources that sit unused in non-prod environments.

My goal was to be able to easily stop and start environments on a schedule, so that resources (compute and money) are used wisely.

I didn't use the Cloudera Manager API, for two reasons:

  • I couldn't authenticate with the workload username and password; I kept getting empty responses. Only a token intercepted from the browser let me make a successful request.
  • With many clusters, calling a separate API for each one is daunting.

Unfortunately, cdpcli is still a bit immature and doesn't have a good Python API yet. I didn't want to use Java, which seems to have a slightly better API, for such a trivial task.

I am running an Azure Function App with a Timer trigger. I had to use the ElasticPremium tier because I wanted the function to run on a node in the same subnet as my Cloudera nodes.
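
For reference, a Timer trigger in this style of Python function is declared in a function.json file next to the code. The sketch below builds such a file from Python; the schedule is a hypothetical example (19:00 on Monday to Friday, evaluated in UTC by default), not the one I actually use, and the script file name is assumed.

```python
import json

# Timer binding for function.json; "mytimer" matches the parameter name of main().
function_config = {
    "scriptFile": "__init__.py",  # assumed file name for the function code
    "bindings": [
        {
            "name": "mytimer",
            "type": "timerTrigger",
            "direction": "in",
            # NCRONTAB fields: {second} {minute} {hour} {day} {month} {day-of-week}
            # Hypothetical schedule: 19:00 on Monday to Friday.
            "schedule": "0 0 19 * * 1-5",
        }
    ],
}

# Print the JSON so it can be saved as function.json.
print(json.dumps(function_config, indent=2))
```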

The code drives cdpcli's CLIDriver directly and captures what it prints to stdout. It looks like this:

 

import io
import logging
import sys

import azure.functions as func
from cdpcli import clidriver
from types import SimpleNamespace


def main(mytimer: func.TimerRequest) -> None:
    driver = clidriver.CLIDriver()
    # cdpcli prints its result to stdout, so capture it while the command runs.
    old_stdout, new_stdout = intercept_stdout()
    try:
        driver.main(["environments", "list-environments"])
    finally:
        # Always restore stdout, even if the CLI call raises.
        restore_stdout(old_stdout)
    environments = parse_environments(new_stdout.getvalue())
    if environments:
        for environment in environments:
            handle_environment(driver, environment)


def parse_environments(environments_json):
    import json
    if environments_json and environments_json.strip():
        # Turn each JSON object into a SimpleNamespace so fields read as attributes.
        parsed = json.loads(environments_json, object_hook=lambda d: SimpleNamespace(**d))
        return list(parsed.environments)
    else:
        logging.info("No environment found")
        return None


def intercept_stdout():
    old_stdout = sys.stdout
    new_stdout = io.StringIO()
    sys.stdout = new_stdout
    return old_stdout, new_stdout


def restore_stdout(std_out):
    sys.stdout = std_out


def handle_environment(driver, environment):
    if environment:
        if environment.status == 'AVAILABLE':
            # A running environment gets stopped.
            logging.info("about to stop environment %s", environment.environmentName)
            driver.main(["environments", "stop-environment", "--environment-name", environment.environmentName])
        else:
            # Any other status triggers a start, including transitional states.
            logging.info("about to start environment %s", environment.environmentName)
            driver.main(["environments", "start-environment", "--environment-name", environment.environmentName])
    else:
        logging.info("Nothing to do")
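
The manual swap of sys.stdout above could also be done with the standard library's contextlib.redirect_stdout, which puts stdout back on its own. This is a minimal sketch of that alternative, not what I deployed; run_and_capture and fake_cli are hypothetical names, and fake_cli only stands in for driver.main so the example runs without cdpcli.

```python
import contextlib
import io


def run_and_capture(command, *args):
    """Call command(*args) and return (result, captured stdout text)."""
    buffer = io.StringIO()
    # redirect_stdout restores sys.stdout on exit, even if command raises.
    with contextlib.redirect_stdout(buffer):
        result = command(*args)
    return result, buffer.getvalue()


def fake_cli(argv):
    # Hypothetical stand-in for driver.main, used only to demo the helper.
    print('{"environments": []}')
    return 0


rc, output = run_and_capture(fake_cli, ["environments", "list-environments"])
```

In the function itself, the call would presumably become run_and_capture(driver.main, ["environments", "list-environments"]).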

 

At the moment it acts on every environment it finds: those that are AVAILABLE get stopped and all the others get started. Later on I will probably add tag-based validation as well.
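
Since the action depends only on the current status, one possible refinement is to derive the desired action from the time of the run as well. The sketch below is an assumption-laden illustration: the working hours are hypothetical, desired_action is a name I made up, and "ENV_STOPPED" is my assumption for the status of a stopped environment (check the list-environments output in your own account).

```python
from datetime import datetime, time
from typing import Optional

# Hypothetical working hours (UTC); adjust to your own schedule.
WORK_START = time(7, 0)
WORK_END = time(19, 0)


def desired_action(status: str, now: datetime) -> Optional[str]:
    """Return "start", "stop", or None for an environment status at a given time."""
    is_workday = now.weekday() < 5  # Monday is 0, Friday is 4
    in_hours = is_workday and WORK_START <= now.time() < WORK_END
    # "ENV_STOPPED" is an assumed status name for a stopped environment.
    if in_hours and status == "ENV_STOPPED":
        return "start"
    if not in_hours and status == "AVAILABLE":
        return "stop"
    # Transitional or already-correct states are left alone.
    return None
```

handle_environment could then call desired_action(environment.status, datetime.utcnow()) and only invoke stop-environment or start-environment when the result is not None.
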
Another thing to remember is to add the environment variables CDP_ACCESS_KEY_ID and CDP_PRIVATE_KEY to your function; cdpcli picks them up automatically to authenticate.
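
A small guard at the top of the function can make a missing setting obvious in the logs instead of surfacing as a failed CLI call. This is only a sketch; missing_credentials is a hypothetical helper and not part of cdpcli.

```python
import os

# The two settings cdpcli reads for authentication, as named above.
REQUIRED_KEYS = ("CDP_ACCESS_KEY_ID", "CDP_PRIVATE_KEY")


def missing_credentials(environ=os.environ):
    """Return the required keys that are unset or empty in environ."""
    return [key for key in REQUIRED_KEYS if not environ.get(key)]
```

In main, something like `if missing_credentials(): logging.error(...)` followed by an early return would skip the CLI calls when the keys are not configured.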

Hope it helps. If you have any suggestions for improvement, I would be keen to hear them.
