Created on 05-17-2021 08:13 PM
I started working on the Cloudera Public Cloud in Azure and couldn't bear the thought of paying for all the resources that sit unused in non-prod environments.
My goal was to be able to easily stop and start environments on a schedule, so that resources (compute and money) are used wisely.
I haven't used the Cloudera Manager API, for two reasons:
- I couldn't authenticate with the workload username and password; I kept getting empty responses. Only a token intercepted from the browser let me make a successful request.
- With a multitude of clusters, it is daunting to invoke a multitude of separate APIs.
Unfortunately, cdpcli is still a bit immature and doesn't have a good API in Python yet. I didn't want to use Java, which has what seems to be a slightly better API, for such a trivial task.
I am running an Azure Function App with a Timer trigger. I had to use the ElasticPremium tier because I wanted the function to run on a node in the same subnet as my Cloudera nodes.
The code looks like this:
import io
import json
import logging
import sys
from types import SimpleNamespace

import azure.functions as func
from cdpcli import clidriver


def main(mytimer: func.TimerRequest) -> None:
    driver = clidriver.CLIDriver()
    # cdpcli prints its result, so capture stdout to read the JSON.
    old_stdout, new_stdout = intercept_stdout()
    try:
        driver.main(["environments", "list-environments"])
    finally:
        # Always put the real stdout back, even if the CLI call fails.
        restore_stdout(old_stdout)
    environments_json = new_stdout.getvalue()
    environments = parse_environments(environments_json)
    if environments:
        for environment in environments:
            handle_environment(driver, environment)


def parse_environments(environments_json):
    if environments_json and environments_json.strip():
        environments = list()
        # Turn every JSON object into a SimpleNamespace for attribute access.
        parsed_environment = json.loads(environments_json, object_hook=lambda d: SimpleNamespace(**d))
        for environment in parsed_environment.environments:
            # The loop body was lost from the post; collecting every environment is assumed.
            environments.append(environment)
        return environments
    logging.info("No environment found")
    return None


def intercept_stdout():
    old_stdout = sys.stdout
    new_stdout = io.StringIO()
    sys.stdout = new_stdout
    return old_stdout, new_stdout


def restore_stdout(std_out):
    sys.stdout = std_out


def handle_environment(driver, environment):
    if environment:
        if environment.status == 'AVAILABLE':
            logging.info("about to stop environment %s", environment.environmentName)
            driver.main(["environments", "stop-environment", "--environment-name", environment.environmentName])
        # Assumption: this branch condition was lost from the post;
        # 'ENV_STOPPED' is taken to be the status of a stopped environment.
        elif environment.status == 'ENV_STOPPED':
            logging.info("about to start environment %s", environment.environmentName)
            driver.main(["environments", "start-environment", "--environment-name", environment.environmentName])
        else:
            logging.info("Nothing to do")
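Since cdpcli prints its result rather than returning it, the stdout capture is the fragile part of the function. A minimal sketch of an alternative using the standard library's `contextlib.redirect_stdout`, which restores `sys.stdout` on its own even when the call raises. The helper name `capture_stdout` is my own and not part of cdpcli:

```python
import contextlib
import io


def capture_stdout(func, *args, **kwargs):
    """Call func and return (result, text written to sys.stdout during the call)."""
    buffer = io.StringIO()
    # redirect_stdout swaps sys.stdout for the buffer and puts the original
    # back when the block exits, including when func raises.
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()
```

With this helper, the listing call would become something like `_, environments_json = capture_stdout(driver.main, ["environments", "list-environments"])`, assuming the CLI writes through `sys.stdout` as the interception above relies on.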
At the moment it stops every environment it finds; later on I will probably add tag-based validation as well.
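A rough sketch of what that tag-based validation might look like. Everything here is an assumption: the `tags` attribute on the parsed environment, the `auto-schedule` tag name, and its `true` value are hypothetical, so check what the CLI actually returns for your environments before relying on it:

```python
from types import SimpleNamespace

# Hypothetical opt-in tag; pick whatever convention suits your environments.
SCHEDULE_TAG_KEY = "auto-schedule"
SCHEDULE_TAG_VALUE = "true"


def is_schedulable(environment, key=SCHEDULE_TAG_KEY, value=SCHEDULE_TAG_VALUE):
    """Return True only if the environment carries the opt-in tag.

    Assumes `environment.tags` (hypothetical attribute) maps tag names to
    values, either as a dict or as the SimpleNamespace that the JSON
    object_hook would produce.
    """
    tags = getattr(environment, "tags", None)
    if tags is None:
        # Untagged environments are left alone.
        return False
    tag_map = vars(tags) if isinstance(tags, SimpleNamespace) else dict(tags)
    return str(tag_map.get(key, "")).lower() == value
```

The loop in `main` could then skip anything that has not opted in, for example `if is_schedulable(environment): handle_environment(driver, environment)`, which keeps untagged environments safe by default.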
Another thing to remember is to add the environment variables CDP_ACCESS_KEY_ID and CDP_PRIVATE_KEY to your function's settings; cdpcli picks them up automatically for authentication.
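Because a missing key would otherwise only surface when the CLI call fails, it may be worth checking for both variables at the start of the function. A small sketch, with `missing_credentials` being a helper name of my own:

```python
import os

# The two variables named above, which cdpcli reads for authentication.
REQUIRED_VARS = ("CDP_ACCESS_KEY_ID", "CDP_PRIVATE_KEY")


def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_credentials()` first in `main` and logging an error when the list is non-empty gives a clearer message than an authentication failure later on.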
Hope it helps, and if you have any suggestions for improvement, I would be keen to hear them.