Problem Statement: Deploying and Governing Models
Machine Learning and Artificial Intelligence are exploding in importance and prevalence in the enterprise. With this explosive growth come fundamental challenges in governing model deployments, and in doing so at scale. These challenges revolve around answering the following fundamental questions:
- Which models were deployed, when, and to where?
- Was this deployment to a microservice, a Spark context on Hadoop, or other?
- What was the serialized object deployed? How can I find it?
- What version was deployed? Who is the owner? What is the larger context around the project?
- How do I know the details of the model, i.e., how do I trace the model in production to its actual code, training data, owner, etc.?
Previous article: Why and how you should use Atlas to govern your models
Article: Customizing Atlas (Part1): Model governance, traceability and registry
In the previous article I showed how Atlas is a powerful and natural fit for storing and searching model and deployment metadata.
The main features of Atlas model metadata developed in the referenced article are
- searchable metadata of deployments of models
- searchable metadata of models that were deployed
- traceability of deployed models to a model registry that holds concrete model artifacts (code, training data, serialized model used in deployment, project README.md file, etc)
- data lineage for deployed models that transform data during data pipelines
- no lineage is generated for models deployed in a request-response context (e.g., microservices), which output predictions against a high throughput of data inputs
This article: Generalized Framework to Deploy Models with Apache Atlas for Model Governance
In this article, I present an overarching deployment framework that implements this Atlas governance of models and thus allows stakeholders to answer the above questions as deployed models proliferate. Think of the prevalence of ML and AI one, two, or five years from now.
The Framework
Personas
The personas involved in the model deployment-governance framework are shown below with their actions.
Model owner: stages model artifacts in a defined structure and provides an overview of the model and project in a README.md file.
Operations: launches automation that deploys the model, copies artifacts from staging to the model registry, and creates a Model entity in Atlas for this deployment.
Multiple stakeholders (data scientist, data steward, compliance, production issue troubleshooters, etc.): use Atlas to answer fundamental questions about deployed models and to access the concrete artifacts of those models.
Deployment-Governance Framework
Details of the deployment-governance framework and the personas' interactions with it are shown below.
Step 1: Model owner stages the model artifacts. This includes:
- code and training data
- README.md file describing project
- modelMetadata.txt with key-value pairs (model.name=<value>, model.type=<value>, model.version=<value>, model.description=<value>, ...)
- serialized model for deployment (PMML, MLeap bundle, other)
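Concretely, a staging repo following these conventions might be laid out as sketched below. Only the /executable path and the modelMetadata.txt file name are fixed by the implementation later in this article; the other directory names are illustrative.

```
Staging-ModelDeploy-v1.0/
├── README.md           # project overview
├── modelMetadata.txt   # key-value model metadata (contents shown later)
├── code/               # training and scoring code (name illustrative)
├── data/               # training data (name illustrative)
└── executable/         # serialized model (e.g., MLeap bundle) to deploy
```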
Step 2: Operations deploys the model via an orchestrator automation. This automation:
- 2a: retrieves model artifacts from staging
- 2b: deploys serialized model
- 2c: copies artifacts to model repository
- (the automation orchestrator has been aggregating metadata from previous steps)
- 2d: creates new model entity in Atlas using aggregated metadata
Step 3: use Atlas to understand deployed models
- result of deployment is Model entity created in Atlas (see Customizing Atlas (Part1): Model governance, traceability and registry for details)
- key capability is Atlas' powerful search techniques against metadata of deployed models, as shown in above diagram
- drill-down of model entity in search result provides understanding of deployment and model owner/project and provides traceability to concrete model artifacts in model registry
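As a concrete sketch of that search capability, a stakeholder could list all deployed model entities from the command line via Atlas' DSL search REST endpoint (credentials and host below are placeholders for your environment):

```bash
# DSL search for all entities of the custom "model" type
# (admin:admin and ATLAS_HOST are placeholders)
curl -u admin:admin -G "http://${ATLAS_HOST}:21000/api/atlas/v2/search/dsl" \
  --data-urlencode "query=model" \
  --data-urlencode "limit=25"
```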
Deployment-Governance Framework: Simple Implementation
I show below how to implement the deployment framework.
Important point: I have chosen the technologies shown below for a simple demonstration of the framework. Except for Atlas, technology implementations are your choice. For example, you could deploy your model to Spark on Hadoop instead of to a microservice, or you could use PMML instead of MLeap to serialize your model, etc.
Important point summarized: This framework is a template and, except for Atlas, the technologies are swappable.
Setting up your environment
MLeap: follow the instructions here to set up a dockerized MLeap Runtime: http://mleap-docs.combust.ml/mleap-serving/ (a sketch of starting the container is shown below)
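At the time of writing, the MLeap serving docs start the dockerized runtime along the lines of the following; the image tag and the local directory you mount into the container's /models path may differ in your setup:

```bash
# run the MLeap serving container, exposing its default port and mounting
# a local model directory into the container (image tag per the MLeap docs)
docker run -p 65327:65327 -v /tmp/models:/models combustml/mleap-serving:0.9.0-SNAPSHOT
```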
HDP: Create an HDP cluster sandbox using these instructions
Atlas Model Type: When your HDP cluster is running, create your Atlas model type by running:
```bash
#!/bin/bash

ATLAS_UU_PWD=$1
ATLAS_HOST=$2

curl -u ${ATLAS_UU_PWD} -ik -H "Content-Type: application/json" \
  -X POST http://${ATLAS_HOST}:21000/api/atlas/v2/types/typedefs -d '{
  "enumDefs": [],
  "structDefs": [],
  "classificationDefs": [],
  "entityDefs": [
    {
      "superTypes": ["Process"],
      "name": "model",
      "typeVersion": "1.0",
      "attributeDefs": [
        { "name": "qualifiedName", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "name", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "inputs", "typeName": "array<DataSet>", "isOptional": true, "cardinality": "SET", "valuesMinCount": 0, "valuesMaxCount": 2147483647, "isUnique": false, "isIndexable": false, "includeInNotification": false },
        { "name": "outputs", "typeName": "array<DataSet>", "isOptional": true, "cardinality": "SET", "valuesMinCount": 0, "valuesMaxCount": 2147483647, "isUnique": false, "isIndexable": false, "includeInNotification": false },
        { "name": "deploy.datetime", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "deploy.host.type", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "deploy.host.detail", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "deploy.obj.source", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "model.name", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "model.version", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "model.type", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "model.description", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "model.owner", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "model.owner.lob", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "model.registry.url", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true }
      ]
    }
  ]
}'
```
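Assuming you save the script above as, say, createModelType.sh (the name is illustrative), a sandbox invocation might look like:

```bash
# arguments: <user>:<password> for Atlas, then the Atlas host (values are placeholders)
./createModelType.sh admin:admin sandbox-hdp.hortonworks.com
```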
See Customizing Atlas (Part1): Model governance, traceability and registry for details
Running the framework
See GitHub repo README.md for details on running: https://github.com/gregkeysquest/ModelDeployment-microservice
Main points are shown below.
Staging (GitHub)
See repo https://github.com/gregkeysquest/Staging-ModelDeploy-v1.0 for details.
Main points are:
- MLeap bundle (serialized model) is in path /executable
- the file modelMetadata.txt holds metadata about the model that will be pushed to the Atlas model entity; its contents are shown below
```
model.owner = Greg Keys
model.owner.lob = pricing
model.name = rental pricing prediction
model.type = gradient boosting regression
model.version = 1.1
model.description = model predicts monthly price of rental if property is purchased
model.microservice.endpoint=target
```
Orchestrator (Groovy calling shell scripts)
The core code for the Groovy orchestrator is shown below
```groovy
//STEP 1: retrieve artifacts
println "[STEP 1: retrieve artifacts] ..... downloading repo to tmp: repo=${repo} \n"
processBuilder = new ProcessBuilder("shellScripts/fetchRepo.sh", repo, repoCreds, repoRoot)
        .inheritIO().start().waitFor()

//metadata aggregation
println "[metadata aggregation] ..... gathering model metadata from repo \n"
ModelMetadata.loadModelMetadata(repo, localRepo)

//STEP 2: deploy serialized model
def modelExecutable = new File("tmp/${repo}/executable").listFiles()[0].getName()
println "[STEP 2: deploy serialized model] ..... deploying model to microservice: modelToDeploy=${modelExecutable} \n"
processBuilder = new ProcessBuilder("shellScripts/deployModel.sh", repo, deployHostPort, modelExecutable)
        .inheritIO().start().waitFor()

//STEP 3: put artifacts to registry
def modelRegistryPath = "hdfs://${hdfsHostName}:8020${hdfsRegistryRoot}/${repo}"
println "[STEP 3: put artifacts to registry] ..... copying tmp to model registry: modelRegistryPath=${modelRegistryPath} \n"
processBuilder = new ProcessBuilder("shellScripts/pushToRegistry.sh", repo, modelRegistryPath, devMode.toString())
        .inheritIO().start().waitFor()

//metadata aggregation
println "[metadata aggregation] ..... gathering model deploy metadata \n"
ModelMetadata.loadDeployMetadata(modelRegistryPath, modelExecutable, deployHostPort, deployHostType)

//STEP 4: create Atlas model entity
println "[STEP 4: create Atlas model entity] ..... deploying Atlas entity to ${atlasHost} \n"
processBuilder = new ProcessBuilder("shellScripts/createAtlasModelEntity.sh",
        atlasCreds, atlasHost,
        ModelMetadata.deployQualifiedName, ModelMetadata.deployName, ModelMetadata.deployDateTime,
        ModelMetadata.deployEndPoint, ModelMetadata.deployHostType, ModelMetadata.modelExecutable,
        ModelMetadata.name, ModelMetadata.type, ModelMetadata.version, ModelMetadata.description,
        ModelMetadata.owner, ModelMetadata.ownerLob, ModelMetadata.registryURL)
        .inheritIO().start().waitFor()
```
Notice
- how the steps map directly to the Deployment-Governance Framework diagram above
- how metadata is processed and aggregated in two steps: one for model metadata and the other for deployment metadata
Code for processing and aggregating metadata is shown here
```groovy
class ModelMetadata {
    static metadataFileLocation = "staging/modelMetadata.txt"
    static Properties props = null

    static repo = ""
    static owner = ""
    static ownerLob = ""
    static name = ""
    static type = ""
    static version = ""
    static description = ""
    static endpoint = ""
    static registryURL = ""
    static modelExecutable = ""
    static deployEndPoint = ""
    static deployHostType = ""
    static deployDateTime = ""
    static deployName = ""
    static deployQualifiedName = ""

    static void loadModelMetadata(repo, localRepo) {
        this.repo = repo
        props = new Properties()
        def input = new FileInputStream(localRepo + "/modelMetadata.txt")
        props.load(input)
        this.owner = props.getProperty("model.owner")
        this.ownerLob = props.getProperty("model.owner.lob")
        this.name = props.getProperty("model.name")
        this.type = props.getProperty("model.type")
        this.version = props.getProperty("model.version")
        this.description = props.getProperty("model.description")
        this.endpoint = props.getProperty("model.microservice.endpoint")
    }

    static loadDeployMetadata(modelRegistryPath, modelExecutable, deployHostPort, deployHostType) {
        this.deployDateTime = new Date().format('yyyy-MM-dd_HH:mm:ss', TimeZone.getTimeZone('EST')) + "EST"
        this.deployName = "${this.name} v${this.version}"
        this.deployQualifiedName = "${this.deployName}@${deployHostPort}".replace(' ', '-')
        this.registryURL = modelRegistryPath
        this.modelExecutable = modelExecutable
        this.deployEndPoint = "http://${deployHostPort}/${this.endpoint}"
        this.deployHostType = deployHostType
    }
}
```
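For example, given the example modelMetadata.txt above and an assumed deploy host:port of localhost:65327, loadDeployMetadata would compute deployName as "rental pricing prediction v1.1", deployQualifiedName as "rental-pricing-prediction-v1.1@localhost:65327" (spaces replaced with dashes), and deployEndPoint as http://localhost:65327/target.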
Shell Scripts
Each shell script that is called by the orchestrator is shown in the code blocks below
Step 1: fetch staging (maps to 2a in diagram)
```bash
#!/bin/bash
# script name: fetchRepo.sh
echo "calling fetchRepo.sh"

REPO=$1
REPO_CRED=$2
REPO_ROOT=$3

# create tmp directory to store staging
mkdir -p tmp
cd tmp

# fetch staging and unzip
curl -u $REPO_CRED -L -o $REPO.zip http://github.com/$REPO_ROOT/$REPO/zipball/master/
unzip $REPO.zip

# rename to simplify downstream processing
mv ${REPO_ROOT}* $REPO

# remove zip
rm $REPO.zip

echo "finished fetchRepo.sh"
```
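An illustrative invocation, using the staging repo from this article (the <user>:<token> GitHub credential is a placeholder):

```bash
./shellScripts/fetchRepo.sh Staging-ModelDeploy-v1.0 <user>:<token> gregkeysquest
```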
Step 2: deploy model (maps to 2b in diagram)
```bash
#!/bin/bash
# script name: deployModel.sh
echo "starting deployModel.sh"

REPO=$1
HOSTPORT=$2
EXECUTABLE=$3

# copy executable to staging area that the target loads from
echo "copying executable to load path with command: cp tmp/${REPO}/executable/* loadModel/"
mkdir loadModel
cp tmp/$REPO/executable/* loadModel/

# simplify special string characters
Q="\""
SP="{"
EP="}"

# create json for curl string
JSON_PATH="${SP}${Q}path${Q}:${Q}/models/${EXECUTABLE}${Q}${EP}"

# create host for curl string
URL="http://$HOSTPORT/model"

# run curl string
echo "running command: curl -XPUT -H \"content-type: application/json\" -d ${JSON_PATH} ${URL}"
curl -XPUT -H "content-type: application/json" -d $JSON_PATH $URL

echo "finished deployModel.sh"
```
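If you are running the dockerized MLeap runtime, the MLeap serving docs also describe a GET on the same endpoint, which can be used to confirm the bundle loaded (host and port assume the docker setup shown earlier):

```bash
# confirm the currently loaded model (per the MLeap serving docs)
curl -XGET -H "content-type: application/json" http://localhost:65327/model
```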
Step 3: copy staging to model repository (maps to 2c in diagram)
```bash
#!/bin/bash
# script name: pushToRegistry.sh
## Note: for ease of development there is a local mode to write to the local file system instead of HDFS
echo "calling pushToRegistry.sh"

REPO_LOCAL=$1
HDFS_TARGET=$2
DEV_MODE=$3

cd tmp
echo "copying localRepository=${REPO_LOCAL} to hdfs modelRegistryPath=${HDFS_TARGET}"

if [ $DEV_MODE ]; then
  MOCK_REGISTRY="../mockedHDFSModelRegistry"
  echo "NOTE: in dev mode .. copying from ${REPO_LOCAL} to ${MOCK_REGISTRY}"
  mkdir $MOCK_REGISTRY
  cp -R $REPO_LOCAL $MOCK_REGISTRY/
else
  # copy the local staging directory into the HDFS model registry
  sudo hdfs dfs -copyFromLocal $REPO_LOCAL $HDFS_TARGET
fi

echo "finished pushToRegistry.sh"
```
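To confirm the copy on a real cluster, you can list the registry path on HDFS (the path below is illustrative; it follows the modelRegistryPath built by the orchestrator):

```bash
hdfs dfs -ls -R /models/registry/Staging-ModelDeploy-v1.0
```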
Step 4: create Atlas model entity (maps to 2d in diagram)
```bash
#!/bin/bash
# script name: createAtlasModelEntity.sh
echo "starting createAtlasModelEntity.sh"

ATLAS_UU_PWD=$1
ATLAS_HOST=$2
# positional args 3-15 arrive in the order the orchestrator passes them:
# deployQualifiedName, deployName, deployDateTime, deployEndPoint, deployHostType,
# modelExecutable, name, type, version, description, owner, ownerLob, registryURL

echo "running command: curl -u ${ATLAS_UU_PWD} -ik -H \"Content-Type: application/json\" -X POST http://${ATLAS_HOST}:21000/api/atlas/v2/entity/bulk -d (omitting json)"

curl -u ${ATLAS_UU_PWD} -ik -H "Content-Type: application/json" \
  -X POST http://${ATLAS_HOST}:21000/api/atlas/v2/entity/bulk -d '{
  "entities": [
    {
      "typeName": "model",
      "attributes": {
        "qualifiedName": "'"${3}"'",
        "name": "'"${4}"'",
        "deploy.datetime": "'"${5}"'",
        "deploy.host.type": "'"${7}"'",
        "deploy.host.detail": "'"${6}"'",
        "deploy.obj.source": "'"${8}"'",
        "model.name": "'"${9}"'",
        "model.type": "'"${10}"'",
        "model.version": "'"${11}"'",
        "model.description": "'"${12}"'",
        "model.owner": "'"${13}"'",
        "model.owner.lob": "'"${14}"'",
        "model.registry.url": "'"${15}"'"
      }
    }
  ]
}'

echo "finished createAtlasModelEntity.sh"
```
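Once created, the entity can also be fetched directly by its unique attribute through the Atlas REST API; the qualifiedName below is illustrative, following the ModelMetadata naming scheme:

```bash
# fetch the model entity by qualifiedName (credentials, host, and value are placeholders)
curl -u admin:admin -H "Content-Type: application/json" \
  "http://${ATLAS_HOST}:21000/api/atlas/v2/entity/uniqueAttribute/type/model?attr:qualifiedName=rental-pricing-prediction-v1.1@localhost:65327"
```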
Summary: What have we accomplished?
We have:
- designed a generalized deployment framework for models that integrates and leverages Atlas as a centralized governance tool for these deployments
- a key component is the orchestrator, which aggregates metadata across process steps and then passes it to Atlas
- built upon the implementation and ideas developed in this previous article
- presented a simple implementation using technologies shown above
Remember the key point: the deployment framework presented here is generalizable. Except for Atlas, you can plug in your choice of technologies for orchestration, staging, model hosting, and the model registry, including elaborating the framework into a formal software development framework of your choice.
References
- Customizing Atlas (Part1): Model governance, traceability and registry
- Atlas brief
- Atlas deep
- Groovy
- GitHub
- MLeap
Acknowledgements
Appreciation to the Hortonworks Data Science SME groups for their feedback on this idea. Particular appreciation to @Ian B and @Willie Engelbrecht for their deeper attention and interest.