Machine Learning and Artificial Intelligence are exploding in importance and prevalence in the enterprise. With this explosive growth come fundamental challenges in governing model deployments, and doing so at scale. These challenges revolve around answering the following fundamental questions:
Previous article: Customizing Atlas (Part 1): Model governance, traceability and registry
In the previous article I showed how Atlas is a powerful and natural fit for storing and searching model and deployment metadata.
The main features of Atlas model metadata developed in the referenced article are
In this article, I present an overarching deployment framework that implements this Atlas governance of models and thus allows stakeholders to answer the above questions as deployed models proliferate. Think of the prevalence of ML and AI one, two, five years from now.
The personas involved in the model deployment-governance framework are shown below with their actions.
Model owner: stages model artifacts in a defined structure and provides an overview of the model and project in a README file.
Operations: launches automation that deploys the model, copies artifacts from staging to the model registry, and creates a Model entity in Atlas for this deployment.
Multiple stakeholders (data scientist, data steward, compliance, production issue troubleshooters, etc.): use Atlas to answer fundamental questions about deployed models and to access concrete artifacts of those models.
Details of the deployment-governance framework and persona interactions with it are shown below.
Step 1: Model owner stages the model artifacts. This includes:
Step 2: Operations deploys the model via an orchestrator automation. This automation:
Step 3: Stakeholders use Atlas to understand deployed models.
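For example, a compliance reviewer or troubleshooter could list every deployed model straight from the Atlas REST API. The host and credentials below are placeholders, and the custom model type is defined later in this article:

# basic search for all entities of the custom "model" type (host/credentials are placeholders)
curl -u admin:admin \
  "http://sandbox-hdp.hortonworks.com:21000/api/atlas/v2/search/basic?typeName=model"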
I show below how to implement the deployment framework.
Important point: I have chosen the technologies shown below for a simple demonstration of the framework. Except for Atlas, technology implementations are your choice. For example, you could deploy your model to Spark on Hadoop instead of to a microservice, or you could use PMML instead of MLeap to serialize your model, etc.
Important point summarized: This framework is a template and, except for Atlas, the technologies are swappable.
MLeap: follow the instructions here to set up a dockerized MLeap Runtime http://mleap-docs.combust.ml/mleap-serving/
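As a rough sketch only (the image tag, port, and volume path below are taken from the MLeap serving docs at the time of writing and may have changed), the dockerized runtime can be started with something like:

# pull and run the MLeap serving container, mounting a local directory that the
# deploy script can drop serialized bundles into (tag and paths are assumptions)
docker pull combustml/mleap-serving:0.9.0-SNAPSHOT
docker run -d -p 65327:65327 -v /tmp/models:/models combustml/mleap-serving:0.9.0-SNAPSHOT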
HDP: Create an HDP cluster sandbox using these instructions
Atlas Model Type: When your HDP cluster is running, create your Atlas model type by running:
#!/bin/bash
ATLAS_UU_PWD=$1
ATLAS_HOST=$2

curl -u ${ATLAS_UU_PWD} -ik -H "Content-Type: application/json" -X POST http://${ATLAS_HOST}:21000/api/atlas/v2/types/typedefs -d '{
  "enumDefs": [],
  "structDefs": [],
  "classificationDefs": [],
  "entityDefs": [
    {
      "superTypes": ["Process"],
      "name": "model",
      "typeVersion": "1.0",
      "attributeDefs": [
        { "name": "qualifiedName", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "name", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "inputs", "typeName": "array<DataSet>", "isOptional": true, "cardinality": "SET", "valuesMinCount": 0, "valuesMaxCount": 2147483647, "isUnique": false, "isIndexable": false, "includeInNotification": false },
        { "name": "outputs", "typeName": "array<DataSet>", "isOptional": true, "cardinality": "SET", "valuesMinCount": 0, "valuesMaxCount": 2147483647, "isUnique": false, "isIndexable": false, "includeInNotification": false },
        { "name": "deploy.datetime", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "deploy.host.type", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "deploy.host.detail", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "deploy.obj.source", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "model.name", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "model.version", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "model.type", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "model.description", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "model.owner", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "model.owner.lob", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true },
        { "name": "model.registry.url", "typeName": "string", "cardinality": "SINGLE", "isUnique": false, "isOptional": false, "isIndexable": true }
      ]
    }
  ]
}'
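Assuming the script above is saved as createModelType.sh (the file name is arbitrary), it can be run against the sandbox like so, with placeholder credentials and host:

# usage: ./createModelType.sh <user:password> <atlas-host>
./createModelType.sh admin:admin sandbox-hdp.hortonworks.com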
See Customizing Atlas (Part1): Model governance, traceability and registry for details
See GitHub repo README.md for details on running: https://github.com/gregkeysquest/ModelDeployment-microservice
Main points are shown below.
See repo https://github.com/gregkeysquest/Staging-ModelDeploy-v1.0 for details.
Main points are:
model.owner = Greg Keys
model.owner.lob = pricing
model.name = rental pricing prediction
model.type = gradient boosting regression
model.version = 1.1
model.description = model predicts monthly price of rental if property is purchased
model.microservice.endpoint = target
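Putting this together with the orchestrator code below, the staging repo is expected to look roughly like this (everything except modelMetadata.txt and the executable/ directory is illustrative):

Staging-ModelDeploy-v1.0/
├── README.md           overview of the model and project for stakeholders
├── modelMetadata.txt   the key/value metadata shown above
└── executable/
    └── <serialized model bundle, e.g. an MLeap .zip>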
The core code for the Groovy orchestrator is shown below
//STEP 1: retrieve artifacts
println "[STEP 1: retrieve artifacts] ..... downloading repo to tmp: repo=${repo} \n"
processBuilder = new ProcessBuilder("shellScripts/fetchRepo.sh", repo, repoCreds, repoRoot).inheritIO().start().waitFor()

//metadata aggregation
println "[metadata aggregation] ..... gathering model metadata from repo \n"
ModelMetadata.loadModelMetadata(repo, localRepo)

//STEP 2: deploy serialized model
def modelExecutable = new File("tmp/${repo}/executable").listFiles()[0].getName()
println "[STEP 2: deploy serialized model] ..... deploying model to microservice: modelToDeploy=${modelExecutable} \n"
processBuilder = new ProcessBuilder("shellScripts/deployModel.sh", repo, deployHostPort, modelExecutable).inheritIO().start().waitFor()

//STEP 3: put artifacts to registry
def modelRegistryPath = "hdfs://${hdfsHostName}:8020${hdfsRegistryRoot}/${repo}"
println "[STEP 3: put artifacts to registry] ..... copying tmp to model registry: modelRegistryPath=${modelRegistryPath} \n"
processBuilder = new ProcessBuilder("shellScripts/pushToRegistry.sh", repo, modelRegistryPath, devMode.toString()).inheritIO().start().waitFor()

//metadata aggregation
println "[metadata aggregation] ..... gathering model deploy metadata \n"
ModelMetadata.loadDeployMetadata(modelRegistryPath, modelExecutable, deployHostPort, deployHostType)

//STEP 4: create Atlas model entity
println "[STEP 4: create Atlas model entity] ..... deploying Atlas entity to ${atlasHost} \n"
processBuilder = new ProcessBuilder("shellScripts/createAtlasModelEntity.sh",
        atlasCreds, atlasHost,
        ModelMetadata.deployQualifiedName, ModelMetadata.deployName, ModelMetadata.deployDateTime,
        ModelMetadata.deployEndPoint, ModelMetadata.deployHostType, ModelMetadata.modelExecutable,
        ModelMetadata.name, ModelMetadata.type, ModelMetadata.version,
        ModelMetadata.description, ModelMetadata.owner, ModelMetadata.ownerLob,
        ModelMetadata.registryURL
).inheritIO().start().waitFor()
Notice
Code for processing and aggregating metadata is shown here
class ModelMetadata {

    static metadataFileLocation = "staging/modelMetadata.txt"
    static Properties props = null

    static repo = ""
    static owner = ""
    static ownerLob = ""
    static name = ""
    static type = ""
    static version = ""
    static description = ""
    static endpoint = ""
    static registryURL = ""
    static modelExecutable = ""
    static deployEndPoint = ""
    static deployHostType = ""
    static deployDateTime = ""
    static deployName = ""
    static deployQualifiedName = ""

    static void loadModelMetadata(repo, localRepo) {
        this.repo = repo
        props = new Properties()
        def input = new FileInputStream(localRepo + "/modelMetadata.txt")
        props.load(input)

        this.owner = props.getProperty("model.owner")
        this.ownerLob = props.getProperty("model.owner.lob")
        this.name = props.getProperty("model.name")
        this.type = props.getProperty("model.type")
        this.version = props.getProperty("model.version")
        this.description = props.getProperty("model.description")
        this.endpoint = props.getProperty("model.microservice.endpoint")
    }

    static loadDeployMetadata(modelRegistryPath, modelExecutable, deployHostPort, deployHostType) {
        this.deployDateTime = new Date().format('yyyy-MM-dd_HH:mm:ss', TimeZone.getTimeZone('EST')) + "EST"
        this.deployName = "${this.name} v${this.version}"
        this.deployQualifiedName = "${this.deployName}@${deployHostPort}".replace(' ', '-')
        this.registryURL = modelRegistryPath
        this.modelExecutable = modelExecutable
        this.deployEndPoint = "http://${deployHostPort}/${this.endpoint}"
        this.deployHostType = deployHostType
    }
}
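As a concrete example, with the modelMetadata.txt shown earlier and a hypothetical deployHostPort of localhost:65327, loadDeployMetadata() produces deployName = "rental pricing prediction v1.1" and deployQualifiedName = "rental-pricing-prediction-v1.1@localhost:65327", which becomes the qualifiedName of the Atlas entity created in Step 4.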
Each shell script that is called by the orchestrator is shown in the code blocks below
Step 1: fetch staging (maps to 2a in diagram)
#!/bin/bash
# script name: fetchRepo.sh
echo "calling fetchRepo.sh"

REPO=$1
REPO_CRED=$2
REPO_ROOT=$3

# create tmp directory to store staging
mkdir -p tmp
cd tmp

# fetch staging and unzip
curl -u $REPO_CRED -L -o $REPO.zip http://github.com/$REPO_ROOT/$REPO/zipball/master/
unzip $REPO.zip

# rename to simplify downstream processing
mv ${REPO_ROOT}* $REPO

# remove zip
rm $REPO.zip

echo "finished fetchRepo.sh"
Step 2: deploy model (maps to 2b in diagram)
#!/bin/bash
# script name: deployModel.sh
echo "starting deployModel.sh"

REPO=$1
HOSTPORT=$2
EXECUTABLE=$3

# copy executable from staging to the load path the microservice serves models from
echo "copying executable to load path with command: cp tmp/${REPO}/executable/* loadModel/"
mkdir loadModel
cp tmp/$REPO/executable/* loadModel/

# simplify special string characters
Q="\""
SP="{"
EP="}"

# create json for curl string
JSON_PATH="${SP}${Q}path${Q}:${Q}/models/${EXECUTABLE}${Q}${EP}"

# create host for curl string
URL="http://$HOSTPORT/model"

# run curl string
echo "running command: curl -XPUT -H \"content-type: application/json\" -d ${JSON_PATH} ${URL}"
curl -XPUT -H "content-type: application/json" -d $JSON_PATH $URL

echo "finished deployModel.sh"
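For instance, with a hypothetical bundle named airbnb.model.lr.zip and the MLeap runtime listening on localhost:65327, the script above effectively issues:

curl -XPUT -H "content-type: application/json" \
  -d '{"path":"/models/airbnb.model.lr.zip"}' \
  http://localhost:65327/model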
Step 3: copy staging to model repository (maps to 2c in diagram)
#!/bin/bash
# script name: pushToRegistry.sh
## Note: for ease of development there is a local mode to write to the local file system instead of HDFS
echo "calling pushToRegistry.sh"

REPO_LOCAL=$1
HDFS_TARGET=$2
DEV_MODE=$3

cd tmp
echo "copying localRepository=${REPO_LOCAL} to hdfs modelRegistryPath=${HDFS_TARGET}"

if [ "$DEV_MODE" = "true" ]; then
  MOCK_REGISTRY="../mockedHDFSModelRegistry"
  echo "NOTE: in dev mode .. copying from ${REPO_LOCAL} to ${MOCK_REGISTRY}"
  mkdir -p $MOCK_REGISTRY
  cp -R $REPO_LOCAL $MOCK_REGISTRY/
else
  sudo hdfs dfs -copyFromLocal $REPO_LOCAL $HDFS_TARGET
fi

echo "finished pushToRegistry.sh"
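Once the copy completes, the registry contents can be sanity-checked with a quick listing (the registry root below is a placeholder for your hdfsRegistryRoot):

hdfs dfs -ls /models/registry/Staging-ModelDeploy-v1.0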
Step 4: create Atlas model entity (maps to 2d in diagram)
#!/bin/bash
# script name: createAtlasModelEntity.sh
echo "starting createAtlasModelEntity.sh"

ATLAS_UU_PWD=$1
ATLAS_HOST=$2

echo "running command: curl -u ${ATLAS_UU_PWD} -ik -H \"Content-Type: application/json\" -X POST http://${ATLAS_HOST}:21000/api/atlas/v2/entity/bulk -d (omitting json)"

# positional arguments 3-15 follow the order the orchestrator passes them in
curl -u ${ATLAS_UU_PWD} -ik -H "Content-Type: application/json" -X POST http://${ATLAS_HOST}:21000/api/atlas/v2/entity/bulk -d '{
  "entities": [
    {
      "typeName": "model",
      "attributes": {
        "qualifiedName":      "'"${3}"'",
        "name":               "'"${4}"'",
        "deploy.datetime":    "'"${5}"'",
        "deploy.host.type":   "'"${7}"'",
        "deploy.host.detail": "'"${6}"'",
        "deploy.obj.source":  "'"${8}"'",
        "model.name":         "'"${9}"'",
        "model.type":         "'"${10}"'",
        "model.version":      "'"${11}"'",
        "model.description":  "'"${12}"'",
        "model.owner":        "'"${13}"'",
        "model.owner.lob":    "'"${14}"'",
        "model.registry.url": "'"${15}"'"
      }
    }
  ]
}'

echo "finished createAtlasModelEntity.sh"
We have:
Remember the key point: the deployment framework presented here is generalizable. Except for Atlas, you can plug in your choice of technologies for orchestration, staging, model hosting, and the model registry, including elaborating the framework into the formal software development framework of your choice.
Appreciation to the Hortonworks Data Science SME groups for their feedback on this idea. Particular appreciation to @Ian B and @Willie Engelbrecht for their deeper attention and interest.