Introduction

This post covers the steps required to build a custom runtime image for Cloudera Data Engineering (CDE). The process uses AWS CodeBuild to pull the base image from container.repository.cloudera.com, build a custom image from the provided Dockerfile, and upload the result to Amazon ECR. All the files mentioned in this post can be downloaded from here.

 

[Figure: cde-runtime-with-codebuild-ecr.png]
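The pull, build, and upload flow described above can be sketched as a single shell function. This is a rough illustration only, not the buildspec.yml that accompanies this post: the function name build_and_push, the BASE_IMAGE build argument, and the way the region is derived from the ECR registry host name are all assumptions. SOURCE_REPO_USERNAME and SOURCE_REPO_PASSWORD are the parameters named later in this post.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the pull/build/push flow; not the post's buildspec.yml.

# $1: base image on container.repository.cloudera.com
# $2: target image, e.g. <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
build_and_push() {
  local base_image="$1" target_image="$2"
  local ecr_registry="${target_image%%/*}"          # host part before the first "/"
  local region
  region="$(echo "$ecr_registry" | cut -d. -f4)"    # 4th dot-separated field of the host

  # Log in to the Cloudera registry so the base image can be pulled.
  echo "$SOURCE_REPO_PASSWORD" |
    docker login container.repository.cloudera.com \
      --username "$SOURCE_REPO_USERNAME" --password-stdin &&
  # Build the custom image (ASSUMPTION: the Dockerfile accepts ARG BASE_IMAGE).
  docker build --build-arg BASE_IMAGE="$base_image" -t "$target_image" . &&
  # Log in to Amazon ECR and push the result.
  aws ecr get-login-password --region "$region" |
    docker login "$ecr_registry" --username AWS --password-stdin &&
  docker push "$target_image"
}
```

Inside CodeBuild the same commands would normally live in the build phases of buildspec.yml; wrapping them in a function simply makes the order of operations easy to read and to try locally.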

Steps

Set up ECR & IAM role using AWS CloudFormation

  1. The CloudFormation template cloudformation-ecr-codebuild.yml creates the Amazon Elastic Container Registry (ECR) repository and the IAM role required by AWS CodeBuild.
  2. Update the files cloudformation-ecr-codebuild.yml, cloudformation-parameters.json, and cloudformation-tags.json as required.
  3. Create the CloudFormation stack using the following command:
    aws cloudformation create-stack \
    --stack-name vkar-ecr \
    --template-body file://cloudformation-ecr-codebuild.yml \
    --parameters file://cloudformation-parameters.json \
    --tags file://cloudformation-tags.json \
    --capabilities CAPABILITY_NAMED_IAM
  4. (If required) To update the stack using a change set, create the change set with the following command, then review and execute it:
    aws cloudformation create-change-set \
    --stack-name vkar-ecr \
    --change-set-name change1 \
    --template-body file://cloudformation-ecr-codebuild.yml \
    --parameters file://cloudformation-parameters.json \
    --tags file://cloudformation-tags.json \
    --capabilities CAPABILITY_NAMED_IAM
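Both create-stack and create-change-set return before CloudFormation has finished its work, and a change set does nothing until it is executed. A minimal sketch of the follow-up calls is shown here; the function names wait_for_stack and apply_change_set are illustrative and not part of the files in this post, while the stack name vkar-ecr and change set name change1 are the ones used above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for finishing the CloudFormation steps above.
STACK_NAME="${STACK_NAME:-vkar-ecr}"

# Block until stack creation has finished, then print the stack outputs.
wait_for_stack() {
  aws cloudformation wait stack-create-complete --stack-name "$STACK_NAME" &&
  aws cloudformation describe-stacks --stack-name "$STACK_NAME" \
    --query 'Stacks[0].Outputs' --output table
}

# Wait for the change set to be ready, list what it would change, then apply it.
apply_change_set() {
  local change_set="$1"
  aws cloudformation wait change-set-create-complete \
    --stack-name "$STACK_NAME" --change-set-name "$change_set" &&
  aws cloudformation describe-change-set \
    --stack-name "$STACK_NAME" --change-set-name "$change_set" \
    --query 'Changes[].ResourceChange.[Action,LogicalResourceId,ResourceType]' \
    --output table &&
  aws cloudformation execute-change-set \
    --stack-name "$STACK_NAME" --change-set-name "$change_set"
}
```

For example, run wait_for_stack after step 3, or apply_change_set change1 after step 4. In practice you would likely pause after the describe-change-set output to review it before executing.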

Modify AWS CodeBuild config files

  • Update the Dockerfile with the customizations you need on top of the base image. The base image for the Dockerfile is specified in the aws-codebuild.json file.
  • Modify the aws-codebuild.json file with the values for your environment.
  • The parameters SOURCE_REPO_USERNAME and SOURCE_REPO_PASSWORD in the aws-codebuild.json file hold the credentials for the source image repository. You can either embed the username and password directly or specify the name of an AWS Secrets Manager secret that contains them.
  • Review the buildspec.yml file and modify it if necessary. In general, no changes are needed here.
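The two customization points above can be sketched as follows. Everything here is an illustrative stand-in rather than the content of the files in this post: the function names, the BASE_IMAGE build argument, the xgboost package, the assumption that pip3 exists in the base image, and the secret layout with username and password keys are all assumptions. The closing USER ${DEX_UID} line follows the pattern in Cloudera's custom runtime documentation and should be checked against your base image.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the Dockerfile or secret setup shipped with this post.

# Write a minimal Dockerfile that layers one Python package on the base image.
# ASSUMPTION: the build passes the base image as a BASE_IMAGE build argument.
write_dockerfile() {
  local target_dir="${1:-.}"
  cat > "$target_dir/Dockerfile" <<'EOF'
ARG BASE_IMAGE
FROM ${BASE_IMAGE}
# Switch to root to install packages (assumes pip3 exists in the base image).
USER root
RUN pip3 install --no-cache-dir xgboost
# Drop back to the non-root runtime user.
USER ${DEX_UID}
EOF
}

# Store the repository credentials as one JSON secret in AWS Secrets Manager.
# Note: values containing double quotes or backslashes would need JSON escaping.
create_repo_secret() {
  local secret_name="$1" username="$2" password="$3"
  aws secretsmanager create-secret \
    --name "$secret_name" \
    --secret-string "{\"username\":\"${username}\",\"password\":\"${password}\"}"
}
```

With a secret laid out this way, a CodeBuild environment variable of type SECRETS_MANAGER can reference a single key using a value of the form secret-name:json-key (for example, a hypothetical cde/cloudera-repo:username). Whether the aws-codebuild.json in this post uses exactly that form is not confirmed here.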

Build the custom image using AWS CodeBuild

  • Zip the Dockerfile and buildspec.yml so that both files sit at the top level of the archive (no enclosing root directory), and upload the archive to the S3 bucket specified in aws-codebuild.json.
  • Create the CodeBuild project using the following command:
    aws codebuild create-project --cli-input-json file://aws-codebuild.json
  • Run the build using the following command:
    aws codebuild start-build --project-name cde-ml-xgboost-build
  • You can customize aws-codebuild.sh to automate the above steps.
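As a rough idea of what automating these steps could look like, here is a hypothetical set of helper functions. It is not the aws-codebuild.sh from this post; the function names, the archive name, and the POLL_SECONDS variable are assumptions, and the S3 key must match the source location configured in aws-codebuild.json.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions; this is not the aws-codebuild.sh from the post.

# Zip the two build files. -j drops directory paths, so both files
# end up at the top level of the archive (no root directory).
package_source() {
  local archive="$1"
  zip -j "$archive" Dockerfile buildspec.yml
}

# Upload the archive to the bucket named in aws-codebuild.json.
upload_source() {
  local archive="$1" bucket="$2"
  aws s3 cp "$archive" "s3://${bucket}/$(basename "$archive")"
}

# Start a build, poll until it leaves IN_PROGRESS, and print the final status.
# Returns 0 only when the build status is SUCCEEDED.
run_build() {
  local project="$1" build_id status
  build_id="$(aws codebuild start-build --project-name "$project" \
    --query 'build.id' --output text)" || return 1
  while :; do
    status="$(aws codebuild batch-get-builds --ids "$build_id" \
      --query 'builds[0].buildStatus' --output text)" || return 1
    [ "$status" = "IN_PROGRESS" ] || break
    sleep "${POLL_SECONDS:-15}"
  done
  echo "$status"
  [ "$status" = "SUCCEEDED" ]
}

# List the tags of the pushed image to confirm that it reached ECR.
image_tags_in_ecr() {
  local repository="$1" tag="$2"
  aws ecr describe-images --repository-name "$repository" \
    --image-ids imageTag="$tag" \
    --query 'imageDetails[0].imageTags' --output text
}
```

A typical sequence would be package_source, upload_source, the create-project command shown above, then run_build cde-ml-xgboost-build, followed by image_tags_in_ecr cde/cde-spark-runtime-2.4.5 ml-xgboost to check the result.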

Screenshots

[Figures: code-build.png, ecr.png]

Run CDE job with custom runtime image

Follow these steps to use the custom runtime image to run a job:

  1. Create a resource of type custom-runtime-image:
    cde resource create --type="custom-runtime-image" \
    --image-engine="spark2" \
    --name="cde-runtime-ml" \
    --image="123456789012.dkr.ecr.us-west-2.amazonaws.com/cde/cde-spark-runtime-2.4.5:ml-xgboost"
  2. Create a job using the newly created resource:
    cde job create --type spark --name ml-scoring-job \
    --runtime-image-resource-name cde-runtime-ml \
    --application-file ./ml-scoring.py \
    --num-executors 30 \
    --executor-memory 4G \
    --driver-memory 4G
  3. Execute the job:
    cde job run --name ml-scoring-job
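After submitting the job, you will usually want to check the run and read the driver output. The sketch below is one way to do that, assuming that cde job run prints a small JSON document containing the numeric run id; the function name run_and_show and the sed-based parsing are illustrative and may need adjusting for your CLI version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit the job, then show its run details and driver log.
# ASSUMPTION: "cde job run" prints JSON that contains a numeric "id" field.
run_and_show() {
  local job_name="$1" run_id
  run_id="$(cde job run --name "$job_name" |
    sed -n 's/.*"id"[^0-9]*\([0-9][0-9]*\).*/\1/p')"
  if [ -z "$run_id" ]; then
    echo "could not determine the run id" >&2
    return 1
  fi
  # Show the run metadata, including its current status.
  cde run describe --id "$run_id"
  # Print the driver's standard output collected so far.
  cde run logs --id "$run_id" --type "driver/stdout"
}
```

For example, run_and_show ml-scoring-job submits the job created in step 2 and prints the run details. Because the job may still be running at that point, repeat the describe and logs calls later to see the final state.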

 

-------------------

Vijay Anand Karthikeyan
