Update:

See here for a Docker on YARN sandbox solution:

https://community.hortonworks.com/articles/232540/docker-on-yarn-sandbox.html

Overview

This guide has been tested with and without Kerberos on HDP 3.0.1.

YARN offers a DNS service backed by ZooKeeper for service discovery, but that can be challenging to set up. For a quickstart scenario, I will use Docker Swarm and an overlay network instead. If your environment is a single host, the networking is even simpler. This configuration is not recommended for production.

I will use pssh to run commands in parallel across the cluster based on a hostlist file and a workerlist file. The hostlist file should contain every host in the cluster, and the workerlist file should include every node except for the one chosen to be the docker swarm master node.
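As a minimal sketch of how the two files relate, the workerlist is just the hostlist with the swarm master removed. The helper name `make_workerlist` and the host names below are hypothetical and used only for illustration.

```python
def make_workerlist(hosts, master):
    """Return every host except the swarm master, preserving order."""
    # Drop blank lines and surrounding whitespace, as found in a hostlist file.
    cleaned = [h.strip() for h in hosts if h.strip()]
    if master not in cleaned:
        raise ValueError(f"{master} is not in the host list")
    return [h for h in cleaned if h != master]


# Hypothetical host names; in practice read them from the hostlist file.
hosts = ["node1.example.com", "node2.example.com", "node3.example.com"]
for worker in make_workerlist(hosts, "node1.example.com"):
    print(worker)
```

Writing the printed lines to a file named workerlist gives the second input that the pssh commands in this guide expect.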

Prerequisites

Install HDP 3.0.1 with or without Kerberos

Install Docker on every host in the cluster

# Optional first run: accept the SSH host keys and confirm that every host responds
# pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem -o 'StrictHostKeyChecking no'" "hostname"
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo yum install -y yum-utils device-mapper-persistent-data lvm2"
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo"
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo yum install -y docker-ce"
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo systemctl start docker"
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo systemctl enable docker"


Configure docker swarm and create an overlay network

ssh -i ~/cloudbreak.pem cloudbreak@<masternode> "sudo docker swarm init"
pssh -i -h workerlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo <output from last command: docker swarm join ...>"
ssh -i ~/cloudbreak.pem cloudbreak@<masternode> "sudo docker network create -d overlay --attachable yarnnetwork"
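The join command that the workers need is printed by `docker swarm init`. The sketch below shows one way to pull that line out of the captured output so it can be pasted into the pssh command. The function name and the sample output (including the token and address) are made up for illustration, and the regular expression assumes the usual `docker swarm join --token <token> <address>` form.

```python
import re


def extract_join_command(init_output):
    """Return the 'docker swarm join --token ...' line from swarm init output."""
    match = re.search(
        r"^\s*(docker swarm join --token \S+ \S+)\s*$",
        init_output,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("no 'docker swarm join' command found")
    return match.group(1)


# Hypothetical output; the token and address are placeholders.
sample = """Swarm initialized: current node (abc123) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-exampletoken 10.0.0.5:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
"""
print(extract_join_command(sample))
```

If the output is no longer available, running `docker swarm join-token worker` on the master node prints the worker join command again.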

If Kerberos is not enabled, create a default user for containers:

pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo useradd dockeruser"

Ambari

In the YARN general settings tab, toggle the Docker Runtime button to "Enabled".

This should change the following setting in Advanced YARN-Site:

yarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor

In Advanced YARN-Site, change the following so that all YARN Docker containers default to the overlay network created above:

yarn.nodemanager.runtime.linux.docker.default-container-network=yarnnetwork
yarn.nodemanager.runtime.linux.docker.allowed-container-networks=host,none,bridge,yarnnetwork

In Custom YARN-Site, add the following if Kerberos is not enabled:

yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user=dockeruser

In Advanced Container Executor:

Note that this allows any image from Docker Hub to be run. To limit the Docker images that can be run, set this property to a comma-separated list of trusted registries. Docker images have the form <registry>/<imageName>:<tag>.

docker_trusted_registries=*
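To make the <registry>/<imageName>:<tag> form concrete, here is a rough sketch of how an image name can be compared against a trusted-registry list. This is an illustration only and not YARN's actual implementation; the function names are hypothetical, and treating * as "allow everything" simply mirrors the description above.

```python
def registry_of(image):
    """Return the <registry> part of <registry>/<imageName>:<tag>, or '' if absent."""
    if "/" not in image:
        return ""
    return image.split("/", 1)[0]


def is_trusted(image, trusted_registries):
    """Check an image against a comma-separated list of trusted registries."""
    trusted = [r.strip() for r in trusted_registries.split(",") if r.strip()]
    if "*" in trusted:
        return True
    return registry_of(image) in trusted


print(is_trusted("library/redis", "library"))
print(is_trusted("registry.example.com:5000/someimage:1", "library"))
```

Under these assumptions, library/redis matches a list containing library, while an image with a different registry prefix does not unless that prefix is added to the list.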

Alternatively, the following Ambari blueprint encapsulates these configurations. Note that it trusts only the library registry rather than *:

{
    "configurations" : [
    {
      "yarn-site" : {
        "properties" : {
          "yarn.nodemanager.container-executor.class" : "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor",
          "yarn.nodemanager.runtime.linux.docker.default-container-network" : "yarnnetwork",
          "yarn.nodemanager.runtime.linux.docker.allowed-container-networks" : "host,none,bridge,yarnnetwork",
          "yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user" : "dockeruser"
        }
      }
    },
    {
      "container-executor" : {
        "properties" : {
          "docker_trusted_registries" : "library",
          "docker_module_enabled" : "true"
        }
      }
    }],
    "host_groups" : [
        {
            "name" : "all",
            "components" : [
                {"name" : "HISTORYSERVER"},
                {"name" : "NAMENODE"},
                {"name" : "APP_TIMELINE_SERVER"},
                {"name" : "NODEMANAGER"},
                {"name" : "DATANODE"},
                {"name" : "RESOURCEMANAGER"},
                {"name" : "ZOOKEEPER_SERVER"},
                {"name" : "SECONDARY_NAMENODE"},

                {"name" : "HDFS_CLIENT"},
                {"name" : "ZOOKEEPER_CLIENT"},
                {"name" : "YARN_CLIENT"},
                {"name" : "MAPREDUCE2_CLIENT"}
            ],
            "cardinality" : "1"
        }
    ],
    "Blueprints" : {
        "blueprint_name" : "yarn sample",
        "stack_name" : "HDP",
        "stack_version" : "3.0"
    }
}


Save the configurations and restart YARN.


Usage

The YARN service REST API documentation can be found here:

https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/yarn-service/YarnServiceAPI.html

The YARN app CLI documentation can be found here:

https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#application_or_...



Testing without Kerberos

Place the following service definition into a file (e.g. yarnservice.json):

{
  "name": "redis-service",
  "version": "1.0.0",
  "description": "redis example",
  "components" :
    [
      {
        "name": "redis",
        "number_of_containers": 1,
        "artifact": {
          "id": "library/redis",
          "type": "DOCKER"
        },
        "launch_command": "",
        "resource": {
          "cpus": 1,
          "memory": "256"
        },
        "configuration": {
          "env": {
            "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
          }
        }
      }
    ]
}

Submit the service with the following curl command. YARN should respond with the applicationId.

The user needs write permission on their HDFS home directory (e.g. hdfs:/user/user1). ambari-qa has it by default.

curl -X POST -H "Content-Type: application/json" "http://<resource manager>:8088/app/v1/services?user.name=ambari-qa" -d @yarnservice.json
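For scripted submissions, the same request can be assembled with the Python standard library. This is a sketch under the same assumptions as the curl command (no Kerberos, port 8088, the user.name query parameter); the function name and the host rm.example.com are hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_submit_request(rm_host, user, spec):
    """Build the POST request that submits a service definition."""
    query = urllib.parse.urlencode({"user.name": user})
    url = f"http://{rm_host}:8088/app/v1/services?{query}"
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Trimmed-down definition; in practice load yarnservice.json with json.load().
spec = {"name": "redis-service", "version": "1.0.0"}
req = build_submit_request("rm.example.com", "ambari-qa", spec)
print(req.get_method(), req.full_url)
# To actually submit: urllib.request.urlopen(req)
```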

The service status can be viewed in the YARN UI or through the REST API (piping the response through python -m json.tool makes it easier to read):

curl "http://<resource manager>:8088/app/v1/services/redis-service?user.name=ambari-qa" | python -m json.tool
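When the full JSON is more than you need, a few lines of Python can reduce it to the service state and one line per container. The field names used here (state, components, containers, id, ip) are assumptions based on the YARN service API documentation linked above, and the sample response is invented for illustration, so check them against what your cluster actually returns.

```python
def summarize_service(status):
    """Return a short list of lines describing a service status document."""
    lines = [f"{status.get('name')}: {status.get('state')}"]
    for component in status.get("components", []):
        for container in component.get("containers", []):
            lines.append(
                f"  {component.get('name')} {container.get('id')} "
                f"{container.get('state')} {container.get('ip')}"
            )
    return lines


# Hypothetical response, shaped after the API documentation.
sample_status = {
    "name": "redis-service",
    "state": "STABLE",
    "components": [
        {
            "name": "redis",
            "containers": [
                {"id": "container_example_01", "state": "READY", "ip": "10.0.0.7"}
            ],
        }
    ],
}
for line in summarize_service(sample_status):
    print(line)
```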

The service name must be unique in the cluster. If you need to delete your service, the following command can be used:

curl -X DELETE "http://<resource manager>:8088/app/v1/services/redis-service?user.name=ambari-qa"


Testing with Kerberos

Create a Kerberos principal of the format <username>/<hostname>@<realm>.

The hostname portion of the principal is required.

Create a keytab for the principal and upload it to HDFS:

kadmin.local
>addprinc user1/host1.example.com@EXAMPLE.COM
...
>xst -k user1_host1.keytab user1/host1.example.com@EXAMPLE.COM
...
>exit
hadoop fs -put user1_host1.keytab hdfs:/user/user1/
hadoop fs -chown user1 hdfs:/user/user1/

Place the following service definition into a file (e.g. yarnservice.json):

{
  "name": "redis-service",
  "version": "1.0.0",
  "description": "redis example",
  "components" :
    [
      {
        "name": "redis",
        "number_of_containers": 1,
        "artifact": {
          "id": "library/redis",
          "type": "DOCKER"
        },
        "launch_command": "",
        "resource": {
          "cpus": 1,
          "memory": "256"
        },
        "configuration": {
          "env": {
            "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
          }
        }
      }
    ],
    "kerberos_principal": {
      "principal_name": "user1/host1.example.com@EXAMPLE.COM",
      "keytab": "hdfs:/user/user1/user1_host1.keytab"
    }
}
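Because the hostname portion of the principal is required, it can help to check the principal_name before submitting. The sketch below validates the <username>/<hostname>@<realm> shape with a regular expression; the function name and the pattern are illustrative assumptions and are stricter than Kerberos itself, which allows other principal forms.

```python
import re

# Matches <username>/<hostname>@<realm> with no empty parts.
PRINCIPAL_RE = re.compile(
    r"^(?P<user>[^/@\s]+)/(?P<host>[^/@\s]+)@(?P<realm>[^/@\s]+)$"
)


def parse_principal(principal):
    """Split a principal into (user, host, realm) or raise ValueError."""
    match = PRINCIPAL_RE.match(principal)
    if match is None:
        raise ValueError(
            f"expected <username>/<hostname>@<realm>, got {principal!r}"
        )
    return match.group("user"), match.group("host"), match.group("realm")


print(parse_principal("user1/host1.example.com@EXAMPLE.COM"))
```

A principal without the host part, such as user1@EXAMPLE.COM, raises ValueError in this sketch.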

Submit the service with the following curl command. YARN should respond with the applicationId.

user1 needs write permission on their HDFS home directory (hdfs:/user/user1).

curl --negotiate -u : -X POST -H "Content-Type: application/json" http://<resource manager>:8088/app/v1/services -d @yarnservice.json

The service status can be viewed in the YARN UI or through the REST API (piping the response through python -m json.tool makes it easier to read):

curl --negotiate -u : http://<resource manager>:8088/app/v1/services/redis-service | python -m json.tool

The service name must be unique in the cluster. If you need to delete your service, the following command can be used:

curl --negotiate -u : -X DELETE http://<resource manager>:8088/app/v1/services/redis-service


Adding a local docker registry

Each node in the cluster needs a way of downloading Docker images when a service is run. It is possible to just use the public Docker Hub, but that is not always an option. Similar to creating a local repo for yum, a local registry can be created for Docker. The quickstart below skips the security steps; in production, security best practices should be followed.

On a master node, create an instance of the docker registry container. This will bind the registry to port 5000 on the host machine.

docker run -d -p 5000:5000 --restart=always --name registry -v /mnt/registry:/var/lib/registry registry:2 

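Configure each machine to skip HTTPS checks: https://docs.docker.com/registry/insecure/. Here are commands for CentOS 7. They assume /etc/docker/daemon.json does not exist yet; if it does, merge the insecure-registries key into the existing JSON by hand, because appending a second JSON object makes the file invalid.

pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "echo '{\"insecure-registries\": [\"<registryHost>:5000\"]}' | sudo tee --append /etc/docker/daemon.json"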
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo systemctl restart docker"
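If /etc/docker/daemon.json already has content on some hosts, the new key has to be merged into the existing JSON object instead of being written after it. The following sketch shows that merge on the file's text; the function name, the example registry host, and the sample existing setting are hypothetical, and reading and writing the file (with root permissions) is left out.

```python
import json


def add_insecure_registry(daemon_json_text, registry):
    """Return daemon.json text with the registry added to insecure-registries."""
    # An empty or missing file is treated as an empty configuration.
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    registries = config.setdefault("insecure-registries", [])
    if registry not in registries:
        registries.append(registry)
    return json.dumps(config, indent=2)


# Hypothetical existing configuration with an unrelated setting.
existing = '{"log-driver": "json-file"}'
print(add_insecure_registry(existing, "registry.example.com:5000"))
```

Existing keys are preserved and the registry is added only once, so the function is safe to run repeatedly.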

The YARN configuration property docker_trusted_registries must either be set to * or include this local registry in its list (e.g. library,<registryHost>:5000).

Restart YARN.


Testing Local Docker Registry

Build, tag, and push an image to the registry. Docker repository names must be lowercase, so the example image is named myimage:

docker build -t myimage:1 .
docker tag myimage:1 <registryHost>:5000/myimage:1
docker push <registryHost>:5000/myimage:1

View the image via REST:

curl <registryHost>:5000/v2/_catalog
curl <registryHost>:5000/v2/myimage/tags/list

Download the image to all hosts in the cluster. This is only necessary to demonstrate connectivity; Docker and YARN do this automatically.

pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo docker pull <registryHost>:5000/myimage:1"

Now, when an image with this registry prefix (e.g. <registryHost>:5000/myimage:1) is used in a YARN service definition, YARN will use the image from this local registry instead of trying to pull from the default public location.
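The two registry REST calls return small JSON documents: the catalog lists repositories, and the tag list (served at /v2/<name>/tags/list) names one repository and its tags. As a sketch, the check below confirms from those two responses that a given image and tag were pushed. The function names, the registry host, and the sample responses are illustrative assumptions; fetching the URLs is left out.

```python
def tags_url(registry, name):
    """Build the tag-list URL for a repository on a plain-HTTP registry."""
    return f"http://{registry}/v2/{name}/tags/list"


def image_in_registry(catalog, tag_list, name, tag):
    """Check parsed catalog and tag-list responses for name:tag."""
    in_catalog = name in catalog.get("repositories", [])
    same_repo = tag_list.get("name") == name
    # The tags field can be null for a repository with no tags.
    has_tag = tag in (tag_list.get("tags") or [])
    return in_catalog and same_repo and has_tag


# Hypothetical parsed responses for an image pushed as myimage:1.
catalog = {"repositories": ["myimage"]}
tag_list = {"name": "myimage", "tags": ["1"]}
print(tags_url("registry.example.com:5000", "myimage"))
print(image_in_registry(catalog, tag_list, "myimage", "1"))
```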
