This guide has been tested with and without Kerberos on HDP 3.0.1.
YARN offers a DNS service backed by Zookeeper for service discovery, but that can be challenging to setup. For a quickstart scenario, I will use docker swarm and an overlay network instead. If your environment is a single host, the networking is even simpler. This configuration is not recommended for production.
I will use pssh to run commands in parallel across the cluster based on a hostlist file and a workerlist file. The hostlist file should contain every host in the cluster, and the workerlist file should include every node except for the one chosen to be the docker swarm master node.
Note that this allows any image from docker hub to be run. to limit the that docker images that can be run, set this property to a comma separated list of trusted registries. Docker images have the form <registry>/<imageName>:<tag>.
Alternatively, the following Ambari blueprint encapsulates these configurations:
Each node in the cluster needs a way of downloading docker images when a service is run. It is possible to just use the public docker hub, but that is not always an option. Similar to creating a local repo for yum, a local registry can be created for Docker. Here is a quickstart that skips the security steps. In production, security best practices should be followed:
On a master node, create an instance of the docker registry container. This will bind the registry to port 5000 on the host machine.
Now, when an image with this registry prefix (e.g. <registryHost>:5000/myImage:1) is used in a YARN service definition, YARN will use the image from this local registry instead of trying to pull from the default public location.