
Steps on how to set up YARN to run Docker containers can be found in Part 1 of this series.


In this article I will show how to run Hive components (HiveServer2, Metastore) as Docker containers in YARN.

The Metastore will use a MySQL 8 database that also runs as a Docker container, on a local host.


Pre-requisites:

1. Pull the mysql-server image from Docker Hub, run it as a Docker container, then create the hive database and grant permissions to the hive user:

docker pull mysql/mysql-server
docker run -d -p 3306:3306 -e MYSQL_ROOT_PASSWORD=admin  --restart=always --name mysqld mysql/mysql-server

docker exec -it mysqld bash
bash-4.2# mysql -u root --password=admin

mysql> CREATE DATABASE hive;
mysql> CREATE USER 'hive' IDENTIFIED BY 'hive';
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%' WITH GRANT OPTION;


2. Create user 'hive' and assign to 'hadoop' group:

useradd hive
usermod -aG hadoop hive


Dockerize Hive:

1. Create a yum repo file "hdp.repo" that contains HDP-3.1.0.0 and HDP-UTILS-1.1.0.22 repositories:

[HDP-3.1.0.0]
name=HDP Version - HDP-3.1.0.0
baseurl=http://public-repo-1.hortonworks.com/HDP/centos7/3.x/updates/3.1.0.0
gpgcheck=1
gpgkey=http://public-repo-1.hortonworks.com/HDP/centos7/3.x/updates/3.1.0.0/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1

[HDP-UTILS-1.1.0.22]
name=HDP-UTILS Version - HDP-UTILS-1.1.0.22
baseurl=http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.22/repos/centos7
gpgcheck=1
gpgkey=http://public-repo-1.hortonworks.com/HDP/centos7/3.x/updates/3.1.0.0/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1


2. Create the Dockerfile:

FROM centos:7
ENV JAVA_HOME /usr/lib/jvm/jre-1.8.0-openjdk

# HDP yum repositories and the MySQL JDBC connector package
COPY hdp.repo /etc/yum.repos.d/
COPY mysql-connector-java-8.0.14-1.el7.noarch.rpm /root/

# Install Java, the YARN and MapReduce packages, Hive, the Metastore and Tez
RUN yum updateinfo \
    && yum install -y sudo java-1.8.0-openjdk-devel hadoop-yarn hadoop-mapreduce hive hive-metastore tez \
    && yum clean all

# Install the connector and copy its jar into Hive's lib directory
RUN yum localinstall -y /root/mysql-connector-java-8.0.14-1.el7.noarch.rpm

RUN cp /usr/share/java/mysql-connector-java-8.0.14.jar /usr/hdp/3.1.0.0-78/hive/lib/mysql-connector-java.jar

Note: For the Metastore to connect to our MySQL database we need a JDBC connector. I downloaded the connector .rpm file and copied it to the same directory as my Dockerfile, so that it is installed in the image.


3. Build the image:

docker build -t hive .


4. Tag the image and push it to the local Docker registry:

Tag the image as “<docker registry server>:5000/hive_local”. This creates an additional tag for the existing image. When the first part of the tag is a hostname and port, Docker interprets this as the location of a registry.

docker tag hive <docker registry server>:5000/hive_local
docker push <docker registry server>:5000/hive_local
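
To confirm that the push reached the registry, the registry's catalog can be queried over HTTP. The following is only a sketch: it assumes the local registry answers plain HTTP on port 5000 and exposes the standard /v2/_catalog endpoint. The function name registry_has_repo and the host registry.example.com are made up for illustration.

```shell
#!/usr/bin/env bash
# registry_has_repo REGISTRY_HOST REPO_NAME
# Returns 0 if REPO_NAME is listed in the registry catalog, non-zero otherwise.
# Assumption: the registry serves plain HTTP on port 5000.
registry_has_repo() {
  local host="$1" repo="$2"
  # The catalog is a JSON document such as {"repositories":["hive_local"]};
  # a quoted match avoids hits on names that merely contain REPO_NAME.
  curl -s "http://${host}:5000/v2/_catalog" | grep -q "\"${repo}\""
}

# Example (hypothetical host name):
#   registry_has_repo registry.example.com hive_local && echo "image is in the registry"
```

If the registry is configured with TLS or authentication, the URL scheme and curl options would need to change accordingly.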


Now that our hive image is built and pushed, we will create a YARN Service configuration file (Yarnfile) with all the details of our service.


Deployment:

1. Copy core-site.xml, hdfs-site.xml and yarn-site.xml to the hive user's home directory in HDFS:

su - hive
hdfs dfs -copyFromLocal /etc/hadoop/conf/core-site.xml .
hdfs dfs -copyFromLocal /etc/hadoop/conf/hdfs-site.xml .
hdfs dfs -copyFromLocal /etc/hadoop/conf/yarn-site.xml .


2. Create the Yarnfile (hive.json):

{
  "name": "hive",
  "lifetime": "-1",
  "version": "3.1.0.3.1.0.0",
  "artifact": {
    "id": "<docker registry server>:5000/hive_local",
    "type": "DOCKER"
  },
  "configuration": {
    "env": {
      "HIVE_LOG_DIR": "var/log/hive",
      "YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS": "/etc/passwd:/etc/passwd:ro,/etc/group:/etc/group:ro",
      "HADOOP_HOME": "/usr/hdp/3.1.0.0-78/hadoop"
    },
    "properties": {
      "docker.network": "host"
    },
    "files": [
      {
        "type": "TEMPLATE",
        "dest_file": "/etc/hadoop/conf/core-site.xml",
        "src_file": "core-site.xml"
      },
      {
        "type": "TEMPLATE",
        "dest_file": "/etc/hadoop/conf/yarn-site.xml",
        "src_file": "yarn-site.xml"
      },
      {
        "type": "TEMPLATE",
        "dest_file": "/etc/hadoop/conf/hdfs-site.xml",
        "src_file": "hdfs-site.xml"
      },
      {
        "type": "XML",
        "dest_file": "/etc/hive/conf/hive-site.xml",
        "properties": {
          "hive.zookeeper.quorum": "${CLUSTER_ZK_QUORUM}",
          "hive.zookeeper.namespace": "hiveserver2",
          "hive.server2.zookeeper.publish.configs": "true",
          "hive.server2.support.dynamic.service.discovery": "true",
          "hive.support.concurrency": "true",
          "hive.metastore.warehouse.dir": "/user/${USER}/warehouse",
          "javax.jdo.option.ConnectionUserName": "hive",
          "javax.jdo.option.ConnectionPassword": "hive",
          "hive.server2.enable.doAs": "false",
          "hive.metastore.schema.verification": "true",
          "hive.metastore.db.type": "MYSQL",
          "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
          "javax.jdo.option.ConnectionURL": "jdbc:mysql://<mysql-server docker host>:3306/hive?createDatabaseIfNotExist=true",
          "hive.metastore.event.db.notification.api.auth": "false",
          "hive.metastore.uris": "thrift://hivemetastore-0.${SERVICE_NAME}.${USER}.${DOMAIN}:9083"
        }
      }
    ]
  },
  "components": [
    {
      "name": "hiveserver2",
      "number_of_containers": 1,
      "launch_command": "sleep 25; /usr/hdp/current/hive-server2/bin/hiveserver2",
      "resource": {
        "cpus": 1,
        "memory": "1024"
      },
      "configuration": {
        "files": [
          {
            "type": "XML",
            "dest_file": "/etc/hive/conf/hive-site.xml",
            "properties": {
              "hive.server2.thrift.bind.host": "${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}",
              "hive.server2.thrift.port": "10000",
              "hive.server2.thrift.http.port": "10001"
            }
          }
        ],
        "env": {
          "HADOOP_OPTS": "-Xmx1024m -Xms512m"
        }
      }
    },
    {
      "name": "hivemetastore",
      "number_of_containers": 1,
      "launch_command": "sleep 5;/usr/hdp/current/hive-metastore/bin/schematool -initSchema -dbType mysql;/usr/hdp/current/hive-metastore/bin/hive --service metastore",
      "resource": {
        "cpus": 1,
        "memory": "1024"
      },
      "configuration": {
        "files": [
          {
            "type": "XML",
            "dest_file": "/etc/hive/conf/hive-site.xml",
            "properties": {
              "hive.metastore.uris": "thrift://${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}:9083"
            }
          }
        ],
        "env": {
          "HADOOP_OPTS": "-Xmx1024m -Xms512m"
        }
      }
    }
  ]
}
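
Before launching, it can be worth checking that the Yarnfile parses as JSON and that its artifact id is the same image reference that was pushed to the registry, since a mismatch would leave YARN unable to pull the image. The helper below is a rough sketch, not part of the original setup: the name yarnfile_artifact is invented here, and it assumes a python3 or python interpreter is on the PATH.

```shell
#!/usr/bin/env bash
# yarnfile_artifact YARNFILE
# Prints the top-level artifact.id of a Yarnfile.
# Fails (non-zero exit) if the file is not valid JSON or has no artifact.id.
yarnfile_artifact() {
  local py
  # Assumption: either python3 or python is installed on this host.
  py="$(command -v python3 || command -v python)" || return 1
  "$py" -c '
import json, sys
with open(sys.argv[1]) as f:
    spec = json.load(f)
print(spec["artifact"]["id"])
' "$1"
}

# Example:
#   yarnfile_artifact hive.json
```

The printed value should match the tag used with docker push in the earlier step.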


3. Deploy the application using the YARN Services API:

yarn app -launch hive hive.json


Test access to Hive:

The RegistryDNS service that runs on the cluster listens for inbound DNS requests. These are standard DNS requests from users or from other DNS servers (for example, DNS servers that have the RegistryDNS service configured as a forwarder). With this set up, we can connect via beeline to the "${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}" hostname, as long as our client is using the corporate DNS server.

More details are in the RegistryDNS documentation listed in the references.


Because we don't have this configured in this test, we need to manually find where the hiveserver2 Docker container is running:

curl -X GET 'http://<RM-host>:8088/app/v1/services/hive?user.name=hive' | python -m json.tool
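
Instead of reading the full response by eye, the host names can be filtered out of it. The sketch below assumes the response follows the layout described in the YARN Service API documentation (a top-level "components" list, each entry with a "name" and a "containers" list whose entries carry "bare_host"). The function name component_hosts is invented for this example, and a python3 or python interpreter is assumed to be available.

```shell
#!/usr/bin/env bash
# component_hosts COMPONENT_NAME
# Reads a YARN service description (JSON) on stdin and prints the bare_host
# of every container that belongs to COMPONENT_NAME, one per line.
component_hosts() {
  local py
  # Assumption: either python3 or python is installed on this host.
  py="$(command -v python3 || command -v python)" || return 1
  "$py" -c '
import json, sys
service = json.load(sys.stdin)
for component in service.get("components", []):
    if component.get("name") == sys.argv[1]:
        for container in component.get("containers", []):
            print(container.get("bare_host", ""))
' "$1"
}

# Example, with <RM-host> replaced by the ResourceManager host:
#   curl -s 'http://<RM-host>:8088/app/v1/services/hive?user.name=hive' | component_hosts hiveserver2
```

Note that the argument is the component name from the Yarnfile (hiveserver2), not the instance name (hiveserver2-0).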

In the container information for component instance hiveserver2-0 we will find:

"containers": [
                {
                    "bare_host": "<hiveserver2-0 host hostname>",


On the host where the container is running, connect to Hive via beeline:

su - hive
beeline -u "jdbc:hive2://<hostname -I>:10000/default"
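
If beeline reports a refused connection, HiveServer2 may still be starting, since the launch commands in the Yarnfile only use fixed sleeps to order the Metastore and HiveServer2. One way to avoid guessing is to poll the Thrift port (10000 in the Yarnfile above) until it accepts connections. This is a minimal sketch that relies on bash's /dev/tcp feature; the function name wait_for_port is made up here and is not part of Hive or YARN.

```shell
#!/usr/bin/env bash
# wait_for_port HOST PORT [TIMEOUT_SECONDS]
# Returns 0 as soon as a TCP connection to HOST:PORT succeeds, or 1 if
# none succeeds before the timeout (default 60 seconds) has passed.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}"
  local deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # bash treats /dev/tcp/HOST/PORT as a TCP connect; the subshell
    # closes the socket again when it exits.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example: wait up to two minutes for HiveServer2 on this host.
#   wait_for_port "$(hostname)" 10000 120 && echo "HiveServer2 port is open"
```

The timeout is approximate: a connection attempt to an unreachable host can itself block for longer than one second. An open port also only shows that the process is listening, not that HiveServer2 has finished initializing.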


References:

https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/yarn-service/Configurations.html

http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/RegistryDNS.html

https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/YarnServiceAPI.html


Files are also available in the following GitHub repo:

https://github.com/PedroAndrade89/docker_hdp_services.git
