Community Articles
Find and share helpful community-sourced technical articles
Labels (1)
Super Guru

Below are the steps to run Hive(TEZ) query in a shell script using Oozie shell action

.

1. Configure job.properties

Example:

#*************************************************
#  job.properties
#*************************************************
nameNode=hdfs://<namenode-fqdn>:8020
jobTracker=<resourcemanager-host-fqdn>:8050
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/

.

2. Configure Workflow.xml

Example:

<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4"
name="test-shell-with-kerberos-wf">
    <global>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
            <property>
                <name>tez.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
    </global>
      <credentials>
      <credential name="hive_credentials" type="hcat">
        <property>
          <name>hcat.metastore.uri</name>
          <value>thrift://<metastore-server>:9083</value>
        </property>
        <property>
          <name>hcat.metastore.principal</name>
          <value>hive/_HOST@REALM</value>
        </property>
      </credential>
    </credentials>
    <start to="run-shell-script"/>
       <action name="run-shell-script" cred="hive_credentials">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>tez.lib.uris</name>
                    <value>/hdp/apps/<VERSION>/tez/tez.tar.gz</value>
                </property>
            </configuration>
            <exec>/user/<username>/hive.sh</exec>
            <file>/user/<username>/hive.sh#hive.sh</file>
         </shell>
        <ok to="end"/>
        <error to="killnode"/>
    </action>
    <kill name="killnode">
        <message>Job failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

.

3. Write sample shell script

Example:

#!/bin/bash
hive --hiveconf mapreduce.job.credentials.binary=$HADOOP_TOKEN_FILE_LOCATION --hiveconf tez.credentials.path=$HADOOP_TOKEN_FILE_LOCATION -e 'select * from test_hive;'

4. Upload workflow.xml and shell script to "oozie.wf.application.path" defined in job.properties

.

5. Follow below command to run Oozie workflow

oozie job -oozie http://<oozie-server-hostname>:11000/oozie -config /$PATH/job.properties -run

.

Please comment if you have any question! Happy Hadooping!! :)

3,209 Views
Comments
Expert Contributor

I faced the following error where the shell script is running over 24 hours, and failing to launch hive scripts after 24 hours with following error.

Workaround: Increase the following Property Value in hive-site.xml and restart Hive Metastore to the desired need of oozie shell script to continue running.

  • hive.cluster.delegation.token.renew-interval (Default: 86400000 i.e. 24hrs)
  • hive.cluster.delegation.token.max-lifetime (Default: 604800000 i.e. 7days)

yarn application log

Stdoutput 16/10/16 17:12:00 [main]: WARN hive.metastore: Failed to connect to the MetaStore Server...
Stdoutput org.apache.thrift.transport.TTransportException: Peer indicated failure: DIGEST-MD5: IO error acquiring password

Hive MetaStore Error:

ERROR [pool-5-thread-198]: transport.TSaslTransport (TSaslTransport.java:open(315)) - SASL negotiation failure
javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token expired or does not exist: owner=user, renewer=oozie, realUser=oozie/oozie.host.name@EXAMPLE.COM, issueDate=1476560270232, maxDate=1477165070232, sequenceNumber=51, masterKeyId=714]