How to Schedule Impala Jobs with Oozie

by Community Manager on ‎10-01-2015 09:40 AM

Summary

 

Because Oozie does not currently have an Impala action, this guide shows you how to create an Impala-based action in Oozie by using a shell action.

 

 

Applies To

 

Impala, Oozie

 

Instructions

Oozie does not currently have an Impala action, so you must use a shell action that calls impala-shell. The shell script that calls impala-shell must also set the PYTHON_EGG_CACHE location. Here is an example shell script:

 

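#!/bin/bash
# impala-shell is a Python program and needs a writable egg cache directory.
export PYTHON_EGG_CACHE=./myeggs
# Kerberized clusters only: obtain a ticket from the keytab shipped with the action.
/usr/bin/kinit -kt YourKeytabFile.keytab -V <your username>
impala-shell -q "invalidate metadata"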
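
The egg cache export in the script above can be made slightly more defensive. The following is a minimal sketch, not part of the original article: prepare_egg_cache is a hypothetical helper name, and it assumes the action runs in a working directory where the container user may create subdirectories. It creates the cache directory and fails early if the directory is not writable, rather than letting impala-shell fail later.

```shell
#!/bin/bash
# Sketch only: prepare_egg_cache is a hypothetical helper, not part of Impala or Oozie.
prepare_egg_cache() {
    # Default to a path relative to the action's working directory.
    local dir="${1:-./myeggs}"
    mkdir -p "$dir" || return 1
    # Fail early if the current user cannot write to the directory.
    [ -w "$dir" ] || return 1
    export PYTHON_EGG_CACHE="$dir"
}

# Possible usage inside the action script:
#   prepare_egg_cache ./myeggs || exit 1
#   impala-shell -q "invalidate metadata"
```

A relative path keeps the cache inside the container's working directory, which is normally writable by whichever user the action runs as.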

 

Note the PYTHON_EGG_CACHE setting: you must set this location or the job will fail. The script also runs kinit, which is needed on a Kerberized cluster. Here is the workflow that goes with that script:

 

 

<workflow-app name="shell-impala-invalidate-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="shell-impala-invalidate"/>
    <action name="shell-impala-invalidate">
      <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
          <property>
            <name>mapred.job.queue.name</name>
            <value>${queueName}</value>
          </property>
        </configuration>
        <exec>shell-impala-invalidate.sh</exec>
        <file>shell-impala-invalidate.sh#shell-impala-invalidate.sh</file>
        <file>YourKeytabFile.keytab#YourKeytabFile.keytab</file>
      </shell>
      <ok to="end"/>
      <error to="kill"/>
    </action>
    <kill name="kill">
      <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
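
The workflow refers to ${jobTracker}, ${nameNode}, and ${queueName}, which are normally supplied through a job.properties file at submission time. The sketch below shows one way to generate such a file. The host names, ports, and HDFS path are placeholders, and write_job_properties is a hypothetical helper name; adjust all of them for your cluster.

```shell
#!/bin/bash
# Sketch only: write_job_properties is a hypothetical helper.
# All hosts, ports, and paths below are placeholders, not values from the article.
write_job_properties() {
    local out="${1:-job.properties}"
    # The quoted 'EOF' keeps ${...} literal so that Oozie, not bash, resolves it.
    cat > "$out" <<'EOF'
nameNode=hdfs://namenode.example.com:8020
jobTracker=resourcemanager.example.com:8032
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/shell-impala-invalidate
EOF
}

# Possible usage, assuming workflow.xml and the script are already in that HDFS directory:
#   write_job_properties job.properties
#   oozie job -oozie http://oozie.example.com:11000/oozie -config job.properties -run
```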

 

In the workflow, you must include the <file> tag for the shell script. The <file> tag for the keytab is needed only if you are using Kerberos.
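
If the same script has to run on both Kerberized and non-Kerberized clusters, one option is to call kinit only when the keytab was actually shipped with the action. This is a sketch under that assumption; maybe_kinit is a hypothetical helper name, and the keytab file name and principal are whatever you pass in.

```shell
#!/bin/bash
# Sketch only: maybe_kinit is a hypothetical helper, not part of Oozie or Kerberos.
maybe_kinit() {
    local keytab="$1" principal="$2"
    if [ -f "$keytab" ]; then
        # The keytab is present in the working directory, so obtain a ticket.
        kinit -kt "$keytab" "$principal"
    else
        # No keytab was shipped with the action: assume a non-Kerberized cluster.
        echo "no keytab at $keytab; skipping kinit" >&2
        return 0
    fi
}

# Possible usage inside the action script:
#   maybe_kinit YourKeytabFile.keytab "$PRINCIPAL" || exit 1
```

Here $PRINCIPAL stands for your own Kerberos principal and is not defined by Oozie.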

 

 

Comments
by xylenet
‎10-01-2015 05:42 PM - edited ‎10-02-2015 02:13 AM

Hi there. First of all, thank you for being willing to enlighten me. I was wondering whether I could do this by submitting a sample shell job XML to the Oozie REST Web API service. I tried a Hive job with the Oozie REST Web API and it was accepted, but a shell job was not, or maybe I'm doing it wrong.

 

I tried a simple job, with the workflow.xml and the script it uses stored in HDFS.

 

The script contains only:

 

#!/bin/bash

 

impala-shell -f <local-file-directory>/file.sql

 

and I'm still getting exit code -1 or something else.

by Community Manager
on ‎10-05-2015 12:26 PM

@xylenet, from what I'm told, this may be a fairly involved process with no quick or easy answer. You might get better traction by starting a "custom REST API call in Oozie" topic in the Oozie board area. More Oozie experts will be listening over there.

 

HTH,

 

Clint

by xylenet
‎10-05-2015 11:09 PM - edited ‎10-05-2015 11:41 PM

Hi Clint, thanks for the additional link. Will be posting my concerns there and hope for the best.

 

And by the way, I may have missed it, but which Python eggs are we talking about here? I'm just using the Cloudera QuickStart.

by Cloudera Employee Harsh J
on ‎11-08-2015 07:24 AM

Impala's 'impala-shell' is a Python-based program. It uses some eggs to run, so it needs a cache directory to work with. By setting the PYTHON_EGG_CACHE environment variable to a relative path, you let it use a writable location that falls within the container environment and user restrictions.

by epishkin
‎01-05-2016 01:04 PM - edited ‎01-05-2016 01:27 PM

How does setting PYTHON_EGG_CACHE help if the impala-shell script overwrites its value? See lines 36-39:

https://github.com/cloudera/Impala/blob/cdh5-trunk/shell/impala-shell#L36-L39

 

My workaround is to add these statements to the script before running impala-shell. This works with CDH 5.5.0 and Impala 2.3.0-cdh5.5.0.

# A workaround for the 'Error In Running Impala From Oozie' issue.
# impala-shell uses $USER and not $(whoami).
if [ "$(whoami)" = "yarn" ]; then
    export USER=yarn
    export PYTHON_EGG_CACHE=/tmp/impala-shell-python-egg-cache-${USER}
fi

echo "$QUERY" | impala-shell -i "localhost" -f -

 

by mageru9
on ‎03-28-2016 04:24 AM

Thanks for this post, saved me some pain.

by rspwilliam
on ‎03-31-2016 03:48 AM

Hi,

I am trying to use an Oozie shell action to call a shell script that runs a few Hive commands.

When I run the Oozie job, it always fails.

We are using CDH 5.4.8. Can anyone tell me whether it is possible to call Hive commands from a shell action?

The script works standalone, but it does not work with Oozie.

Disclaimer: The information contained in this article was generated by third parties and not by Cloudera or its personnel. Cloudera cannot guarantee its accuracy or efficacy. Cloudera disclaims all warranties of any kind and users of this information assume all risk associated with it and with following the advice or directions contained herein. By visiting this page, you agree to be bound by the Terms and Conditions of Site Usage, including all disclaimers and limitations contained therein.