
Impala schedule with Oozie - tutorial

Champion Alumni

Hello,

I'm looking for a good tutorial on how to schedule Impala jobs with Oozie.

 

The only threads that I found about this subject are:

 

o https://issues.apache.org/jira/browse/OOZIE-1591

o https://groups.google.com/a/cloudera.org/forum/#!topic/impala-user/8vM7fKR7F3A

 

 

Can you please help? (give at least one example)

GHERMAN Alina
1 ACCEPTED SOLUTION

Super Collaborator

Hey,

 

Currently there is no Impala action in Oozie, so you must use a shell action that calls impala-shell. The shell script that calls impala-shell must also set the PYTHON_EGG_CACHE location. Here is an example shell script:

 

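#!/bin/bash

# Must be set, or the job will fail.
export PYTHON_EGG_CACHE=./myeggs
# Only needed on a kerberized cluster: get a ticket from the shipped keytab.
/usr/bin/kinit -kt cconner.keytab -V cconner
impala-shell -q "invalidate metadata"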

 

Note the PYTHON_EGG_CACHE setting: you must set this location or the job will fail. The script also runs kinit, which is needed on a kerberized cluster. Here is the workflow that goes with that script:

 

<workflow-app name="shell-impala-invalidate-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="shell-impala-invalidate"/>
  <action name="shell-impala-invalidate">
    <shell xmlns="uri:oozie:shell-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
      </configuration>
      <exec>shell-impala-invalidate.sh</exec>
      <file>shell-impala-invalidate.sh#shell-impala-invalidate.sh</file>
      <file>cconner.keytab#cconner.keytab</file>
    </shell>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>

 

You must include the <file> tag for the shell script. The keytab <file> entry is only needed if you are using Kerberos.
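
For anyone who wants to try this end to end, here is a rough sketch of deploying and submitting such a workflow with the standard hdfs and oozie command-line clients. It is not part of the original answer: the host names, ports, queue and HDFS directory are placeholders, not values from this thread, and job.properties is assumed to supply the ${nameNode}, ${jobTracker} and ${queueName} values used in the workflow. By default the script only prints the commands (DRY_RUN=1); set DRY_RUN=0 to actually run them.

```shell
#!/bin/bash
# Sketch only: every host, port and path below is a hypothetical placeholder.
APP_DIR="${APP_DIR:-/user/$(id -un)/impala-invalidate-wf}"   # assumed HDFS app directory
OOZIE_URL="${OOZIE_URL:-http://oozie-host.example.com:11000/oozie}"
DRY_RUN="${DRY_RUN:-1}"

# Write the properties that fill ${nameNode}, ${jobTracker} and ${queueName}.
write_job_properties() {
  cat > "$1" <<EOF
nameNode=hdfs://namenode.example.com:8020
jobTracker=resourcemanager.example.com:8032
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=\${nameNode}${APP_DIR}
EOF
}

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Upload the workflow and the script, then submit the job.
deploy_and_submit() {
  local props="$1"
  run hdfs dfs -mkdir -p "$APP_DIR"
  run hdfs dfs -put -f workflow.xml shell-impala-invalidate.sh "$APP_DIR/"
  run oozie job -oozie "$OOZIE_URL" -config "$props" -run
}

write_job_properties job.properties
deploy_and_submit job.properties
```

On a kerberized cluster the keytab would have to be uploaded next to the script as well, since the workflow references it with a relative <file> path.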

 

Hope this helps.

 

Thanks

Chris


16 REPLIES

Community Manager

Thanks to all who participated in this thread. Check out the Community Knowledge article we created based on it. 🙂


Cy Jervis, Manager, Community Program

New Contributor

 

2016-09-01 12:32:17,252 WARN org.apache.oozie.action.hadoop.ShellActionExecutor: SERVER[******] USER[****] GROUP[-] TOKEN[] APP[PV3] JOB[0000050-160512133914543-oozie-oozi-W] ACTION[0000050-160512133914543-oozie-oozi-W@shell-7170] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], main() threw exception, Cannot run program "shell-impala-invalidate.sh" (in directory "/data/4/yarn/nm/usercache/****/appcache/application_1463053085953_30120/container_e49_1463053085953_30120_01_000002"): error=2, No such file or directory
2016-09-01 12:32:17,252 WARN org.apache.oozie.action.hadoop.ShellActionExecutor: SERVER[******] USER[****] GROUP[-] TOKEN[] APP[PV3] JOB[0000050-160512133914543-oozie-oozi-W] ACTION[0000050-160512133914543-oozie-oozi-W@shell-7170] Launcher exception: Cannot run program "shell-impala-invalidate.sh" (in directory "/data/4/yarn/nm/usercache/****/appcache/application_1463053085953_30120/container_e49_1463053085953_30120_01_000002"): error=2, No such file or directory
java.io.IOException: Cannot run program "shell-impala-invalidate.sh" (in directory "/data/4/yarn/nm/usercache/*******/appcache/application_1463053085953_30120/container_e49_1463053085953_30120_01_000002"): error=2, No such file or directory

 

 

 

 

I tried to follow the tutorial, but for some reason I get the error above.

My workflow.xml

<workflow-app name="shell-impala-invalidate-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="shell-impala-invalidate"/>
  <action name="shell-impala-invalidate">
    <shell xmlns="uri:oozie:shell-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
      </configuration>
      <exec>shell-impala-invalidate.sh</exec>
      <file>shell-impala-invalidate.sh#shell-impala-invalidate.sh</file>
      <file>shell-impala-invalidate.sql#shell-impala-invalidate.sql</file>
      <file>****.keytab#****.keytab</file>
    </shell>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>

Shell:

 

#!/bin/bash
LOG=/tmp/shell-impala-invalidate-$USER.log
ls -alrtR > $LOG  #This will show you all the files in the directory and their relative paths
export PYTHON_EGG_CACHE=./myeggs
/usr/bin/kinit -kt ***.keytab -V ***
/usr/bin/klist -e >> $LOG
hadoop fs -put $LOG /tmp  #put the log file in HDFS to find it easily
impala-shell -f shell-impala-invalidate.sql

I feel a bit lost now. Could any of you please point me to what is wrong?

Thanks

 

 

Champion Alumni

Hello,

 

Your problem is not related to scheduling Impala, but to the shell action: Oozie cannot find your shell script.

 

1. What are the permissions on the file? Does Oozie have access?

shell-impala-invalidate.sh

2. What are the permissions on the folder?

/data/4/yarn/nm/usercache/*******/appcache/application_1463053085953_30120/container_e49_1463053085953_30120_01_000002

(this folder is on one of your workers)
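
Besides permissions, error=2 (No such file or directory) can also mean that the script was never shipped to the container, or that its interpreter line cannot be resolved, for example when the file was saved with Windows (CRLF) line endings. Nothing in this thread confirms which cause applies here, so the following is only a sketch of a local pre-flight check; the function name check_action_script is hypothetical.

```shell
#!/bin/bash
# Pre-flight checks for a script that an Oozie shell action will execute.
# Prints one line per problem found and returns non-zero if there is any.
check_action_script() {
  local script="$1" problems=0 first_line
  if [ ! -f "$script" ]; then
    echo "missing: $script"
    return 1
  fi
  # The first line should name the interpreter, e.g. #!/bin/bash
  IFS= read -r first_line < "$script"
  case "$first_line" in
    '#!'*) ;;
    *) echo "no shebang on first line: $script"; problems=1 ;;
  esac
  # A trailing carriage return makes the kernel look for "bash\r",
  # which surfaces as "No such file or directory".
  if grep -q $'\r' "$script"; then
    echo "CRLF line endings: $script"
    problems=1
  fi
  return "$problems"
}
```

Run it against the local copy before uploading, then confirm with hdfs dfs -ls that the script sits at the path the <file> element points to (a relative path is resolved against the workflow application directory).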

 

Alina

GHERMAN Alina

New Contributor

Hi Alina,

1. Permissions:

First attempt:
I placed everything in my own directory (user / username) and gave it the permissions shown in the attached picture.
Second attempt:
I placed all the files in the Oozie workspace (user/hue/oozie/workspaces/...) with the same permissions, but the log message is still the same.

2. Can I check the permissions on this folder through the web UI?

Explorer

Thanks for the fruitful discussion, it saved me time. I just got it working with the changes below.

My workflow app XML:

 

<workflow-app name="shell-impala-invalidate-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="shell-impala-invalidate"/>
  <action name="shell-impala-invalidate">
    <shell xmlns="uri:oozie:shell-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
      </configuration>
      <exec>${EXEC}</exec>
      <file>${EXEC}#${EXEC}</file>
      <file>${EXECQSCRIPT}#${EXECQSCRIPT}</file>
    </shell>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>

 

where EXEC and EXECQSCRIPT are the paths to the shell script and my SQL file, respectively.

 

In my shell script, I just needed to add the following lines (I am not using a secured environment here, so no Kerberos):

 

export PYTHON_EGG_CACHE=/home/python-egg-catch

impala-shell -f shellq.sql

 

where shellq.sql is my query file and python-egg-catch is a folder I created with the proper permissions.
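
Since the job fails when the egg cache is missing or not writable, a script can create and check the directory before calling impala-shell. The following is only a sketch under that assumption; the function name setup_egg_cache and the fallback to ./myeggs in the working directory are illustrative choices, not something confirmed in this thread. Note that bash needs the assignment written without a space after the "=".

```shell
#!/bin/bash
# Create a writable egg cache and export it. Falls back to a directory
# under the current working directory if the preferred one is unusable.
setup_egg_cache() {
  local preferred="${1:-$PWD/myeggs}"
  if mkdir -p "$preferred" 2>/dev/null && [ -w "$preferred" ]; then
    export PYTHON_EGG_CACHE="$preferred"
  else
    export PYTHON_EGG_CACHE="$PWD/myeggs"
    mkdir -p "$PYTHON_EGG_CACHE" || return 1
  fi
  echo "PYTHON_EGG_CACHE=$PYTHON_EGG_CACHE"
}
```

Call it directly (not inside $(...)) so that the exported variable is visible to the impala-shell call that follows, for example: setup_egg_cache /home/python-egg-catch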

 

Thanks

 

 

New Contributor

Hi, I see this is quite an old thread, but it was very useful in getting me going. I've ended up writing a wrapper script that makes the shell action behave like a HiveServer2 action from a parameter perspective. Here it is:

 

Usage Notes:

There are 4 mandatory parameters, which must be given in this order:

  1. Script file name: e.g. test.sql **MUST BE REFERENCED AS A FILE IN THE ACTION
  2. The name of the keytab e.g. smithjon.keytab **MUST BE REFERENCED AS A FILE IN THE ACTION OR "NO_KEYTAB" IF NOT RUNNING IN A KERBERISED ENVIRONMENT
  3. The name of the principal in the keytab (who is requesting the query) e.g. smithjon@FOO.BAR.NET **USE NO_PRINCIPAL OR LEAVE BLANK IF NOT RUNNING IN A KERBERISED ENVIRONMENT
  4. The name of the Impala service principal that will be executing the query e.g. fp-service-impala-p

Parameters 5 onwards are <key>=<value> pairs to be substituted into your SQL file. They work in exactly the same way as in HiveServer2 actions: a token in the SQL file of the form "${<name>}" is replaced with "<value>". In our example we have 2 additional parameters:

  1. db_name=bddqsit01p
  2. table_name=HELLO_IMPALA

So the ${db_name} and ${table_name} tokens will be substituted with the supplied values:

 

#!/bin/bash
export PYTHON_EGG_CACHE=./myeggs
# log the variables passed
echo "script file: $1"
echo "keytab file: $2"
echo "local user: $3"
echo "impala user: $4"
# read the SQL file into a variable
sql=$(cat "$1")
echo "Redirecting raw sql to stderr to avoid stdout buffer issues"
echo "Received the following SQL" >&2
# $sql is quoted so that a "*" in the query is not expanded to file names
echo "$sql" >&2
# loop through all parameters and sed parameters 5 onwards into the sql variable
COUNTER=1
for TOKEN in "$@"
do
  if [ "$COUNTER" -gt 4 ] ; then
    currKey=${TOKEN%%=*}     # text before the first "="
    currValue=${TOKEN#*=}    # text after the first "="
    echo "$currKey=$currValue"
    sql=$(echo "$sql" | sed -e "s/\${$currKey}/$currValue/g")
  fi
  COUNTER=$((COUNTER+1))
done
# kinit if required. Keytab name is the second parameter, user id the third
if [ "$2" != "NO_KEYTAB" ] ; then
  echo "Invoking with keytab"
  /usr/bin/kinit -kt "$2" -V "$3"
  echo "Redirecting parsed sql to stderr to avoid stdout buffer issues"
  echo "Executing the following SQL" >&2
  echo "$sql" >&2
  impala-shell -q "$sql" -k -s "$4"
else
  echo "Invoking with no keytab"
  echo "Redirecting parsed sql to stderr to avoid stdout buffer issues"
  echo "Executing the following SQL" >&2
  echo "$sql" >&2
  impala-shell -q "$sql"
fi
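
The token substitution step can also be tried on its own, without Oozie or impala-shell. The sketch below is a variant rather than the wrapper's actual code: it uses bash parameter expansion instead of sed, so values containing "/" or spaces do not need escaping, and the function name substitute_params is hypothetical.

```shell
#!/bin/bash
# Replace every ${key} token in the SQL text with its value.
# Arguments: the SQL text, then any number of key=value pairs.
substitute_params() {
  local sql="$1" pair key value token
  shift
  for pair in "$@"; do
    key="${pair%%=*}"      # text before the first '='
    value="${pair#*=}"     # text after the first '='
    token="\${$key}"       # the literal token, e.g. ${db_name}
    # Quoting the pattern and replacement makes bash treat them as literal text.
    sql=${sql//"$token"/"$value"}
  done
  printf '%s\n' "$sql"
}

substitute_params 'select * from ${db_name}.${table_name}' \
  db_name=bddqsit01p table_name=HELLO_IMPALA
# prints: select * from bddqsit01p.HELLO_IMPALA
```

In the workflow, each of the four mandatory parameters and every key=value pair would be passed to the wrapper as a separate argument of the shell action, in the order described above.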

avatar
Hi. How can you find your workflow in the list of all workflows? They are nameless...