How to specify libpath for oozie?

Hi,

I am scheduling an Oozie workflow with Falcon. The workflow executes a shell script that runs spark-submit. Sometimes the jobs finish successfully, but most of the time they get killed. The Oozie error logs contain a few warnings like these:

2017-07-06 14:10:01,907  WARN ParameterVerifier:523 - SERVER[<host>] USER[ambari-qa] GROUP[-] TOKEN[] APP[FALCON_PROCESS_DEFAULT_estimatePi7] JOB[0000040-170706133706258-oozie-oozi-W] ACTION[0000040-170706133706258-oozie-oozi-W@user-action] The application does not define formal parameters in its XML definition

2017-07-06 14:10:01,952  WARN LiteWorkflowAppService:523 - SERVER[<host>] USER[ambari-qa] GROUP[-] TOKEN[] APP[FALCON_PROCESS_DEFAULT_estimatePi7] JOB[0000040-170706133706258-oozie-oozi-W] ACTION[0000040-170706133706258-oozie-oozi-W@user-action] libpath [hdfs://<host>:8020/user/oozie/shell/lib] does not exist

2017-07-06 14:10:02,202  WARN CompletedActionXCommand:523 - SERVER[<host>] USER[-] GROUP[-] TOKEN[] APP[-] JOB[0000040-170706133706258-oozie-oozi-W] ACTION[0000040-170706133706258-oozie-oozi-W@user-action] Received early callback for action still in PREP state; will wait [10,000]ms and requeue up to [5] more times

2017-07-07 07:43:10,658  WARN ShellActionExecutor:523 - SERVER[<host>] USER[ambari-qa] GROUP[-] TOKEN[] APP[ShellAction] JOB[0000007-170707072402346-oozie-oozi-W] ACTION[0000007-170707072402346-oozie-oozi-W@shellAction] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

Since they are warnings, I don't think they are the reason the jobs are failing. Nevertheless, I've tried to get rid of them and failed.

For the 'Received early callback for action still in PREP state' warning, I've added a 30-second sleep to my shell script, but the warning still occurs occasionally.

For the 'libpath does not exist' warning, I've added the following property to oozie-site.xml:

oozie.libpath=${nameNode}/user/oozie/share/lib

I've also added this to my job.properties file and to the Falcon process. The warning still says the libs are missing from /user/oozie/shell/lib. Are these even related? It seems like Oozie is searching for lib in the directory I specified for my workflow.xml file.

I have no idea what to do about the 'Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]' warning. I can't find anything related to it.

Does anybody have any idea what might cause the jobs to fail?

Below are the configuration files:

workflow.xml

<workflow-app name="ShellAction" xmlns="uri:oozie:workflow:0.4">
  <start to="shellAction"/>
  <action name="shellAction">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>script.sh</exec>
      <file>/user/oozie/shell/job.properties#job.properties</file>
      <file>/user/oozie/shell/script.sh#script.sh</file>
      <file>/user/oozie/shell/PiEstimation.jar#PiEstimation.jar</file>
      <capture-output/>
    </shell>
    <ok to="end"/>
    <error to="killAction"/>
  </action>
  <kill name="killAction">
    <message>"Killed job due to error"</message>
  </kill>
  <end name="end"/>
</workflow-app>

job.properties

nameNode=hdfs://[<host>]:8020
jobTracker=[<host>]:8050
queueName=default
oozie.wf.application.path=${nameNode}/user/oozie/shell
oozie.libpath=${nameNode}/user/${user.name}/share/lib
oozie.use.system.libpath=true

script.sh

sleep 30

/usr/hdp/current/spark-client/bin/spark-submit --class org.apache.falcon.example.spark.SparkPI --conf spark.ui.port=4050 --driver-memory 2g --executor-memory 1g /apps/spark/PiEstimation.jar 100 >> /apps/spark/PiEstimationOut.log
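The Launcher ERROR line with ShellMain and exit code [1] most likely just relays the exit status of script.sh, which here is whatever spark-submit returned as the last command. A minimal sketch of a way to make that visible, assuming diagnostics written to stderr end up in the launcher's stderr log; `run_and_report` is a hypothetical helper name, not part of Oozie or Spark. It leaves stdout alone, because the workflow uses capture-output.

```shell
#!/bin/sh
# run_and_report (hypothetical helper): run the given command unchanged,
# and if it fails, write its exit status to stderr before returning it.
run_and_report() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "command failed with exit status ${status}: $*" >&2
    fi
    return "$status"
}

# Intended use inside script.sh (not executed here):
#   run_and_report /usr/hdp/current/spark-client/bin/spark-submit ... 100 >> /apps/spark/PiEstimationOut.log
```

Because the helper returns the original status, the shell action still fails when spark-submit fails; the only difference is an extra stderr line naming the failing command.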

falcon process

<process xmlns='uri:falcon:process:0.1' name='estimatePi7'>
  <clusters>
    <cluster name='primaryCluster'>
      <validity start='2017-07-07T07:10Z' end='2017-07-07T07:45Z'/>
    </cluster>
  </clusters>
  <parallel>1</parallel>
  <order>LIFO</order>
  <frequency>minutes(5)</frequency>
  <timezone>UTC</timezone>
  <properties>
    <property name="oozie.libpath" value="${nameNode}/user/oozie/share/lib" />
  </properties>
  <workflow name='ShellAction' engine='oozie' path='/user/oozie/shell/'/>
  <retry policy='periodic' delay='minutes(1)' attempts='3'/>
  <ACL owner='ambari-qa' group='users' permission='0755'/>
</process>

Attaching YARN logs:

syslog.txt, stderr.txt, launch-containersh.txt, directoryinfo.txt

1 REPLY


Hi Rafal,

Can you please try the following?

Change the order of the properties so that oozie.use.system.libpath=true comes directly after queueName=default. Execute once and check.

Change the oozie.libpath value to ${nameNode}/user/oozie/share/lib/lib_<timestamp> (where <timestamp> is the timestamp suffix of the directory you see under that folder). Execute once and check.

Copy all the jars from ${nameNode}/user/oozie/share/lib/*/*/* to a custom location such as ${nameNode}/user/${user.name}/share/lib. Execute once and check.

One of them should work out.
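The second suggestion needs the actual lib_<timestamp> directory name. A minimal sketch of one way to find it, assuming the sharelib root is /user/oozie/share/lib as in the paths above and that `hdfs dfs -ls` prints the path as the last column of each entry; `latest_sharelib` is a hypothetical helper name.

```shell
#!/bin/sh
# latest_sharelib (hypothetical helper): read `hdfs dfs -ls` output on stdin
# and print the lib_<timestamp> directory whose name sorts last.
latest_sharelib() {
    # Keep only the last column (the path), drop anything that is not a
    # lib_<digits> directory, then take the lexically greatest entry.
    awk '{ print $NF }' | grep -E '/lib_[0-9]+$' | sort | tail -n 1
}

# Intended use on the cluster (not executed here):
#   hdfs dfs -ls /user/oozie/share/lib | latest_sharelib
```

The lexical sort only picks the newest directory when all timestamps have the same number of digits, which is an assumption about how the directories are named.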