USER-RETRY not working in workflow

Hello,

I'm trying to create a workflow that uses retry-max and retry-interval. The workflow is controlled by a coordinator, which fires it when files are available in HDFS. Part of the workflow is a Hive query, which runs fine when the files are found in the directory.

I want the workflow to retry a few times when the files are not yet available in HDFS. To test the user-retry functionality, I deliberately remove the files from HDFS and then reload them.

The workflow is KILLED and does not go to the USER-RETRY state because of this error:

WARN HiveActionExecutor:523 - SERVER[localhost] USER[hadoopmgr] GROUP[-] TOKEN[] APP[hive-wf] JOB[0000001-160530104739010-oozie-oozi-W] ACTION[0000001-160530104739010-oozie-oozi-W@stockjob] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [40000]

Coordinator.xml:

<coordinator-app name="stockjob" frequency="${coord:days(1)}" start="${start}" end="${end}" timezone="Europe/Amsterdam"
xmlns="uri:oozie:coordinator:0.2">
<controls>
<concurrency>1</concurrency>
<execution>FIFO</execution>
<throttle>5</throttle>
</controls>
<datasets>
<dataset name="dindc1" frequency="${coord:days(1)}"
initial-instance="2016-05-30T07:00Z" timezone="Europe/Amsterdam">
<uri-template>${nameNode}/user/hadoopmgr/wfArc/data/in/${YEAR}${MONTH}${DAY}/</uri-template>
<done-flag></done-flag>
</dataset>
<dataset name="dout" frequency="${coord:days(1)}"
initial-instance="2016-05-30T07:00Z" timezone="Europe/Amsterdam">
<uri-template>${nameNode}/user/hadoopmgr/wfArc/data/out/${YEAR}${MONTH}${DAY}/</uri-template>
<done-flag></done-flag>
</dataset>
</datasets>
<input-events>
<data-in name="eindc1" dataset="dindc1">
<instance>${coord:current(0)}</instance>
</data-in>
</input-events>
<output-events>
<data-out name="eout" dataset="dout">
<instance>${coord:current(0)}</instance>
</data-out>
</output-events>
<action>
<workflow>
<app-path>${workflowAppUri}</app-path>
<configuration>
<property>
<name>jobTracker</name>
<value>${jobTracker}</value>
</property>
<property>
<name>nameNode</name>
<value>${nameNode}</value>
</property>
<property>
<name>queueName</name>
<value>${queueName}</value>
</property>
<property>
<name>inputPath1</name>
<value>${coord:dataIn('eindc1')}</value>
</property>
<property>
<name>outputPath1</name>
<value>${coord:dataOut('eout')}</value>
</property>
<property>
<name>the_timestamp</name>
<value>${coord:formatTime(coord:nominalTime(), 'yyyy-MM-dd')}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>

Workflow.xml:

<workflow-app name="hive-wf" xmlns="uri:oozie:workflow:0.3">
<credentials>
<credential name="hive_credentials" type="hcat">
<property>
<name>hcat.metastore.uri</name>
<value>${tHrift}</value>
</property>
<property>
<name>hcat.metastore.principal</name>
<value>${principal}</value>
</property>
</credential>
</credentials>
<start to="stockjob"/>
<action name="stockjob" retry-max="3" retry-interval="10" cred="hive_credentials">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>/user/hadoopmgr/wfArc/wf1/hive-site.xml</job-xml>
<configuration>
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>oozie.hive.defaults</name>
<value>/user/hadoopmgr/wfArc/wf2/hive-site.xml</value>
</property>
</configuration>
<script>/user/hadoopmgr/wfArc/wf2/sl_stock.hql</script>
<param>inputPath1=${inputPath1}</param>
<param>tableName1=${the_timestamp}</param>
<param>outputPath1=${outputPath1}</param>
</hive>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>hql script failed [${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>

Thanks

Jan

1 REPLY

The log shows:

WARN HiveActionExecutor:523 - SERVER[localhost] USER[hadoopmgr] GROUP[-] TOKEN[] APP[hive-wf] JOB[0000001-160530104739010-oozie-oozi-W] ACTION[0000001-160530104739010-oozie-oozi-W@stockjob] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [40000]

Try this:

Add the following property to oozie-site.xml, then restart the Oozie server instance:

<property>
<name>oozie.service.LiteWorkflowStoreService.user.retry.error.code.ext</name>
<value>40000</value>
<description>
Automatic retry interval for workflow action is handled for
these specified extra error code.
</description>
</property>
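
For context, here is a sketch of how this could sit in oozie-site.xml. It assumes that Oozie reports the HiveMain exit code from the log above (40000) as the action's error code, so that is the code to add to the retry list. The user.retry.max entry is included only because it caps the retry-max attribute set on the action; 3 is believed to be its default. Treat both values as assumptions to check against your Oozie version.

```xml
<configuration>
    <!-- Extra error codes that should send an action to USER-RETRY.
         Assumption: 40000 is the exit code from the HiveMain log line. -->
    <property>
        <name>oozie.service.LiteWorkflowStoreService.user.retry.error.code.ext</name>
        <value>40000</value>
    </property>

    <!-- Server-side upper bound for retry-max on workflow actions.
         Assumption: 3 is the default; adjust to your installation. -->
    <property>
        <name>oozie.service.LiteWorkflowStoreService.user.retry.max</name>
        <value>3</value>
    </property>
</configuration>
```

The value is a comma-separated list, so further codes can be appended if other failures should also be retried. Note that retry-interval on the action is specified in minutes.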

Hope this helps.

See also: https://issues.apache.org/jira/browse/OOZIE-10