How many times does Falcon retry failed jobs in HDP 2.3.4?

Guru

Team:

I have noticed an annoying behavior in Falcon. In HDP 2.2, Falcon retried a failed job 10 times, but in the current version the job fails after a single attempt.

Is this a change in HDP 2.3, or a default behavior that we can change?

Thanks in advance.

1 ACCEPTED SOLUTION

Guru

@Sowmya Ramesh

@Benjamin Leonhardi,

I found the solution for this issue. Before the upgrade, the value of "oozie.wf.rerun.failnodes" was "false". After the upgrade to HDP 2.3.4, the value is "true", so that only the failed action nodes in an Oozie workflow instance are rerun, which prevents successful actions from running again.

The fix is to set the following property in the properties section of the process entity: <property name="oozie.wf.rerun.failnodes" value="false"/>
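For context, here is a minimal sketch of where that property sits in a Falcon process entity, next to the retry policy discussed in this thread. The entity name, cluster name, validity dates and workflow path are hypothetical placeholders, inputs and outputs are left out, and the element order reflects my reading of the Falcon process schema, so check it against the entity specification for your Falcon version.

```xml
<!-- Hypothetical process entity: names, dates and paths are placeholders. -->
<process name="example-hourly-process" xmlns="uri:falcon:process:0.1">
    <clusters>
        <cluster name="example-cluster">
            <validity start="2016-05-01T00:00Z" end="2017-05-01T00:00Z"/>
        </cluster>
    </clusters>
    <parallel>1</parallel>
    <order>FIFO</order>
    <frequency>hours(1)</frequency>
    <!-- inputs/outputs omitted for brevity -->
    <properties>
        <!-- Pre-upgrade value from the accepted solution; with "true",
             only the failed action nodes are rerun. -->
        <property name="oozie.wf.rerun.failnodes" value="false"/>
    </properties>
    <workflow engine="oozie" path="/apps/example/workflow"/>
    <!-- Retry policy as configured in this thread: 10 attempts, 30 minutes apart. -->
    <retry policy="periodic" delay="minutes(30)" attempts="10"/>
</process>
```

Keeping the property in the entity itself, rather than relying on the cluster-wide default, means the behavior no longer depends on what a future upgrade sets it to.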


7 REPLIES

Master Guru

Falcon has a parameter for this: the retry policy.

<retry policy="exp-backoff" delay="hours(1)" attempts="1"/>

See https://falcon.apache.org/EntitySpecification.html and search for "Retry".

Guru

@Benjamin Leonhardi: I have the following parameter in my process.xml:

<retry policy="periodic" delay="minutes(30)" attempts="10"/>

and the following message in the logs, so I don't see it retrying 10 times anywhere:

2016-05-07 12:12:25,037 INFO - [RetryHandler:] ~ {Action:retry-instance-failed, Dimensions:{run-id=0, wf-id=0013624-160421060930490-oozie-oozi-W, nominal-name=2016-05-07T15:30Z, wf-user=hdpbatch, entity-type=PROCESS, error-message=Rerun file deleted or renamed for process-instance:, entity-name=hdp0186h-sitecatalyst-kpis-generation-android-events-hourly-process}, Status: SUCCEEDED, Time-taken:4952 ns} (METRIC:38)


As @Benjamin Leonhardi said, Falcon should honor the number of attempts in the retry policy. If it's not working as expected, please open a support case. Thanks!

Guru

@Sowmya Ramesh: Yes, I have opened a case and am working with the Hortonworks team.

Master Guru

I had the retry policy in the example I sent you, and it seemed to work. I would open a support case.
