How many times does Falcon retry failed jobs in HDP 2.3.4?

Guru

Team:

I have noticed an annoying behavior in Falcon. In HDP 2.2, Falcon retried a failed job 10 times, but in the current version the job simply fails after one try.

So I want to know whether this is a change in HDP 2.3 or a default behavior that we can change.

Thanks in advance.

1 ACCEPTED SOLUTION

Guru

@Sowmya Ramesh

@Benjamin Leonhardi,

I found the solution for this issue. Before the upgrade, the value of "oozie.wf.rerun.failnodes" was "false". After the upgrade to HDP-2.3.4, the value is "true", so only the failed action nodes of an Oozie workflow instance are rerun, which prevents successful actions from being rerun.

To get the previous behavior back, set the following property in the properties section of the Process entity: <property name="oozie.wf.rerun.failnodes" value="false"/>
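
For anyone unsure where that property goes, here is a minimal sketch of a Process entity with the property and a retry policy in place. The process name, cluster name, validity dates, frequency, and workflow path are made-up placeholders, not values from my setup; only the property line and the retry attributes come from this thread. The element order follows my understanding of the Falcon process schema, so please check it against the Entity Specification for your version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical process entity: names, dates, and paths are placeholders -->
<process name="sample-hourly-process" xmlns="uri:falcon:process:0.1">
    <clusters>
        <cluster name="sample-cluster">
            <validity start="2016-05-01T00:00Z" end="2017-05-01T00:00Z"/>
        </cluster>
    </clusters>
    <parallel>1</parallel>
    <order>FIFO</order>
    <frequency>hours(1)</frequency>
    <properties>
        <!-- Override the post-upgrade value so a rerun does not restrict itself to failed nodes -->
        <property name="oozie.wf.rerun.failnodes" value="false"/>
    </properties>
    <workflow engine="oozie" path="/apps/sample/workflow"/>
    <!-- Retry policy as discussed in this thread -->
    <retry policy="periodic" delay="minutes(30)" attempts="10"/>
</process>
```

After editing the entity, it has to be updated in Falcon for the change to apply to new instances.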

7 REPLIES

Falcon has a parameter you can set for this, the retry policy:

<retry policy="exp-backoff" delay="hours(1)" attempts="1"/>

https://falcon.apache.org/EntitySpecification.html

Search for "Retry" on that page.

Guru

@Benjamin Leonhardi: I have the following parameter in my process.xml:

<retry policy="periodic" delay="minutes(30)" attempts="10"/>

and the following error in the logs, so I don't see anywhere that it is retrying 10 times.

2016-05-07 12:12:25,037 INFO - [RetryHandler:] ~ {Action:retry-instance-failed, Dimensions:{run-id=0, wf-id=0013624-160421060930490-oozie-oozi-W, nominal-name=2016-05-07T15:30Z, wf-user=hdpbatch, entity-type=PROCESS, error-message=Rerun file deleted or renamed for process-instance:, entity-name=hdp0186h-sitecatalyst-kpis-generation-android-events-hourly-process}, Status: SUCCEEDED, Time-taken:4952 ns} (METRIC:38)

2016-05-07 12:12:25,037

As @Benjamin Leonhardi said, Falcon should honor the attempts value in the retry policy. If it's not working as expected, please create a support issue. Thanks!

Guru

@Sowmya Ramesh: Yes, I have opened a case and am working with the HW team.

I had that in what I sent you, and it seemed to work. I would open a support case.
