Oozie Action Status not updating

New Contributor

The Oozie example job for the java-main app invokes an MR2 job, but the action status for the java action does not get updated to OK immediately, even though the MR job has completed.

 

It takes about 10 minutes for Oozie to update the action status and then end the job successfully.

 

We are using CDH 5.11.

 

6 REPLIES

Mentor
Oozie is notified that an MR job has ended via the callback interface. Depending on your network configuration between the NodeManagers and the Oozie hosts, or on your Oozie security configuration (such as TLS and load balancers), this callback interaction can break.

Could you provide more information on how your cluster is set up? Do you use firewalls, load balancers for Oozie, and/or TLS for Oozie?

In the meantime, you should be able to lower the 10-minute recheck interval in the Oozie server's oozie-site.xml configuration via the key "oozie.service.ActionCheckerService.action.check.delay" (specified in seconds; its default value is 600, i.e. 10 minutes).
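
For reference, here is a minimal sketch of what that override might look like in oozie-site.xml. The value 120 is only an assumed example, not a recommendation; pick an interval that suits your load, since a shorter delay means more frequent status checks. On a cluster managed by Cloudera Manager this would typically go into the Oozie Server advanced configuration snippet for oozie-site.xml rather than being edited by hand, and the Oozie server should need a restart to pick it up.

```xml
<!-- oozie-site.xml: recheck running actions more often than the 600 s default -->
<property>
  <name>oozie.service.ActionCheckerService.action.check.delay</name>
  <!-- assumed example value, in seconds -->
  <value>120</value>
</property>
```

Note that this only shortens the fallback polling delay; it does not repair a broken callback path.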

New Contributor

Harsh,

 

I raised a support ticket and was given a knowledge article that describes creating two groups for Oozie, since HA was enabled. The solution worked. Thanks for your response.

Explorer

Hi

 

Can I get a link to the support article? Or can you let me know the fix that was suggested?

 

Thanks.

New Contributor

Hi Lal,

 

I finally found the link. Cloudera had changed the URL and updated the article.

 

https://cloudera-portal.force.com/articles/KB_Article/After-enabling-Oozie-High-Availability-and-cha...

 

Regards,

Nithin

Explorer

Hi,

I have the same issue. Can you describe what exactly you did to fix it? (Your link to the support ticket doesn't open.)

Explorer

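It's a bug in Oozie. CoordActionCheckXCommand does not handle the SUSPENDED state. It only handles SUCCEEDED, FAILED and KILLED; any other workflow status falls through to the final else branch, which just logs a warning and updates the last-modified time.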

protected Void execute() throws CommandException {
        try {
            InstrumentUtils.incrJobCounter(getName(), 1, getInstrumentation());
            Status slaStatus = null;
            CoordinatorAction.Status initialStatus = coordAction.getStatus();

            if (workflowJob.getStatus() == WorkflowJob.Status.SUCCEEDED) {
                coordAction.setStatus(CoordinatorAction.Status.SUCCEEDED);
                // set pending to false as the status is SUCCEEDED
                coordAction.setPending(0);
                slaStatus = Status.SUCCEEDED;
            }
            else {
                if (workflowJob.getStatus() == WorkflowJob.Status.FAILED) {
                    coordAction.setStatus(CoordinatorAction.Status.FAILED);
                    slaStatus = Status.FAILED;
                    // set pending to false as the status is FAILED
                    coordAction.setPending(0);
                }
                else {
                    if (workflowJob.getStatus() == WorkflowJob.Status.KILLED) {
                        coordAction.setStatus(CoordinatorAction.Status.KILLED);
                        slaStatus = Status.KILLED;
                        // set pending to false as the status is KILLED
                        coordAction.setPending(0);
                    }
                    else {
                        LOG.warn("Unexpected workflow " + workflowJob.getId() + " STATUS " + workflowJob.getStatus());
                        coordAction.setLastModifiedTime(new Date());
                        CoordActionQueryExecutor.getInstance().executeUpdate(
                                CoordActionQueryExecutor.CoordActionQuery.UPDATE_COORD_ACTION_FOR_MODIFIED_DATE,
                                coordAction);
                        return null;
                    }
                }
            }