Oozie server fails to start on Cloudera Express 5.6.0

Explorer

Hi All,

After starting up a cluster that had been shut down for some time, I find that one of the Oozie servers will not stay up. It starts initially, then goes down. Two errors are reported in the log file. I assume that one of them is responsible, but I'm not sure which. The errors are:

First Error:
Source: PurgeXCommand
Message: <openjpa-2.2.2-r422266:1468616 fatal general error> org.apache.openjpa.persistence.PersistenceException: The column index is out of range: 3, number of columns: 2.
FailedObject: select w.id, w.parentId from WorkflowJobBean w where w.endTimestamp < :endTime and w.parentId like '%C@%' [java.lang.String]

Second Error:
Source: CoordStatusTransitXCommand
Message: SERVER[<redacted>] USER[<redacted>] GROUP[-] TOKEN[] APP[<redacted>] JOB[0000068-201217031325641-oozie-oozi-C] ACTION[-] XException,
org.apache.oozie.command.CommandException: E0606: Could not get lock [coord_status_transit_b60ff85c-fe55-4bf5-8003-ee73935ca076], timed out [0]ms

Any guidance would be appreciated.

Best regards,

Ian.

1 ACCEPTED SOLUTION

Explorer

Hello @Scharan,

I resolved this by setting the status of a number of Oozie jobs to "FAILED" directly in the Oozie database. They were rogue jobs stuck at a status of RUNNING in the database.
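
For anyone wanting to see the general shape of such a fix, here is a minimal sketch. It runs against an in-memory SQLite database as a stand-in for the real Oozie database. The table names (WF_JOBS, COORD_JOBS), the id and status columns, and the helper mark_jobs_failed are all assumptions made for illustration; the thread does not say which tables or job IDs were changed, so check your own schema and take a database backup first.

```python
import sqlite3

# Assumed table names; verify against your own Oozie schema before use.
ALLOWED_TABLES = {"WF_JOBS", "COORD_JOBS"}


def mark_jobs_failed(conn, table, job_ids):
    """Set status to FAILED for the given job IDs, but only if they are RUNNING.

    Hypothetical helper: returns the number of rows that were changed.
    """
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table}")
    job_ids = list(job_ids)
    if not job_ids:
        return 0
    # "?" is the SQLite placeholder style; other drivers may use "%s".
    placeholders = ",".join("?" for _ in job_ids)
    sql = (
        f"UPDATE {table} SET status = 'FAILED' "
        f"WHERE status = 'RUNNING' AND id IN ({placeholders})"
    )
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(sql, job_ids)
    return cur.rowcount


def demo():
    """Build a toy table with made-up job IDs and fail one stuck job."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE COORD_JOBS (id TEXT PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO COORD_JOBS VALUES (?, ?)",
        [("job-a-C", "RUNNING"), ("job-b-C", "SUCCEEDED")],
    )
    changed = mark_jobs_failed(conn, "COORD_JOBS", ["job-a-C", "job-b-C"])
    rows = conn.execute("SELECT id, status FROM COORD_JOBS ORDER BY id").fetchall()
    conn.close()
    return changed, rows
```

The `status = 'RUNNING'` guard means jobs that have already finished are left alone even if their IDs are passed in by mistake.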

Thanks for your input and replies!

Ian.

5 REPLIES

Master Collaborator

Hello @idodds 

Can you try setting oozie.wf.validate.ForkJoin=false in oozie-site.xml?

Explorer

Hello @Scharan,

Thanks for the advice. I've tried setting that property in the jobs.properties file. After the service restart I'm no longer getting the E0606 error. However, the PurgeXCommand error remains and the service still fails to stay up.

I'm beginning to suspect that I have an incorrect component version installed, as PurgeXCommand looks like the issue. The error is being raised by openjpa-2.2.2-r422266. I'm wondering if this version is incompatible with Cloudera Express 5.6.0.

Master Collaborator

Hello @idodds 

It seems to be a known issue, OPENJPA-2482. Can you try replacing the OpenJPA 2.2.2 jars under /opt/cloudera/parcels/CDH/jars/ with the 2.4.2 version? You can download it using the command below.

wget http://www-eu.apache.org/dist/openjpa/2.4.2/apache-openjpa-2.4.2-binary.zip

I suggest you take a backup before replacing the jars.
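
As a rough illustration of the backup step, the sketch below copies matching jars into a separate directory before anything is replaced. The helper name backup_jars, the backup location, and the glob pattern "openjpa-*.jar" are assumptions for illustration; check the actual jar names in your parcel directory before relying on the pattern.

```python
import shutil
from pathlib import Path


def backup_jars(jar_dir, backup_dir, pattern="openjpa-*.jar"):
    """Copy every file in jar_dir matching pattern into backup_dir.

    Hypothetical helper: returns the sorted list of file names copied.
    The originals are left in place; nothing is deleted or replaced.
    """
    src = Path(jar_dir)
    dst = Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for jar in sorted(src.glob(pattern)):
        if jar.is_file():
            # copy2 also preserves timestamps and permission bits
            shutil.copy2(jar, dst / jar.name)
            copied.append(jar.name)
    return copied
```

A call might look like `backup_jars("/opt/cloudera/parcels/CDH/jars", "/root/openjpa-backup")`, where the second path is only an example destination.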

 

Explorer

Hello @Scharan, thanks again for the lead. I'll look into this and report back.

Thanks,

Ian.
