Oozie server is stuck in OOME

New Contributor

Hello,

 

We are using Oozie to run workflows that each consist of a single Hive query task.

Over the past few days the Oozie server has been getting stuck. Checking the log files, I see the Oozie coordinator hit an OutOfMemoryError (OOME) while querying the Derby DB.

 

The Oozie Java heap size is 653.69 MiB.

We have fewer than 1000 jobs in the queue, and the majority are done (succeeded or killed):

(oozie jobs -len 1000 | wc -l)

Is this a high number? Do we need to perform some cleanup for old jobs?

 

Below is a snippet from the oozie-cmf-oozie1-OOZIE_SERVER-va-p-mdtcdh-01-c.private.mtlink.biz.log.out log file.

 

 

2013-08-15 08:00:17,945 WARN openjpa.Enhance: Creating subclass for "[class org.apache.oozie.util.db.ValidateConnectionBean]". This means that your application will be less efficient and will consume more memory than it would if you ran the OpenJPA enhancer. Additionally, lazy loading will not be available for one-to-one and many-to-one persistent attributes in types using field access; they will be loaded eagerly instead.
2013-08-15 08:00:28,533 WARN org.apache.oozie.service.JPAService: USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] JPAExecutor [WorkflowsJobGetJPAExecutor] ended with an active transaction, rolling back
2013-08-15 08:00:28,533 ERROR org.apache.oozie.command.wf.JobsXCommand: USER[hue] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] XException,
org.apache.oozie.command.CommandException: E0603: SQL error in operation, Java exception: 'GC overhead limit exceeded: java.lang.OutOfMemoryError'.
        at org.apache.oozie.command.wf.JobsXCommand.execute(JobsXCommand.java:72)
        at org.apache.oozie.command.wf.JobsXCommand.execute(JobsXCommand.java:32)
        at org.apache.oozie.command.XCommand.call(XCommand.java:277)
        at org.apache.oozie.DagEngine.getJobs(DagEngine.java:443)
        at org.apache.oozie.servlet.V1JobsServlet.getWorkflowJobs(V1JobsServlet.java:323)
        at org.apache.oozie.servlet.V1JobsServlet.getJobs(V1JobsServlet.java:150)
        at org.apache.oozie.servlet.BaseJobsServlet.doGet(BaseJobsServlet.java:121)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
        at org.apache.oozie.servlet.JsonRestServlet.service(JsonRestServlet.java:286)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.oozie.servlet.AuthFilter$2.doFilter(AuthFilter.java:126)
        at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:384)
        at org.apache.oozie.servlet.AuthFilter.doFilter(AuthFilter.java:131)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.oozie.servlet.HostnameFilter.doFilter(HostnameFilter.java:84)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:722)
Caused by: <openjpa-2.1.0-r422266:1071316 fatal general error> org.apache.openjpa.persistence.PersistenceException: Java exception: 'GC overhead limit exceeded: java.lang.OutOfMemoryError'.
        at org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4869)
        at org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4829)
        at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:136)
        at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:118)
        at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:70)
        at org.apache.openjpa.jdbc.kernel.SelectResultObjectProvider.handleCheckedException(SelectResultObjectProvider.java:155)
        at org.apache.openjpa.lib.rop.RangeResultObjectProvider.handleCheckedException(RangeResultObjectProvider.java:130)
        at org.apache.openjpa.kernel.QueryImpl$PackingResultObjectProvider.handleCheckedException(QueryImpl.java:2111)
        at org.apache.oozie.service.JPAService.execute(JPAService.java:211)
        at org.apache.oozie.command.wf.JobsXCommand.execute(JobsXCommand.java:61)
        ... 29 more
Caused by: java.sql.SQLException: Java exception: 'GC overhead limit exceeded: java.lang.OutOfMemoryError'.
        at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.Util.javaException(Unknown Source)
        at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
        at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedResultSet.closeOnTransactionError(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedResultSet.next(Unknown Source)
        at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
        at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
        at org.apache.openjpa.lib.jdbc.DelegatingResultSet.next(DelegatingResultSet.java:131)
        at org.apache.openjpa.jdbc.sql.ResultSetResult.nextInternal(ResultSetResult.java:222)
        at org.apache.openjpa.jdbc.sql.SelectImpl$SelectResult.nextInternal(SelectImpl.java:2445)
        at org.apache.openjpa.jdbc.sql.AbstractResult.next(AbstractResult.java:175)
        at org.apache.openjpa.jdbc.kernel.SelectResultObjectProvider.next(SelectResultObjectProvider.java:99)
        at org.apache.openjpa.lib.rop.RangeResultObjectProvider.next(RangeResultObjectProvider.java:102)
        at org.apache.openjpa.kernel.QueryImpl$PackingResultObjectProvider.next(QueryImpl.java:2087)
        at org.apache.openjpa.lib.rop.WindowResultList.getInternal(WindowResultList.java:129)
        ... 36 more
Caused by: java.sql.SQLException: Java exception: 'GC overhead limit exceeded: java.lang.OutOfMemoryError'.
        at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
        ... 56 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.derby.iapi.types.SQLChar.<init>(Unknown Source)
        at org.apache.derby.iapi.types.SQLVarchar.<init>(Unknown Source)
        at org.apache.derby.iapi.types.SQLVarchar.cloneValue(Unknown Source)
        at org.apache.derby.iapi.store.access.BackingStoreHashtable.cloneRow(Unknown Source)
        at org.apache.derby.iapi.store.access.BackingStoreHashtable.add_row_to_hash_table(Unknown Source)
        at org.apache.derby.iapi.store.access.BackingStoreHashtable.putRow(Unknown Source)
        at org.apache.derby.impl.sql.execute.ScrollInsensitiveResultSet.addRowToHashTable(Unknown Source)
        at org.apache.derby.impl.sql.execute.ScrollInsensitiveResultSet.getNextRowFromSource(Unknown Source)
        at org.apache.derby.impl.sql.execute.ScrollInsensitiveResultSet.getNextRowCore(Unknown Source)
        at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedResultSet.next(Unknown Source)
        at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
        at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
        at org.apache.openjpa.lib.jdbc.DelegatingResultSet.next(DelegatingResultSet.java:131)
        at org.apache.openjpa.jdbc.sql.ResultSetResult.nextInternal(ResultSetResult.java:222)
        at org.apache.openjpa.jdbc.sql.SelectImpl$SelectResult.nextInternal(SelectImpl.java:2445)
        at org.apache.openjpa.jdbc.sql.AbstractResult.next(AbstractResult.java:175)
        at org.apache.openjpa.jdbc.kernel.SelectResultObjectProvider.next(SelectResultObjectProvider.java:99)
        at org.apache.openjpa.lib.rop.RangeResultObjectProvider.next(RangeResultObjectProvider.java:102)
        at org.apache.openjpa.kernel.QueryImpl$PackingResultObjectProvider.next(QueryImpl.java:2087)
        at org.apache.openjpa.lib.rop.WindowResultList.getInternal(WindowResultList.java:129)
        at org.apache.openjpa.lib.rop.AbstractNonSequentialResultList$Itr.hasNext(AbstractNonSequentialResultList.java:171)
        at org.apache.openjpa.lib.rop.ResultListIterator.hasNext(ResultListIterator.java:53)
        at org.apache.openjpa.kernel.DelegatingResultList$DelegatingListIterator.hasNext(DelegatingResultList.java:389)
        at org.apache.oozie.executor.jpa.WorkflowsJobGetJPAExecutor.execute(WorkflowsJobGetJPAExecutor.java:251)
        at org.apache.oozie.executor.jpa.WorkflowsJobGetJPAExecutor.execute(WorkflowsJobGetJPAExecutor.java:40)
        at org.apache.oozie.service.JPAService.execute(JPAService.java:211)
        at org.apache.oozie.command.wf.JobsXCommand.execute(JobsXCommand.java:61)
        at org.apache.oozie.command.wf.JobsXCommand.execute(JobsXCommand.java:32)
        at org.apache.oozie.command.XCommand.call(XCommand.java:277)
        at org.apache.oozie.DagEngine.getJobs(DagEngine.java:443)

 

 

Regards,

   Ronen Shachar

1 REPLY

Re: Oozie server is stuck in OOME

Contributor

Hi,

 

Oozie will purge old workflows from the database; IIRC it's after 30 days. Also, older versions of Oozie have some minor bugs in the purging logic.
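
For reference, the purge behaviour is normally controlled by PurgeService properties in oozie-site.xml. The fragment below is only a sketch: the values shown are the usual defaults as far as I know, and you should confirm the property names and defaults against the oozie-default.xml shipped with your Oozie version before changing anything.

```xml
<!-- oozie-site.xml fragment (illustrative; values are assumptions to adapt) -->
<property>
  <!-- Completed workflow jobs older than this many days are eligible for purging -->
  <name>oozie.service.PurgeService.older.than</name>
  <value>30</value>
</property>
<property>
  <!-- How often the purge service runs, in seconds -->
  <name>oozie.service.PurgeService.purge.interval</name>
  <value>3600</value>
</property>
```

Lowering the first value would shrink the job tables sooner, at the cost of losing job history earlier.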

 

In any case, Derby is intended only for development and should not be used in a production cluster (or really even in a test cluster). It is recommended that you use one of the other supported databases: MySQL, PostgreSQL, or Oracle. I think the heap size you're using may also be a bit low; try 1 GB or 2 GB instead.
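
As a rough sketch of what pointing Oozie at MySQL looks like, the JPAService connection properties in oozie-site.xml would be along these lines. The host name, database name, user, and password here are placeholders I made up for illustration, and the driver class assumes the MySQL Connector/J JAR is available to the Oozie server.

```xml
<!-- oozie-site.xml fragment (illustrative; host, database, and credentials are placeholders) -->
<property>
  <name>oozie.service.JPAService.jdbc.driver</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.url</name>
  <value>jdbc:mysql://db-host.example.com:3306/oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.username</name>
  <value>oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.password</name>
  <value>CHANGE_ME</value>
</property>
```

On a Cloudera Manager deployment, the equivalent settings are typically entered through the Oozie service configuration rather than by editing the file by hand.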

Software Engineer | Cloudera, Inc. | http://cloudera.com