New Contributor
Posts: 2
Registered: ‎08-15-2013

Oozie server is stuck in OOME

Hello,

 

We are using Oozie to run workflows that each consist of a single Hive query action.

Over the past few days the Oozie server has been getting stuck. Checking the log files, I see the Oozie coordinator hitting an OutOfMemoryError (OOME) while querying the Derby DB.

 

The Oozie Java heap size is 653.69 MiB.

We have fewer than 1000 jobs in the queue; the majority are finished (succeeded or killed).

(oozie jobs -len 1000 | wc -l)

Is this a high number? Do we need to perform some cleanup for old jobs?

 

Below is a snippet from the oozie-cmf-oozie1-OOZIE_SERVER-va-p-mdtcdh-01-c.private.mtlink.biz.log.out log file.

 

 

2013-08-15 08:00:17,945 WARN openjpa.Enhance: Creating subclass for "[class org.apache.oozie.util.db.ValidateConnectionBean]". This means that your application will be less efficient and will consume more memory than it would if you ran the OpenJPA enhancer. Additionally, lazy loading will not be available for one-to-one and many-to-one persistent attributes in types using field access; they will be loaded eagerly instead.
2013-08-15 08:00:28,533 WARN org.apache.oozie.service.JPAService: USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] JPAExecutor [WorkflowsJobGetJPAExecutor] ended with an active transaction, rolling back
2013-08-15 08:00:28,533 ERROR org.apache.oozie.command.wf.JobsXCommand: USER[hue] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] XException,
org.apache.oozie.command.CommandException: E0603: SQL error in operation, Java exception: 'GC overhead limit exceeded: java.lang.OutOfMemoryError'.
        at org.apache.oozie.command.wf.JobsXCommand.execute(JobsXCommand.java:72)
        at org.apache.oozie.command.wf.JobsXCommand.execute(JobsXCommand.java:32)
        at org.apache.oozie.command.XCommand.call(XCommand.java:277)
        at org.apache.oozie.DagEngine.getJobs(DagEngine.java:443)
        at org.apache.oozie.servlet.V1JobsServlet.getWorkflowJobs(V1JobsServlet.java:323)
        at org.apache.oozie.servlet.V1JobsServlet.getJobs(V1JobsServlet.java:150)
        at org.apache.oozie.servlet.BaseJobsServlet.doGet(BaseJobsServlet.java:121)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
        at org.apache.oozie.servlet.JsonRestServlet.service(JsonRestServlet.java:286)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.oozie.servlet.AuthFilter$2.doFilter(AuthFilter.java:126)
        at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:384)
        at org.apache.oozie.servlet.AuthFilter.doFilter(AuthFilter.java:131)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.oozie.servlet.HostnameFilter.doFilter(HostnameFilter.java:84)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:722)
Caused by: <openjpa-2.1.0-r422266:1071316 fatal general error> org.apache.openjpa.persistence.PersistenceException: Java exception: 'GC overhead limit exceeded: java.lang.OutOfMemoryError'.
        at org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4869)
        at org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4829)
        at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:136)
        at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:118)
        at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:70)
        at org.apache.openjpa.jdbc.kernel.SelectResultObjectProvider.handleCheckedException(SelectResultObjectProvider.java:155)
        at org.apache.openjpa.lib.rop.RangeResultObjectProvider.handleCheckedException(RangeResultObjectProvider.java:130)
        at org.apache.openjpa.kernel.QueryImpl$PackingResultObjectProvider.handleCheckedException(QueryImpl.java:2111)
        at org.apache.oozie.service.JPAService.execute(JPAService.java:211)
        at org.apache.oozie.command.wf.JobsXCommand.execute(JobsXCommand.java:61)
        ... 29 more
Caused by: java.sql.SQLException: Java exception: 'GC overhead limit exceeded: java.lang.OutOfMemoryError'.
        at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.Util.javaException(Unknown Source)
        at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
        at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedResultSet.closeOnTransactionError(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedResultSet.next(Unknown Source)
        at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
        at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
        at org.apache.openjpa.lib.jdbc.DelegatingResultSet.next(DelegatingResultSet.java:131)
        at org.apache.openjpa.jdbc.sql.ResultSetResult.nextInternal(ResultSetResult.java:222)
        at org.apache.openjpa.jdbc.sql.SelectImpl$SelectResult.nextInternal(SelectImpl.java:2445)
        at org.apache.openjpa.jdbc.sql.AbstractResult.next(AbstractResult.java:175)
        at org.apache.openjpa.jdbc.kernel.SelectResultObjectProvider.next(SelectResultObjectProvider.java:99)
        at org.apache.openjpa.lib.rop.RangeResultObjectProvider.next(RangeResultObjectProvider.java:102)
        at org.apache.openjpa.kernel.QueryImpl$PackingResultObjectProvider.next(QueryImpl.java:2087)
        at org.apache.openjpa.lib.rop.WindowResultList.getInternal(WindowResultList.java:129)
        ... 36 more
Caused by: java.sql.SQLException: Java exception: 'GC overhead limit exceeded: java.lang.OutOfMemoryError'.
        at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
        ... 56 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.derby.iapi.types.SQLChar.<init>(Unknown Source)
        at org.apache.derby.iapi.types.SQLVarchar.<init>(Unknown Source)
        at org.apache.derby.iapi.types.SQLVarchar.cloneValue(Unknown Source)
        at org.apache.derby.iapi.store.access.BackingStoreHashtable.cloneRow(Unknown Source)
        at org.apache.derby.iapi.store.access.BackingStoreHashtable.add_row_to_hash_table(Unknown Source)
        at org.apache.derby.iapi.store.access.BackingStoreHashtable.putRow(Unknown Source)
        at org.apache.derby.impl.sql.execute.ScrollInsensitiveResultSet.addRowToHashTable(Unknown Source)
        at org.apache.derby.impl.sql.execute.ScrollInsensitiveResultSet.getNextRowFromSource(Unknown Source)
        at org.apache.derby.impl.sql.execute.ScrollInsensitiveResultSet.getNextRowCore(Unknown Source)
        at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedResultSet.next(Unknown Source)
        at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
        at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
        at org.apache.openjpa.lib.jdbc.DelegatingResultSet.next(DelegatingResultSet.java:131)
        at org.apache.openjpa.jdbc.sql.ResultSetResult.nextInternal(ResultSetResult.java:222)
        at org.apache.openjpa.jdbc.sql.SelectImpl$SelectResult.nextInternal(SelectImpl.java:2445)
        at org.apache.openjpa.jdbc.sql.AbstractResult.next(AbstractResult.java:175)
        at org.apache.openjpa.jdbc.kernel.SelectResultObjectProvider.next(SelectResultObjectProvider.java:99)
        at org.apache.openjpa.lib.rop.RangeResultObjectProvider.next(RangeResultObjectProvider.java:102)
        at org.apache.openjpa.kernel.QueryImpl$PackingResultObjectProvider.next(QueryImpl.java:2087)
        at org.apache.openjpa.lib.rop.WindowResultList.getInternal(WindowResultList.java:129)
        at org.apache.openjpa.lib.rop.AbstractNonSequentialResultList$Itr.hasNext(AbstractNonSequentialResultList.java:171)
        at org.apache.openjpa.lib.rop.ResultListIterator.hasNext(ResultListIterator.java:53)
        at org.apache.openjpa.kernel.DelegatingResultList$DelegatingListIterator.hasNext(DelegatingResultList.java:389)
        at org.apache.oozie.executor.jpa.WorkflowsJobGetJPAExecutor.execute(WorkflowsJobGetJPAExecutor.java:251)
        at org.apache.oozie.executor.jpa.WorkflowsJobGetJPAExecutor.execute(WorkflowsJobGetJPAExecutor.java:40)
        at org.apache.oozie.service.JPAService.execute(JPAService.java:211)
        at org.apache.oozie.command.wf.JobsXCommand.execute(JobsXCommand.java:61)
        at org.apache.oozie.command.wf.JobsXCommand.execute(JobsXCommand.java:32)
        at org.apache.oozie.command.XCommand.call(XCommand.java:277)
        at org.apache.oozie.DagEngine.getJobs(DagEngine.java:443)

 

 

Regards,

   Ronen Shachar

Cloudera Employee
Posts: 35
Registered: ‎07-08-2013

Re: Oozie server is stuck in OOME

Hi,

 

Oozie will purge old workflows from the database; IIRC it's after 30 days. Also, older versions of Oozie have some minor bugs in the purging logic.
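
As a rough way to check how many finished workflows are still sitting in the database, the `oozie jobs` listing can be filtered by status and counted. This is only a sketch: `count_wf_jobs` is a hypothetical helper name, and it assumes each data row starts with a workflow job ID ending in `-W`, so any header or separator lines the CLI prints are not counted (as they would be with a plain `wc -l`).

```shell
#!/bin/sh
# Count workflow job rows in `oozie jobs` output read from stdin.
# Assumption: a data row begins with an ID such as
#   0000001-130815080017945-oozie-oozi-W
# Lines that do not start with such an ID are ignored.
# Note: grep exits non-zero when the count is 0, but still prints "0".
count_wf_jobs() {
    grep -cE '^[0-9]+-[0-9]+-[^[:space:]]+-W([[:space:]]|$)'
}

# Hypothetical usage against a live server (OOZIE_URL must be set):
#   oozie jobs -jobtype wf -filter status=SUCCEEDED -len 1000 | count_wf_jobs
#   oozie jobs -jobtype wf -filter status=KILLED -len 1000 | count_wf_jobs
```

If the counts of SUCCEEDED and KILLED jobs older than the purge window stay high, that would suggest the purge is not keeping up on this version.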

 

In any case, Derby is meant only for development and should not be used in a production cluster (or really even in a test cluster). It is recommended that you use one of the other databases: MySQL, PostgreSQL, or Oracle. I think the heap size you're using may also be a bit low; try 1 GB or 2 GB instead.
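
For the heap size, here is a minimal sketch of what that could look like in `oozie-env.sh`. It assumes an install that is not managed by Cloudera Manager and where the Oozie server runs in its bundled Tomcat, which reads JVM flags from `CATALINA_OPTS`. On a Cloudera Manager deployment the heap would normally be changed through the Oozie Server's Java heap setting in the CM configuration instead, since manual edits to this file may be overwritten.

```shell
# oozie-env.sh (sketch, assuming a non-CM install with the bundled Tomcat)
# Raise the maximum JVM heap for the Oozie server to 1 GB.
# Keeps whatever flags are already in CATALINA_OPTS and appends -Xmx.
export CATALINA_OPTS="${CATALINA_OPTS:-} -Xmx1024m"
```

The Oozie server has to be restarted for the new heap size to take effect; use `-Xmx2048m` for 2 GB.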

Software Engineer | Cloudera, Inc. | http://cloudera.com