Support Questions

Find answers, ask questions, and share your expertise

Hive query failed with org.apache.hadoop.hive.ql.exec.tez.TezTask java.lang.OutOfMemoryError: Java heap space

avatar
Explorer

I have a Hive Table in ORC format. I used Zeppelin to query the table with jdbc(hive)

SELECT `timestamp`, url FROM events where id='f9e43fc7b' ORDER BY `timestamp` DESC

 

However, the query runs into below error:

java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1591402457216_0009_2_01, diagnostics=[Vertex vertex_1591402457216_0009_2_01 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: evt initializer failed, vertex=vertex_1591402457216_0009_2_01 [Map 1], java.lang.OutOfMemoryError: Java heap space
	at java.util.regex.Matcher.<init>(Matcher.java:225)
	at java.util.regex.Pattern.matcher(Pattern.java:1093)
	at org.apache.hadoop.hive.ql.io.AcidUtils$BucketMetaData.parse(AcidUtils.java:318)
	at org.apache.hadoop.hive.ql.io.AcidUtils$BucketMetaData.parse(AcidUtils.java:332)
	at org.apache.hadoop.hive.ql.io.AcidUtils.parseBucketId(AcidUtils.java:367)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.determineSplitStrategy(OrcInputFormat.java:2331)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.determineSplitStrategies(OrcInputFormat.java:2306)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1811)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1939)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:522)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:777)
	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1591402457216_0009_2_02, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1591402457216_0009_2_02 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
	at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:401)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:266)
	at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
	at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:718)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:801)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:633)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
	at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

 

Any idea regarding this error? Any resource configuration I should look into?

3 REPLIES 3

avatar
Explorer

FYI my table is partition by Year + Month + Day.

Total file size in HDFS  is 10TB.

Total records is 21 Billion records.

We have 8 data nodes in HDP. 

avatar
Super Guru

@ShuwnYuan Thats a pretty big query.  If you do not have enough memory in yarn available to the containers building the query it will fail with this error.    You are going to need to increase the tez container size in Hive Configuration.

avatar
Explorer

@stevenmatison  Thanks for prompt reply.

 

I'm using HDP-3.0.1.0 with Ambari. Here's my current Hive config:

 

Tez Container Size: 3072 MB
HiveServer2 Heap Size: 4096 MB
Memory: 819.2 MB
Data per Reducer: 2042.9 MB

 

They are mostly the default values.

Do they make sense?

Any suggestion on which to increase / decrease for optimum performance?