- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Hive query failed with org.apache.hadoop.hive.ql.exec.tez.TezTask java.lang.OutOfMemoryError: Java heap space
- Labels:
-
Apache Ambari
-
Apache Hive
-
Apache Tez
Created ‎06-09-2020 08:53 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have a Hive Table in ORC format. I used Zeppelin to query the table with jdbc(hive)
SELECT `timestamp`, url FROM events where id='f9e43fc7b' ORDER BY `timestamp` DESC
However, the query runs into below error:
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1591402457216_0009_2_01, diagnostics=[Vertex vertex_1591402457216_0009_2_01 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: evt initializer failed, vertex=vertex_1591402457216_0009_2_01 [Map 1], java.lang.OutOfMemoryError: Java heap space
at java.util.regex.Matcher.<init>(Matcher.java:225)
at java.util.regex.Pattern.matcher(Pattern.java:1093)
at org.apache.hadoop.hive.ql.io.AcidUtils$BucketMetaData.parse(AcidUtils.java:318)
at org.apache.hadoop.hive.ql.io.AcidUtils$BucketMetaData.parse(AcidUtils.java:332)
at org.apache.hadoop.hive.ql.io.AcidUtils.parseBucketId(AcidUtils.java:367)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.determineSplitStrategy(OrcInputFormat.java:2331)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.determineSplitStrategies(OrcInputFormat.java:2306)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1811)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1939)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:522)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:777)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1591402457216_0009_2_02, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1591402457216_0009_2_02 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:401)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:266)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:718)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:801)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:633)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Any idea regarding this error? Any resource configuration I should look into?
Created on ‎06-10-2020 03:30 AM - edited ‎06-10-2020 06:09 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
FYI my table is partition by Year + Month + Day.
Total file size in HDFS is 10TB.
Total records is 21 Billion records.
We have 8 data nodes in HDP.
Created ‎06-10-2020 03:42 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@ShuwnYuan Thats a pretty big query. If you do not have enough memory in yarn available to the containers building the query it will fail with this error. You are going to need to increase the tez container size in Hive Configuration.
Created ‎06-10-2020 06:29 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@stevenmatison Thanks for prompt reply.
I'm using HDP-3.0.1.0 with Ambari. Here's my current Hive config:
Tez Container Size: 3072 MB
HiveServer2 Heap Size: 4096 MB
Memory: 819.2 MB
Data per Reducer: 2042.9 MB
They are mostly the default values.
Do they make sense?
Any suggestion on which to increase / decrease for optimum performance?
