Java heap space:Hive

Sru111 — Fri, 18 Dec 2020 09:27:14 GMT

I have a set of tables (20 tables) which fetches the last loaded date and count of the tables.

INSERT OVERWRITE TABLE dbo.table PARTITION (last_load_date ) select 'table' as tablename,current_date,count(*) as count,last_load_date from table1 where last_load_date in (select max(last_load_date) from table1) group by last_load_date union select 'table' as tablename,current_date,count(*) as count,last_load_date from table2 where last_load_date in (select max(last_load_date) from table2) group by last_load_date . . . Union of 20 TABLES

and configured the script like below and ran the script with command sh table.sh

Table.sh hive --hiveconf tez.queue.name=Last_date --hiveconf hive.session.id=data_xxx -f /mypath/union_query.sql

I could see the error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOfRange(Arrays.java:3664) at java.lang.String.<init>(String.java:207) at java.lang.StringBuilder.toString(StringBuilder.java:407) at java.lang.String.valueOf(String.java:2994) at java.lang.StringBuilder.append(StringBuilder.java:131) at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDDLFromFieldSchema(MetaStoreUtils.java:876) at org.apache.hadoop.hive.metastore.MetaStoreUtils.getSchema(MetaStoreUtils.java:1091) at org.apache.hadoop.hive.metastore.MetaStoreUtils.getPartitionMetadata(MetaStoreUtils.java:890) at org.apache.hadoop.hive.ql.metadata.Partition.getMetadataFromPartitionSchema(Partition.java:263) at org.apache.hadoop.hive.ql.plan.PartitionDesc.<init>(PartitionDesc.java:87) at org.apache.hadoop.hive.ql.exec.Utilities.getPartitionDesc(Utilities.java:1373) at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setMapWork(GenMapRedUtils.java:684) at org.apache.hadoop.hive.ql.parse.GenTezUtils.setupMapWork(GenTezUtils.java:212) at org.apache.hadoop.hive.ql.parse.GenTezUtils.createMapWork(GenTezUtils.java:195) at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:131) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:90) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109) at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:72) at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:205) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10598) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:474) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:330) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1233) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1274) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1170)

Could you please let me know how to optimize the query to avoid the above error as I use only union of tables. I tried to find the solution but I didn't find it in my case.

Thank you in advance for the reply

Re: Java heap space:Hive

kuldeephawks — Fri, 18 Dec 2020 13:57:48 GMT

why don't you write 20 insert into query in append mode in the same sql file ';' separated, In this case you don't have to use union.

And probably won't encounter heap space issue.

question Java heap space:Hive in Support Questions

Java heap space:Hive

Re: Java heap space:Hive