Created on 12-18-2020 01:25 AM - edited 12-18-2020 01:27 AM
I have a query over a set of tables (20 tables) that fetches the last loaded date and the row count of each table:
INSERT OVERWRITE TABLE dbo.table PARTITION (last_load_date)
select 'table' as tablename, current_date, count(*) as count, last_load_date
from table1
where last_load_date in (select max(last_load_date) from table1)
group by last_load_date
union
select 'table' as tablename, current_date, count(*) as count, last_load_date
from table2
where last_load_date in (select max(last_load_date) from table2)
group by last_load_date
.
.
.
... and so on: a UNION across 20 tables in total.
I wrapped the query in the following script and ran it with sh table.sh:
table.sh
hive --hiveconf tez.queue.name=Last_date --hiveconf hive.session.id=data_xxx -f /mypath/union_query.sql
The script fails with the following error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDDLFromFieldSchema(MetaStoreUtils.java:876)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getSchema(MetaStoreUtils.java:1091)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getPartitionMetadata(MetaStoreUtils.java:890)
at org.apache.hadoop.hive.ql.metadata.Partition.getMetadataFromPartitionSchema(Partition.java:263)
at org.apache.hadoop.hive.ql.plan.PartitionDesc.<init>(PartitionDesc.java:87)
at org.apache.hadoop.hive.ql.exec.Utilities.getPartitionDesc(Utilities.java:1373)
at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setMapWork(GenMapRedUtils.java:684)
at org.apache.hadoop.hive.ql.parse.GenTezUtils.setupMapWork(GenTezUtils.java:212)
at org.apache.hadoop.hive.ql.parse.GenTezUtils.createMapWork(GenTezUtils.java:195)
at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:131)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:90)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:72)
at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368)
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:205)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10598)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:474)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:330)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1233)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1274)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1170)
Could you please let me know how to optimize the query and avoid this error, given that I am only doing a UNION across tables? I searched for a solution but could not find one that fits my case.
Thank you in advance for your reply.
Created 12-18-2020 05:57 AM
Why don't you write 20 separate INSERT INTO statements (append mode) in the same SQL file, separated by ';'? That way you don't need the UNION at all,
and you probably won't hit the heap space issue.
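For example, a minimal sketch of that approach, reusing the target table dbo.table and the source tables table1, table2, ... from the question (adjust names to your schema). Each statement is compiled and executed on its own, so Hive does not have to build a single plan covering all 20 table scans. Note that, unlike the original INSERT OVERWRITE, INSERT INTO appends to the partition, so any existing rows would need to be cleared separately if that matters for your use case:

-- one INSERT per source table, each ending with ';'
INSERT INTO TABLE dbo.table PARTITION (last_load_date)
select 'table' as tablename, current_date, count(*) as count, last_load_date
from table1
where last_load_date in (select max(last_load_date) from table1)
group by last_load_date;

INSERT INTO TABLE dbo.table PARTITION (last_load_date)
select 'table' as tablename, current_date, count(*) as count, last_load_date
from table2
where last_load_date in (select max(last_load_date) from table2)
group by last_load_date;

-- ...repeat for the remaining 18 tables.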