Created on 11-10-2014 11:54 AM - edited 09-16-2022 02:12 AM
Description:
We are getting an error while executing a Hive query against a table that has about 2.68 billion records and 430 columns. The table is partitioned by the column 'RDATE'.
NOTE: This is a new table and a new query; this is the first time we are executing it.
$hive
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.4.0.jar!/hive-log4j.properties
Hive history file=/tmp/rajendrap/hive_job_log_21d3abad-7cdd-4268-b961-b752efc827a4_845263541.txt
hive> set hive.mapred.mode=nonstrict;
hive> set mapred.child.java.opts=-Xmx8g;
hive> INSERT OVERWRITE LOCAL DIRECTORY '/namenode/home/rajendrap/temp456'
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> SELECT rdate,count(*) FROM PAX_MV_HIST_PREV_CURR group by rdate order by rdate;
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
at java.lang.StringBuilder.append(StringBuilder.java:119)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDDLFromFieldSchema(MetaStoreUtils.java:498)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getSchema(MetaStoreUtils.java:711)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getPartitionMetadata(MetaStoreUtils.java:515)
at org.apache.hadoop.hive.ql.metadata.Partition.getMetadataFromPartitionSchema(Partition.java:280)
at org.apache.hadoop.hive.ql.plan.PartitionDesc.<init>(PartitionDesc.java:90)
at org.apache.hadoop.hive.ql.exec.Utilities.getPartitionDesc(Utilities.java:683)
at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:826)
at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:597)
at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.initPlan(GenMapRedUtils.java:129)
at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink1.process(GenMRRedSink1.java:77)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:55)
at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:7883)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8265)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:349)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:938)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
We set the parameter mapred.child.java.opts to -Xmx8g (we also tried 2g, 4g, 8g, and 16g), but we got the same error every time.
Please advise on any possible solutions, tips, or recommendations to get the above query to run successfully.
Created 11-12-2014 01:53 AM
Please don't use the ORDER BY clause in your query; you can sort the data separately after you select it from the HDFS location.
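A sketch of what that could look like, reusing the directory and table from the original query (the trailing sort command is just one possible way to order the small per-partition counts after the extract):
hive> INSERT OVERWRITE LOCAL DIRECTORY '/namenode/home/rajendrap/temp456'
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > SELECT rdate, count(*) FROM PAX_MV_HIST_PREV_CURR GROUP BY rdate;
$ # the result is small (one row per rdate value), so sorting it locally is cheap
$ sort -t',' -k1,1 /namenode/home/rajendrap/temp456/* > temp456_sorted.csv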
Created 12-09-2014 02:25 PM
Looks like you need to increase the memory of the HS2 (or Hive CLI) process itself. The flag you mentioned only affects the MR jobs spawned by Hive, but the stack trace indicates the query didn't make it past the compiler.
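For example, you could raise the client-side heap before starting the session (the 4g value is only an illustration, and the exact mechanism may differ by CDH version):
$ export HADOOP_CLIENT_OPTS="-Xmx4g"   # heap for the Hive client JVM that compiles the query
$ hive
or, for HiveServer2, set the heap size (in MB) in hive-env.sh:
export HADOOP_HEAPSIZE=4096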
Hope that helps,
Szehon