Error while running Hive query - java.lang.OutOfMemoryError - GC overhead limit exceeded

New Contributor

Description:

We are getting an error while executing a Hive query against a table which has about 2.68 billion records and 430 columns. This table is partitioned by a column 'RDATE'.

NOTE: This is a new table and a new query. It was never executed earlier. This is the first time we are executing this query.

$hive
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.4.0.jar!/hive-log4j.properties
Hive history file=/tmp/rajendrap/hive_job_log_21d3abad-7cdd-4268-b961-b752efc827a4_845263541.txt
hive> set hive.mapred.mode=nonstrict;
hive> set mapred.child.java.opts=-Xmx8g;
hive> INSERT OVERWRITE LOCAL DIRECTORY '/namenode/home/rajendrap/temp456'
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> SELECT rdate,count(*) FROM PAX_MV_HIST_PREV_CURR group by rdate order by rdate;
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
at java.lang.StringBuilder.append(StringBuilder.java:119)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDDLFromFieldSchema(MetaStoreUtils.java:498)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getSchema(MetaStoreUtils.java:711)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getPartitionMetadata(MetaStoreUtils.java:515)
at org.apache.hadoop.hive.ql.metadata.Partition.getMetadataFromPartitionSchema(Partition.java:280)
at org.apache.hadoop.hive.ql.plan.PartitionDesc.<init>(PartitionDesc.java:90)
at org.apache.hadoop.hive.ql.exec.Utilities.getPartitionDesc(Utilities.java:683)
at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:826)
at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:597)
at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.initPlan(GenMapRedUtils.java:129)
at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink1.process(GenMRRedSink1.java:77)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:55)
at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:7883)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8265)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:349)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:938)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)


We set the parameter mapred.child.java.opts to -Xmx8g (we also tried 2 GB, 4 GB, 8 GB, and 16 GB), but we got the same error every time.

Please advise on any possible solutions, tips, or recommendations for getting the above query to run successfully.

2 REPLIES

New Contributor

Please don't use the "order by" clause in your query. You can order the data separately when you select it from the HDFS location.
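
One way to read this suggestion: run the query without the ORDER BY clause, then sort the delimited output files after Hive has written them. The sketch below assumes the rows are comma-delimited with rdate as the first field, and that rdate is stored in a format that sorts correctly byte by byte (for example YYYY-MM-DD); neither is confirmed in the question. The helper name sort_hive_output and the sample rows are made up for illustration. Note that the stack trace in the question is thrown while the query is being compiled, so dropping ORDER BY may not be enough on its own to avoid the error.

```shell
# Hypothetical helper: concatenate the part files in a Hive output directory
# and sort the comma-delimited rows by their first field (assumed to be rdate).
sort_hive_output() {
  # LC_ALL=C forces a byte-wise sort, which orders YYYY-MM-DD dates chronologically.
  cat "$1"/* | LC_ALL=C sort -t, -k1,1
}

# Demo with made-up rows standing in for real query output.
demo_dir="$(mktemp -d)"
printf '2013-01-03,30\n2013-01-01,10\n' > "$demo_dir/000000_0"
printf '2013-01-02,20\n' > "$demo_dir/000001_0"

sorted="$(sort_hive_output "$demo_dir")"
printf '%s\n' "$sorted"

rm -rf "$demo_dir"
```

Against the real output you would pass the directory used in the INSERT OVERWRITE LOCAL DIRECTORY statement instead of the temporary demo directory.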

Rising Star

Looks like you need to increase the memory of the HS2 process itself. The flag you mentioned will only affect the MR jobs that are spawned by Hive, but the stack indicates that the query didn't make it past the compiler.


Hope that helps,

Szehon
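
To make the advice above concrete: the stack trace in the question runs through CliDriver, so the JVM that ran out of memory is the Hive command-line client, and mapred.child.java.opts never reaches it. A minimal sketch of raising that client's heap before starting the CLI follows. It assumes the hive launcher script in this distribution reads the HADOOP_HEAPSIZE environment variable (a value in MB), which is worth checking against the local bin/hive and hive-env.sh; the value 4096 is only an illustration, not a recommendation. When queries go through HiveServer2 instead, it is the server's heap that would need the increase.

```shell
# Assumption: the hive launcher honours HADOOP_HEAPSIZE (in MB) for the client JVM.
# 4096 is an illustrative value; size it to the machine that runs the CLI.
export HADOOP_HEAPSIZE=4096

# Confirm what the launcher will see, then start the CLI in the same shell, e.g.:
#   hive -f your_query.hql    (your_query.hql is a hypothetical file name)
echo "HADOOP_HEAPSIZE=${HADOOP_HEAPSIZE} MB"
```

The variable has to be exported in the same shell session that launches hive, or set in hive-env.sh, for the client JVM to pick it up.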