Created on 11-10-2014 11:54 AM - edited 09-16-2022 02:12 AM
Description:
We are getting an error while executing a Hive query against a table that has about 2.68 billion records and 430 columns. The table is partitioned by the column 'RDATE'.
NOTE: This is a new table and a new query; this is the first time we are executing it.
$hive
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.4.0.jar!/hive-log4j.properties
Hive history file=/tmp/rajendrap/hive_job_log_21d3abad-7cdd-4268-b961-b752efc827a4_845263541.txt
hive> set hive.mapred.mode=nonstrict;
hive> set mapred.child.java.opts=-Xmx8g;
hive> INSERT OVERWRITE LOCAL DIRECTORY '/namenode/home/rajendrap/temp456'
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> SELECT rdate,count(*) FROM PAX_MV_HIST_PREV_CURR group by rdate order by rdate;
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
at java.lang.StringBuilder.append(StringBuilder.java:119)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDDLFromFieldSchema(MetaStoreUtils.java:498)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getSchema(MetaStoreUtils.java:711)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getPartitionMetadata(MetaStoreUtils.java:515)
at org.apache.hadoop.hive.ql.metadata.Partition.getMetadataFromPartitionSchema(Partition.java:280)
at org.apache.hadoop.hive.ql.plan.PartitionDesc.<init>(PartitionDesc.java:90)
at org.apache.hadoop.hive.ql.exec.Utilities.getPartitionDesc(Utilities.java:683)
at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:826)
at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:597)
at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.initPlan(GenMapRedUtils.java:129)
at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink1.process(GenMRRedSink1.java:77)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:55)
at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:7883)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8265)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:349)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:938)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
We set the parameter mapred.child.java.opts to -Xmx8g (we also tried 2g, 4g, 8g, and 16g), but we got the same error every time.
Please advise on any possible solutions, tips, or recommendations to get the above query to run successfully.
Created 11-12-2014 01:53 AM
Please don't use the ORDER BY clause in your query; you can sort the data separately after you select it from the HDFS location.
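A sketch of what that could look like, reusing the directory and table from the original query (the trailing sort command is just one possible way to order the small per-partition counts after the extract):
hive> INSERT OVERWRITE LOCAL DIRECTORY '/namenode/home/rajendrap/temp456'
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > SELECT rdate, count(*) FROM PAX_MV_HIST_PREV_CURR GROUP BY rdate;
$ # the result is small (one row per rdate value), so sorting it locally is cheap
$ sort -t',' -k1,1 /namenode/home/rajendrap/temp456/* > temp456_sorted.csv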
Created 12-09-2014 02:25 PM
Looks like you need to increase the memory of the HS2 (or Hive CLI) process itself. The flag you mentioned only affects the MR jobs spawned by Hive, but the stack trace indicates the query didn't make it past the compiler.
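For example, you could raise the client-side heap before starting the session (the 4g value is only an illustration, and the exact mechanism may differ by CDH version):
$ export HADOOP_CLIENT_OPTS="-Xmx4g"   # heap for the Hive client JVM that compiles the query
$ hive
or, for HiveServer2, set the heap size (in MB) in hive-env.sh:
export HADOOP_HEAPSIZE=4096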
Hope that helps,
Szehon