Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Hive: Union all and aggregation are failing with large parquet tables (150 col, 5 mil rows)

Frequent Visitor


I have the following query over two Parquet tables (t_par_string, t_par_datatype), each with about 150 columns and 5 million rows:

select count(*)
from (
    select max(source) source,
           col1, col2, col3,
           ...
           col149, col150,
           count(*)
    from (
        select 1 source,
               col1, col2, col3,
               ...
               col149, col150
        from t_par_string
        union all
        select 1 source,
               col1, col2, col3,
               ...
               col149, col150
        from t_par_datatype
    ) merged_data
    group by col1, col2, col3,
             ...
             col149, col150
    having count(*) = 1
) minus_data
where source = 1

It fails with the following error:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text org.apache.hadoop.hive.ql.udf.UDFToString.evaluate(org.apache.hadoop.hive.serde2.io.TimestampWritable) on object org.apache.hadoop.hive.ql.udf.UDFToString@134ff8f8 of class org.apache.hadoop.hive.ql.udf.UDFToString with arguments {2015-10-17 00:00:00:org.apache.hadoop.hive.serde2.io.TimestampWritable} of size 1
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:989)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
... 9 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:965)
... 18 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:3181)
at java.text.DateFormatSymbols.copyMembers(DateFormatSymbols.java:850)
at java.text.DateFormatSymbols.initializeData(DateFormatSymbols.java:758)
at java.text.DateFormatSymbols.<init>(DateFormatSymbols.java:145)
at sun.util.locale.provider.DateFormatSymbolsProviderImpl.getInstance(DateFormatSymbolsProviderImpl.java:85)
at java.text.DateFormatSymbols.getProviderInstance(DateFormatSymbols.java:364)
at java.text.DateFormatSymbols.getInstance(DateFormatSymbols.java:340)
at java.util.Calendar.getDisplayName(Calendar.java:2110)
at java.text.SimpleDateFormat.subFormat(SimpleDateFormat.java:1125)
at java.text.SimpleDateFormat.format(SimpleDateFormat.java:966)
at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
at java.text.DateFormat.format(DateFormat.java:345)
at org.apache.hadoop.hive.serde2.io.TimestampWritable.toString(TimestampWritable.java:383)
at org.apache.hadoop.hive.ql.udf.UDFToString.evaluate(UDFToString.java:150)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:965)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)

1 ACCEPTED SOLUTION

Champion

Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded

Increase the container and heap sizes. I am not sure whether it is a mapper or a reducer that is failing, but here are the settings to look into:

set hive.exec.reducers.bytes.per.reducer=
set mapreduce.map.memory.mb=
set mapreduce.reduce.memory.mb=
set mapreduce.map.java.opts=<roughly 80% of container size>
set mapreduce.reduce.java.opts=<roughly 80% of container size>
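For example, a session-level sketch with illustrative values (the numbers below are assumptions for a mid-sized cluster, not recommendations from this thread; size them against your cluster's YARN container limits):

```sql
-- Illustrative values only: 4 GB containers with heap set to roughly 80% of the container,
-- leaving headroom for off-heap memory. Tune to your own cluster.
SET hive.exec.reducers.bytes.per.reducer=268435456;  -- ~256 MB of input per reducer
SET mapreduce.map.memory.mb=4096;                    -- mapper container size (MB)
SET mapreduce.reduce.memory.mb=4096;                 -- reducer container size (MB)
SET mapreduce.map.java.opts=-Xmx3276m;               -- ~80% of the 4096 MB mapper container
SET mapreduce.reduce.java.opts=-Xmx3276m;            -- ~80% of the 4096 MB reducer container
```

Keeping the JVM heap below the container size matters: if `-Xmx` is set equal to (or above) the container size, YARN will kill the task for exceeding its memory limit instead of the JVM throwing an OutOfMemoryError.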

