<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Hive: Union all and aggregation are failing with large parquet tables (150 col, 5 mil rows) in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Union-all-and-aggregation-are-failing-with-large/m-p/54076#M60006</link>
    <description>&lt;P&gt;&lt;BR /&gt;I have the following query with 2 parquet tables (t_par_string, t_par_datatype).&lt;/P&gt;&lt;P&gt;select count(*)&lt;BR /&gt;from (&lt;BR /&gt;select max(source) source,&lt;BR /&gt;col1, col2, col3&lt;BR /&gt;.&lt;BR /&gt;.&lt;BR /&gt;.&lt;BR /&gt;col149,col150 , count(*)&lt;BR /&gt;from (&lt;BR /&gt;select 1 source,&lt;BR /&gt;col1, col2, col3&lt;BR /&gt;.&lt;BR /&gt;.&lt;BR /&gt;.&lt;BR /&gt;col149,col150&lt;BR /&gt;from t_par_string&lt;BR /&gt;union all&lt;BR /&gt;select 1 source,&lt;BR /&gt;col1, col2, col3&lt;BR /&gt;.&lt;BR /&gt;.&lt;BR /&gt;.&lt;BR /&gt;col149,col150&lt;BR /&gt;from t_par_datatype&lt;BR /&gt;) merged_data&lt;BR /&gt;group by&lt;BR /&gt;col1, col2, col3&lt;BR /&gt;.&lt;BR /&gt;.&lt;BR /&gt;.&lt;BR /&gt;col149,col150&lt;BR /&gt;having count(*) = 1&lt;BR /&gt;) minus_data&lt;BR /&gt;where source = 1&lt;/P&gt;&lt;P&gt;It fails with the following error:&lt;/P&gt;&lt;P&gt;Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)&lt;BR /&gt;... 
8 more&lt;BR /&gt;Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text org.apache.hadoop.hive.ql.udf.UDFToString.evaluate(org.apache.hadoop.hive.serde2.io.TimestampWritable) on object org.apache.hadoop.hive.ql.udf.UDFToString@134ff8f8 of class org.apache.hadoop.hive.ql.udf.UDFToString with arguments {2015-10-17 00:00:00:org.apache.hadoop.hive.serde2.io.TimestampWritable} of size 1&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:989)&lt;BR /&gt;at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)&lt;BR /&gt;... 9 more&lt;BR /&gt;Caused by: java.lang.reflect.InvocationTargetException&lt;BR /&gt;at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)&lt;BR /&gt;at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)&lt;BR /&gt;at java.lang.reflect.Method.invoke(Method.java:498)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:965)&lt;BR /&gt;... 
18 more&lt;BR /&gt;Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded&lt;BR /&gt;at java.util.Arrays.copyOf(Arrays.java:3181)&lt;BR /&gt;at java.text.DateFormatSymbols.copyMembers(DateFormatSymbols.java:850)&lt;BR /&gt;at java.text.DateFormatSymbols.initializeData(DateFormatSymbols.java:758)&lt;BR /&gt;at java.text.DateFormatSymbols.&amp;lt;init&amp;gt;(DateFormatSymbols.java:145)&lt;BR /&gt;at sun.util.locale.provider.DateFormatSymbolsProviderImpl.getInstance(DateFormatSymbolsProviderImpl.java:85)&lt;BR /&gt;at java.text.DateFormatSymbols.getProviderInstance(DateFormatSymbols.java:364)&lt;BR /&gt;at java.text.DateFormatSymbols.getInstance(DateFormatSymbols.java:340)&lt;BR /&gt;at java.util.Calendar.getDisplayName(Calendar.java:2110)&lt;BR /&gt;at java.text.SimpleDateFormat.subFormat(SimpleDateFormat.java:1125)&lt;BR /&gt;at java.text.SimpleDateFormat.format(SimpleDateFormat.java:966)&lt;BR /&gt;at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)&lt;BR /&gt;at java.text.DateFormat.format(DateFormat.java:345)&lt;BR /&gt;at org.apache.hadoop.hive.serde2.io.TimestampWritable.toString(TimestampWritable.java:383)&lt;BR /&gt;at org.apache.hadoop.hive.ql.udf.UDFToString.evaluate(UDFToString.java:150)&lt;BR /&gt;at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)&lt;BR /&gt;at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)&lt;BR /&gt;at java.lang.reflect.Method.invoke(Method.java:498)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:965)&lt;BR /&gt;at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)&lt;BR /&gt;at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)&lt;BR /&gt;at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)&lt;BR /&gt;at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)&lt;BR /&gt;at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)&lt;BR /&gt;at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 11:30:30 GMT</pubDate>
    <dc:creator>PJ1982</dc:creator>
    <dc:date>2022-09-16T11:30:30Z</dc:date>
    <item>
      <title>Hive: Union all and aggregation are failing with large parquet tables (150 col, 5 mil rows)</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Union-all-and-aggregation-are-failing-with-large/m-p/54076#M60006</link>
      <description>&lt;P&gt;&lt;BR /&gt;I have the following query with 2 parquet tables (t_par_string, t_par_datatype).&lt;/P&gt;&lt;P&gt;select count(*)&lt;BR /&gt;from (&lt;BR /&gt;select max(source) source,&lt;BR /&gt;col1, col2, col3&lt;BR /&gt;.&lt;BR /&gt;.&lt;BR /&gt;.&lt;BR /&gt;col149,col150 , count(*)&lt;BR /&gt;from (&lt;BR /&gt;select 1 source,&lt;BR /&gt;col1, col2, col3&lt;BR /&gt;.&lt;BR /&gt;.&lt;BR /&gt;.&lt;BR /&gt;col149,col150&lt;BR /&gt;from t_par_string&lt;BR /&gt;union all&lt;BR /&gt;select 1 source,&lt;BR /&gt;col1, col2, col3&lt;BR /&gt;.&lt;BR /&gt;.&lt;BR /&gt;.&lt;BR /&gt;col149,col150&lt;BR /&gt;from t_par_datatype&lt;BR /&gt;) merged_data&lt;BR /&gt;group by&lt;BR /&gt;col1, col2, col3&lt;BR /&gt;.&lt;BR /&gt;.&lt;BR /&gt;.&lt;BR /&gt;col149,col150&lt;BR /&gt;having count(*) = 1&lt;BR /&gt;) minus_data&lt;BR /&gt;where source = 1&lt;/P&gt;&lt;P&gt;It fails with the following error:&lt;/P&gt;&lt;P&gt;Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)&lt;BR /&gt;... 
8 more&lt;BR /&gt;Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text org.apache.hadoop.hive.ql.udf.UDFToString.evaluate(org.apache.hadoop.hive.serde2.io.TimestampWritable) on object org.apache.hadoop.hive.ql.udf.UDFToString@134ff8f8 of class org.apache.hadoop.hive.ql.udf.UDFToString with arguments {2015-10-17 00:00:00:org.apache.hadoop.hive.serde2.io.TimestampWritable} of size 1&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:989)&lt;BR /&gt;at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)&lt;BR /&gt;... 9 more&lt;BR /&gt;Caused by: java.lang.reflect.InvocationTargetException&lt;BR /&gt;at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)&lt;BR /&gt;at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)&lt;BR /&gt;at java.lang.reflect.Method.invoke(Method.java:498)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:965)&lt;BR /&gt;... 
18 more&lt;BR /&gt;Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded&lt;BR /&gt;at java.util.Arrays.copyOf(Arrays.java:3181)&lt;BR /&gt;at java.text.DateFormatSymbols.copyMembers(DateFormatSymbols.java:850)&lt;BR /&gt;at java.text.DateFormatSymbols.initializeData(DateFormatSymbols.java:758)&lt;BR /&gt;at java.text.DateFormatSymbols.&amp;lt;init&amp;gt;(DateFormatSymbols.java:145)&lt;BR /&gt;at sun.util.locale.provider.DateFormatSymbolsProviderImpl.getInstance(DateFormatSymbolsProviderImpl.java:85)&lt;BR /&gt;at java.text.DateFormatSymbols.getProviderInstance(DateFormatSymbols.java:364)&lt;BR /&gt;at java.text.DateFormatSymbols.getInstance(DateFormatSymbols.java:340)&lt;BR /&gt;at java.util.Calendar.getDisplayName(Calendar.java:2110)&lt;BR /&gt;at java.text.SimpleDateFormat.subFormat(SimpleDateFormat.java:1125)&lt;BR /&gt;at java.text.SimpleDateFormat.format(SimpleDateFormat.java:966)&lt;BR /&gt;at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)&lt;BR /&gt;at java.text.DateFormat.format(DateFormat.java:345)&lt;BR /&gt;at org.apache.hadoop.hive.serde2.io.TimestampWritable.toString(TimestampWritable.java:383)&lt;BR /&gt;at org.apache.hadoop.hive.ql.udf.UDFToString.evaluate(UDFToString.java:150)&lt;BR /&gt;at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)&lt;BR /&gt;at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)&lt;BR /&gt;at java.lang.reflect.Method.invoke(Method.java:498)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:965)&lt;BR /&gt;at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)&lt;BR /&gt;at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)&lt;BR /&gt;at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)&lt;BR /&gt;at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)&lt;BR /&gt;at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)&lt;BR /&gt;at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)&lt;BR /&gt;at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 11:30:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Union-all-and-aggregation-are-failing-with-large/m-p/54076#M60006</guid>
      <dc:creator>PJ1982</dc:creator>
      <dc:date>2022-09-16T11:30:30Z</dc:date>
    </item>
    <item>
      <title>Re: Hive: Union all and aggregation are failing with large parquet tables (150 col, 5 mil rows)</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Union-all-and-aggregation-are-failing-with-large/m-p/54085#M60007</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Increase the container and heap sizes. I am not sure whether it is a mapper or a reducer that is failing, but here are the settings to look into.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;set &lt;/SPAN&gt;hive.exec.reducers.bytes.per.reducer=&lt;/P&gt;&lt;P&gt;set mapreduce.map.memory.mb=&lt;/P&gt;&lt;P&gt;set mapreduce.reduce.memory.mb=&lt;/P&gt;&lt;P&gt;set mapreduce.map.java.opts=&amp;lt;roughly 80% of container size&amp;gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;set mapreduce.reduce.java.opts=&amp;lt;roughly 80% of container size&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 25 Apr 2017 14:54:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Union-all-and-aggregation-are-failing-with-large/m-p/54085#M60007</guid>
      <dc:creator>mbigelow</dc:creator>
      <dc:date>2017-04-25T14:54:38Z</dc:date>
    </item>
  </channel>
</rss>