<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: HDP 3.1.4 - Hive query - OOM on any ops on partitioned table in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/HDP-3-1-4-Hive-query-OOM-on-any-ops-on-paritioned-table/m-p/283997#M210947</link>
    <description>&lt;P&gt;I ended up recreating the table - first with a Hive INSERT INTO ... SELECT, then by copying the files directly on HDFS.&lt;BR /&gt;&lt;BR /&gt;Sadly I didn't try "MSCK REPAIR TABLE" first (or I don't remember doing so).&lt;BR /&gt;&lt;BR /&gt;With the new table everything looks fine - the old table's metadata was probably messed up by the HDP upgrade &amp;amp; migration.&lt;BR /&gt;&lt;BR /&gt;So if someone else somehow gets here:&lt;BR /&gt;1. try "MSCK REPAIR TABLE";&lt;BR /&gt;2. try to create a new table over the same location;&lt;BR /&gt;3. move/copy the data to a new table's location and run&amp;nbsp;"MSCK REPAIR TABLE".&lt;/P&gt;</description>
    <pubDate>Mon, 25 Nov 2019 19:18:35 GMT</pubDate>
    <dc:creator>Aris</dc:creator>
    <dc:date>2019-11-25T19:18:35Z</dc:date>
    <item>
      <title>HDP 3.1.4 - Hive query - OOM on any ops on partitioned table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HDP-3-1-4-Hive-query-OOM-on-any-ops-on-paritioned-table/m-p/278524#M208079</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We recently upgraded our Hadoop cluster from 2.6.4 to 3.1.4. Everything went decently - a few random issues that we managed to solve ourselves.&lt;BR /&gt;But this one we definitely don't understand.&lt;BR /&gt;&lt;BR /&gt;We have a partitioned table with around 5 billion rows (1 TB+), partitioned into 256 almost even partitions (and also clustered by a certain, single column).&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt; PARTITIONED BY (                                   
   `partfunc` string)                               
 CLUSTERED BY (                                     
   event_name)                                      
 SORTED BY (                                        
   event_time ASC)                                  
 INTO 10 BUCKETS                                    
 ROW FORMAT SERDE                                   
   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'      
 STORED AS INPUTFORMAT                              
   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  
 OUTPUTFORMAT                                       
   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' 
 LOCATION                                           
   'hdfs://host:8020/flume/product_events/product_events_optimized' 
 TBLPROPERTIES (                                    
   'orc.bloom.filter.columns'='fingerprint,sensor_name,sensor_version,bd_product_version',  
   'orc.create.index'='true',                       
   'transient_lastDdlTime'='1567779991')&lt;/LI-CODE&gt;
&lt;P&gt;If I run "select count(*)" on this table, or "analyze table product_events_optimized partition(partfunc) compute statistics for columns event_time, event_name, uid, fingerprint", or other full-table queries in general, we get the following error in the AM container:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;2019-09-27 19:12:38,856 [ERROR] [ORC_GET_SPLITS #2] |io.AcidUtils|: Failed to get files with ID; using regular API: Java heap space
2019-09-27 19:12:38,862 [WARN] [ResponseProcessor for block BP-328156957-10.18.69.65-1534169825766:blk_1113315497_39633349] |hdfs.DataStreamer|: Exception for BP-328156957-10.18.69.65-1534169825766:blk_1113315497_39633349
java.io.EOFException: Unexpected EOF while trying to read response from server
    at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:549)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
    at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1086)&lt;/LI-CODE&gt;
&lt;P&gt;(Here is the full log from that container: &lt;A href="https://gist.github.com/arisro/9a2a81ad00cf732d85f23ad9e801b5d0" target="_blank" rel="noopener"&gt;https://gist.github.com/arisro/9a2a81ad00cf732d85f23ad9e801b5d0&lt;/A&gt;)&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Now, I've played with this quite a bit - and if I add a "partfunc in (..)" predicate, it works fine, up to 50-60 partitions.&lt;BR /&gt;What happens is that it starts building the plan and stays in "Map 1", -1, INITIALIZING for a while, until it determines the number of mappers, and then starts multiple mappers on "Map 1". When it OOMs, it OOMs in the "INITIALIZING" part of "Map 1" - and I cannot understand why. It seems to be downloading some HDFS blocks to determine the plan (?), and with more partitions it OOMs. This doesn't make much sense to me, because it looks like a big scalability issue.&lt;BR /&gt;&lt;BR /&gt;Currently hive.tez.container.size is set to 4096 MB and the AM/task heap to&amp;nbsp;3270 MB.&lt;BR /&gt;I tried raising it to 8192 but it failed with the same OOM error.&lt;BR /&gt;I don't think this is expected to work like this - if we have 256 partitions and it OOMs at around 50 with 4 GB, we'd be expected to raise the Tez container size to 20 GB... Something looks off.&lt;/P&gt;
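&lt;P&gt;For reference, the pruned variant that does work can be sketched like this (the partition values in the IN list are placeholders, not our real ones):&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;-- fine with up to ~50-60 partitions listed; OOMs beyond that
select count(*)
from product_events_optimized
where partfunc in ('part_001', 'part_002');&lt;/LI-CODE&gt;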
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
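&lt;P&gt;For completeness, the memory settings above as Hive session properties (the AM/task heap property names below are the standard Hive-on-Tez ones - I'm assuming that's where our 3270 MB is set):&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;SET hive.tez.container.size=4096;       -- task container size, MB
SET hive.tez.java.opts=-Xmx3270m;       -- task JVM heap
SET tez.am.launch.cmd-opts=-Xmx3270m;   -- AM JVM heap&lt;/LI-CODE&gt;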
&lt;P&gt;Please let me know if you want other info - configs, logs.&lt;BR /&gt;&lt;BR /&gt;Thanks!&lt;BR /&gt;Aris&lt;/P&gt;</description>
      <pubDate>Mon, 30 Sep 2019 11:25:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HDP-3-1-4-Hive-query-OOM-on-any-ops-on-paritioned-table/m-p/278524#M208079</guid>
      <dc:creator>Aris</dc:creator>
      <dc:date>2019-09-30T11:25:41Z</dc:date>
    </item>
    <item>
      <title>Re: HDP 3.1.4 - Hive query - OOM on any ops on partitioned table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HDP-3-1-4-Hive-query-OOM-on-any-ops-on-paritioned-table/m-p/283997#M210947</link>
      <description>&lt;P&gt;I ended up recreating the table - first with a Hive INSERT INTO ... SELECT, then by copying the files directly on HDFS.&lt;BR /&gt;&lt;BR /&gt;Sadly I didn't try "MSCK REPAIR TABLE" first (or I don't remember doing so).&lt;BR /&gt;&lt;BR /&gt;With the new table everything looks fine - the old table's metadata was probably messed up by the HDP upgrade &amp;amp; migration.&lt;BR /&gt;&lt;BR /&gt;So if someone else somehow gets here:&lt;BR /&gt;1. try "MSCK REPAIR TABLE";&lt;BR /&gt;2. try to create a new table over the same location;&lt;BR /&gt;3. move/copy the data to a new table's location and run&amp;nbsp;"MSCK REPAIR TABLE".&lt;/P&gt;</description>
      <pubDate>Mon, 25 Nov 2019 19:18:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HDP-3-1-4-Hive-query-OOM-on-any-ops-on-paritioned-table/m-p/283997#M210947</guid>
      <dc:creator>Aris</dc:creator>
      <dc:date>2019-11-25T19:18:35Z</dc:date>
    </item>
  </channel>
</rss>

