
HDP 3.1.4 - Hive query - OOM on any ops on partitioned table





We recently upgraded our Hadoop cluster from 2.6.4 to 3.1.4. Everything went reasonably well, apart from a few random issues that we managed to solve ourselves.
But this one is something we really don't understand.

We have a partitioned table with around 5 billion rows (1 TB+), split into 256 roughly even partitions. It is also clustered (bucketed) by a single column.




 PARTITIONED BY (                                   
   `partfunc` string)                               
 CLUSTERED BY (                                     
 SORTED BY (                                        
   event_time ASC)                                  
 INTO 10 BUCKETS                                    
 ROW FORMAT SERDE                                   
 STORED AS INPUTFORMAT                              
 TBLPROPERTIES (                                    




If I run "select count(*)" on this table, or "analyze table product_events_optimized partition(partfunc) compute statistics for columns event_time, event_name, uid, fingerprint", or full-table queries in general, we get the following error in the AM container:




2019-09-27 19:12:38,856 [ERROR] [ORC_GET_SPLITS #2] |io.AcidUtils|: Failed to get files with ID; using regular API: Java heap space
2019-09-27 19:12:38,862 [WARN] [ResponseProcessor for block BP-328156957-] |hdfs.DataStreamer|: Exception for BP-328156957- Unexpected EOF while trying to read response from server
    at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(
    at org.apache.hadoop.hdfs.DataStreamer$




(here is a full log from that container)

Now, I played with this quite a bit: if I add a "partfunc in (..)" predicate, it works fine with up to 50-60 partitions.
Here is how it behaves: it starts building the plan and stays in "Map 1", -1, INITIALIZING for a while, until it determines the number of mappers, and then it starts multiple mappers on "Map 1". When it OOMs, it OOMs in the INITIALIZING phase of "Map 1", and I cannot understand why. It seems to be downloading some HDFS blocks to determine the plan (?), and with more partitions it OOMs. This doesn't make much sense to me, because it looks like a big scalability issue.
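
For reference, the "partfunc in (..)" workaround described above can be scripted by splitting the partition values into batches and generating one query per batch. This is only a rough sketch: the partition values (p000 to p255) are made-up placeholders, since the post does not list the real ones, and the batch size of 50 is taken from the lower end of the range reported to work.

```python
# Sketch: build one "partfunc in (...)" query per batch of partitions.
# Assumptions (not from the post): the partition values p000..p255 are
# hypothetical placeholders; the batch size of 50 is the lower end of the
# "50-60 partitions" range that was reported to work.

def chunk(values, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def count_query(table, partition_values):
    """Build a count(*) query restricted to the given partfunc values."""
    in_list = ", ".join("'{}'".format(v) for v in partition_values)
    return "select count(*) from {} where partfunc in ({})".format(table, in_list)


partitions = ["p{:03d}".format(i) for i in range(256)]  # hypothetical values
batches = chunk(partitions, 50)
queries = [count_query("product_events_optimized", b) for b in batches]

for q in queries:
    # Print only the start of each statement; the full IN list is long.
    print(q[:80] + " ...")
```

The per-batch counts would still have to be summed by the caller, and this only sidesteps the OOM; it does not explain why the INITIALIZING phase needs so much memory.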

Currently hive.tez.container.size is set to 4096 MB and the AM/task size is set to 3270 MB.
I tried raising it to 8192 MB, but it failed with the same OOM error.
I don't think this is expected to work like this: if we have 256 partitions and it OOMs at around 50 with 4 GB, we would be expected to raise the Tez container size to 20 GB... Something looks off.
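
The 20G figure comes from a simple linear extrapolation, which can be written out as below. It assumes that the memory needed during planning grows in proportion to the number of partitions scanned; the post does not establish that, and the failed 8192 run only shows that doubling the container size was not enough for the full table. The function name is made up for illustration.

```python
# Sketch: linear extrapolation of container size from the numbers in the post.
# Assumption (not established by the post): required memory scales linearly
# with the number of partitions scanned.

def extrapolated_container_mb(total_partitions, partitions_ok, container_mb):
    """Scale the working container size up to the full partition count."""
    return container_mb * total_partitions / partitions_ok


# 4096 MB handled roughly 50 to 60 of the 256 partitions.
low_mb = extrapolated_container_mb(256, 60, 4096)
high_mb = extrapolated_container_mb(256, 50, 4096)

print("{:.1f} GB to {:.1f} GB".format(low_mb / 1024, high_mb / 1024))
```

Under that assumption the full table would need roughly 17 to 20 GB, which is where the 20G estimate comes from.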


Please let me know if you want other info - configs, logs.

