About Aris

Aris · 11-25-2019

I ended up recreating the table. I started with a Hive INSERT INTO ... SELECT and then copied the files directly on HDFS. Sadly, I didn't try "MSCK REPAIR TABLE" (or I don't remember trying it). With the new table everything looks fine, so the old table's metadata was probably messed up by the HDP upgrade and migration. If someone else somehow ends up here:
1. Try "MSCK REPAIR TABLE".
2. Try creating a new table over the same location.
3. Move or copy the data to the new table's location and run "MSCK REPAIR TABLE".
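The three fallback steps above could be scripted roughly as follows. This is a minimal Python sketch that only builds the HiveQL and HDFS shell commands as strings; it does not run anything. The new table names, the copy destination, and the use of `CREATE EXTERNAL TABLE ... LIKE` are assumptions for illustration, not necessarily what was run here.

```python
def recovery_plan(old_table, old_location, relinked_table, copied_table, copied_location):
    """Build the three fallback steps as (description, commands) pairs.

    Nothing is executed here: HiveQL lines are meant for beeline/hive and the
    `hdfs dfs` lines for the HDFS shell. All names and paths are caller-supplied
    (the new table names and the copy location are hypothetical).
    The steps are alternatives, tried in order until one of them fixes the table.
    """
    return [
        # 1. Re-sync the metastore's partition list with the partition directories on HDFS.
        ("repair the existing table", [
            f"MSCK REPAIR TABLE {old_table};",
        ]),
        # 2. Point a fresh table definition at the same files. A newly created
        #    partitioned table has no partitions registered, hence the MSCK afterwards.
        #    EXTERNAL is an assumption; adjust it to the table type you need.
        ("new table over the same location", [
            f"CREATE EXTERNAL TABLE {relinked_table} LIKE {old_table} LOCATION '{old_location}';",
            f"MSCK REPAIR TABLE {relinked_table};",
        ]),
        # 3. Copy the partition directories to a new location and register them there.
        #    The glob is quoted so the HDFS shell, not the local shell, expands it.
        ("copy the data to a new table's location", [
            f"hdfs dfs -mkdir -p '{copied_location}'",
            f"hdfs dfs -cp '{old_location}/*' '{copied_location}/'",
            f"CREATE EXTERNAL TABLE {copied_table} LIKE {old_table} LOCATION '{copied_location}';",
            f"MSCK REPAIR TABLE {copied_table};",
        ]),
    ]


if __name__ == "__main__":
    plan = recovery_plan(
        old_table="product_events_optimized",
        old_location="hdfs://host:8020/flume/product_events/product_events_optimized",
        relinked_table="product_events_relinked",  # hypothetical name
        copied_table="product_events_copy",  # hypothetical name
        copied_location="hdfs://host:8020/flume/product_events/product_events_copy",  # hypothetical path
    )
    for number, (description, commands) in enumerate(plan, start=1):
        print(f"-- {number}. {description}")
        for command in commands:
            print(command)
```

Keeping the steps as plain strings makes it easy to review them before pasting into beeline and the HDFS shell, which matters when the table holds 1 TB+ of data.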

Aris · 09-30-2019

Hello,

We recently upgraded our Hadoop cluster from HDP 2.6.4 to HDP 3.1.4. Everything went decently, apart from a few random issues that we managed to solve ourselves. This one, though, is something we definitely don't understand.

We have a partitioned table with around 5 billion rows (1 TB+), split into 256 almost even partitions (and also bucketed by a single column):

    PARTITIONED BY (`partfunc` string)
    CLUSTERED BY (event_name)
    SORTED BY (event_time ASC)
    INTO 10 BUCKETS
    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION 'hdfs://host:8020/flume/product_events/product_events_optimized'
    TBLPROPERTIES (
      'orc.bloom.filter.columns'='fingerprint,sensor_name,sensor_version,bd_product_version',
      'orc.create.index'='true',
      'transient_lastDdlTime'='1567779991')

If I run "select count(*)" or "analyze table product_events_optimized partition(partfunc) compute statistics for columns event_time, event_name, uid, fingerprint" on this table, or any full-table query in general, we get the following error in the AM container:

    2019-09-27 19:12:38,856 [ERROR] [ORC_GET_SPLITS #2] |io.AcidUtils|: Failed to get files with ID; using regular API: Java heap space
    2019-09-27 19:12:38,862 [WARN] [ResponseProcessor for block BP-328156957-10.18.69.65-1534169825766:blk_1113315497_39633349] |hdfs.DataStreamer|: Exception for BP-328156957-10.18.69.65-1534169825766:blk_1113315497_39633349
    java.io.EOFException: Unexpected EOF while trying to read response from server
        at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:549)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
        at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1086)

(full log from that container: https://gist.github.com/arisro/9a2a81ad00cf732d85f23ad9e801b5d0)

I've experimented with this quite a bit. If I add a "partfunc in (..)" predicate, it works fine up to 50-60 partitions. The query starts building the plan and stays at "Map 1" with -1 tasks in the INITIALIZING state for a while, until it determines the number of mappers; then it starts multiple mappers for "Map 1". When it OOMs, it OOMs during the INITIALIZING phase of "Map 1", and I can't understand why. It seems to be downloading some HDFS blocks to determine the plan (?), and with more partitions it OOMs. This doesn't make much sense to me, because it looks like a big scalability issue.

Currently hive.tez.container.size is set to 4096MB and the AM/task size to 3270MB. I tried raising it to 8192MB, but it failed with the same OOM error. I don't think this is supposed to work like this: if we have 256 partitions and it OOMs at around 50 with 4GB, we would have to raise the Tez container size to about 20GB. Something looks off.

Please let me know if you want other info (configs, logs).

Thanks!
Aris
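The observation that a "partfunc in (..)" predicate works up to 50-60 partitions suggests two quick checks, sketched below in Python: a stopgap that splits a full-table count into batches small enough to get through the INITIALIZING phase, and the naive linear extrapolation behind the "about 20GB" figure. The batch size of 50, the partition values, and the assumption that AM memory grows linearly with the number of partitions scanned are all illustrative assumptions, not documented Hive or Tez behavior.

```python
import math


def batched_count_queries(table, partition_values, batch_size=50):
    """Split a full-table count(*) into queries restricted with partfunc IN (...).

    batch_size=50 mirrors the post's observation that ~50-60 partitions worked
    with a 4096MB container; it is not a tuned value. Partitions are disjoint,
    so the per-batch counts can simply be summed.
    """
    queries = []
    for start in range(0, len(partition_values), batch_size):
        batch = partition_values[start:start + batch_size]
        in_list = ", ".join(f"'{value}'" for value in batch)
        queries.append(f"SELECT count(*) FROM {table} WHERE partfunc IN ({in_list});")
    return queries


def linear_container_estimate_mb(total_partitions, working_partitions, container_mb):
    """Naive linear extrapolation of the container size a full scan would need."""
    return math.ceil(container_mb * total_partitions / working_partitions)


if __name__ == "__main__":
    # Hypothetical partition values; the real ones would come from SHOW PARTITIONS.
    values = [f"p{i:03d}" for i in range(256)]
    queries = batched_count_queries("product_events_optimized", values)
    print(len(queries))  # 6 batches: 5 x 50 partitions + 1 x 6
    print(linear_container_estimate_mb(256, 50, 4096))  # 20972 (~20 GB)
    print(linear_container_estimate_mb(256, 60, 4096))  # 17477 (~17 GB)
```

Under the linear assumption, 256 partitions would need roughly 17-21 GB depending on whether the working limit is 60 or 50 partitions, which matches the "about 20GB" estimate and shows why batching, rather than ever-larger containers, is the more practical stopgap.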

Last Visited	04-30-2020 03:26 AM

Member Since	09-29-2019 11:43 PM
Posts	2

Cloudera Community

Re: HDP 3.1.4 - Hive query - OOM on any ops on par...


HDP 3.1.4 - Hive query - OOM on any ops on paritio...