Member since
09-18-2015
5
Posts
7
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
4989 | 12-04-2017 09:46 PM |
12-10-2017
11:33 PM
Can you try re-running query with hive.stats.fetch.bitvector=true?
... View more
12-08-2017
08:04 PM
This is likely because log4j-1.2.x.jar is in your classpath somewhere which is getting bundled up in LLAP package. Hive2 and LLAP moved to log4-2.x which is not compatible with old log4j.. If you have HADOOP_USER_CLASSPATH_FIRST set to true in your environment, old log4j should not be picked. It could also be that hive libs directory is transitively pulling old log4j. When LLAP bundles a package only the following log4j 2.x jars are expected log4j-1.2-api-2.8.2.jar (bridge jar) log4j-api-2.8.2.jar log4j-core-2.8.2.jar log4j-jul-2.5.jar log4j-slf4j-impl-2.8.2.jar log4j-web-2.8.2.jar Can you check if your hadoop classpath is having log4j-1.2.x.jar by any chance? If so please back that up somewhere/remove and retry.
... View more
12-06-2017
06:59 AM
If table stats are missings, hive optimizer guesses the stats based on file size which can be way off as on disk file sizes are encoded and compressed. When available optimizer will use rawDataSize instead of totalFileSize which is supported only for some formats. rawDataSize will be close to in-memory data size when the table gets loaded for processing. Even with rawDataSize, optimizer stats estimation can be way off when compared to actual values as some operators like filters and group by will require column statistics to make better decisions (specifically distinct column value count NDV). If you have column statistics still there can be cases where join decisions can go wrong as NDV values are not merged correctly across partitions in some older releases. To fix the NDV merge, make sure to enable bitvector merging via hive.stats.ndv.algo="hll" and hive.stats.fetch.bitvector=true. With hyperloglog, NDV values for columns are always closer to actual value (<4% error) even after merging across partitions. With all these stats, optimizer can better predict how many actual distinct values are going to be loaded by a hash table during join so as to avoid MapJoinMemoryExhaustionError. I am not exactly sure how the explain plan looks like between HDP 2.6.1 and HDP 2.6.3, without looking at explain plan it is hard to guess. But the overall idea is that the more statistics the better will be optimizers decision. totalFileSize < rawDataSize < column stats < column stats with bitvector merging Make sure you have column stats with bitvector merging enabled for optimal query planning. The reason for MapJoinMemoryExhaustionError is that, optimizer says for example that a map join will load 10M unique keys into hash table whereas in reality there are 100M keys which will exceed fair usage of memory by a single executor. Another thing to look into is, make sure vector mapjoin operator is being used as memory estimation for vectorized map join operator will be more accurate than non-vectorized version (java objects memory usage estimation is hard).
... View more
12-04-2017
09:46 PM
Map join decisions are not based on table stats but rather the operator stats just above the join operator. If you have filters, projections and aggregations in your query the stats will get dramatically reduced by the it reaches join operator. You probably want to look at the data size of operator just above the map join operator. Also make sure all operators show "Basic stats : COMPLETE Column stats : COMPLETE" in the explain plan. One other thing is that LLAP decides map join differently than MR/Tez. For more information about how LLAP decides map joins please refer to https://community.hortonworks.com/articles/149998/map-join-memory-sizing-for-llap.html
... View more
12-04-2017
07:56 PM
7 Kudos
Executor memory size and map join memory for LLAP
This article is intended for understanding map join memory sizing and related configurations specific to LLAP.
For detailed information about LLAP sizing and setup, please refer to LLAP Sizing and Setup
Executor memory size is shared among several components like join hash table loading, map-side aggregation hash tables, PTF operator buffering, shuffle and sort buffers. We recommend 4Gb for a general case (per LLAP Sizing and Setup), configuring less memory can slow down some queries (e.g. hash joins will turn into sort merge joins) and configuring more memory can be wasteful or lead to GC pressure (although more memory can provide more room for hash joins, it can adversely affect query performance when all/many executors load huge hash tables resulting in severe GC pauses).
LLAP adds 2 new changes for hash table loaders for map joins
Oversubscription of memory (from other executors) to provide more room for in-memory hash tables and more map join conversions. Memory monitoring during hashtable loading monitors hash table's in-memory size for fair usage. If memory monitor finds an executor using more memory beyonds its limits during hashtable loading, the query will fail with mapjoin memory exhaustion error.
The way oversubscription of memory works for map join hash tables is, every executor borrows 20% of hive.auto.convert.join.noconditionaltask.size from self and 3 other executors configurable via hive.llap.mapjoin.memory.oversubscribe.factor and hive.llap.memory.oversubscription.max.executors.per.query respectively.
For example, let’s say hive.auto.convert.join.noconditionaltask.size is set to 1GB, in the oversubscription model, map join conversions will inflate this 1GB no-conditional task size to [1GB + (4 * 0.2 * 1GB)] = 1.8GB, where 4 is self + 3 other executors to borrow memory from and 0.2 is 20% of no-conditional task size. Hive query compiler in LLAP mode will take this oversubscribed memory (1.8GB) into consideration when making decision about map join conversions. As a result in LLAP mode, one can observe more map join conversions because of this new oversubscribed memory of 1.8GB.
The way memory monitoring for hash table loader works is, after loading every 100000 rows (configurable via hive.llap.mapjoin.memory.monitor.check.interval) into hash tables, memory monitor will compute the memory usage of hash tables. If memory usage exceeds certain threshold (2x oversubscribed no-conditional task size = 3.6GB in above example), memory monitor will kill the query as it exceeds its memory usage limits. 2x is inflation factor to accommodate additional java object allocation overhead configurable via hive.hash.table.inflation.factor.
It is recommended not to change the default configs for memory oversubscription and memory monitoring as it can significantly impact current query performance or performance of other queries. Reducing the oversubscription memory fraction or additional executors to borrow, will result in less queries being converted to hash joins which can slow down the query performance. On the other hand, increasing these values will result in increased memory usage during hashtable loading which can lead to severe garbage collection when multiple executors are loading hash tables at the same time and can also lead unfair/inflated memory usage only for subset of executors also leading to degraded query performance for all queries.
As always, it is recommended to keep table, partition and column statistics up-to-date for Hive query compiler to generate optimal query plan. If statistics are not present, in some cases MapJoinMemoryExhaustionError may be thrown like below Caused by: org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionError: VectorMapJoin Hash table loading exceeded memory limits for input: Map 2 numEntries: 102000000 estimatedMemoryUsage: 4294983912 effectiveThreshold: 3664143872 memoryMonitorInfo: { isLlap: true executorsPerNode: 24 maxExecutorsOverSubscribeMemory: 3 memoryOverSubscriptionFactor: 0.20000000298023224 memoryCheckInterval: 100000 noConditionalTaskSize: 1017817776 adjustedNoConditionalTaskSize: 1832071997 hashTableInflationFactor: 2.0 threshold: 3664143872 }
In the above error, we can see that Map 2 is broadcasting its data to hash table loader which failed after loading 102000000 rows for exhausting its memory limit of 3664143872 bytes (2 * [1017817776 + 0.8*1017817776]). This is likely happening because of missing statistics that prevents optimizer from making better decision (in this case MAP 2 could have been a shuffle join with correct statistics).
... View more
Labels: