Member since 10-09-2015
36 Posts
77 Kudos Received
4 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 412 | 08-27-2018 07:02 PM |
| | 476 | 01-19-2018 09:00 PM |
| | 1023 | 12-12-2017 01:11 AM |
| | 2344 | 05-30-2017 11:26 PM |
08-27-2018
07:02 PM
Functionality like that is available in Apache Hive 3.0 (with bug fixes in 3.1), as well as HDP 3.0/3.1; see the "workload management" feature: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/hive-workload/content/hive_workload_management.html and https://issues.apache.org/jira/browse/HIVE-17481
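For reference, a resource plan in Hive 3 workload management is defined via DDL roughly like the sketch below; this is only a hedged illustration (the plan and trigger names and the ELAPSED_TIME threshold are made up), so check the docs and HIVE-17481 linked above for the exact syntax supported by your version.

```sql
-- Illustrative sketch of Hive 3.x workload management DDL; names and values are made up.
CREATE RESOURCE PLAN interactive_plan;
-- Kill queries that run longer than 10 minutes (counter name and threshold are examples only).
CREATE TRIGGER interactive_plan.kill_long_running WHEN ELAPSED_TIME > 600000 DO KILL;
ALTER TRIGGER interactive_plan.kill_long_running ADD TO POOL default;
ALTER RESOURCE PLAN interactive_plan ENABLE;
ALTER RESOURCE PLAN interactive_plan ACTIVATE;
```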
07-10-2018
11:18 PM
Doesn't look like the default format is kicking in. Can you create the table as ORC explicitly (stored as ORC)? Also depending on what version this is, vectorization may need to be enabled; it's good for performance anyway. Or hive.llap.io.row.wrapper.enabled needs to be set to true (if present in your version).
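A minimal sketch of what I mean (table and column names are just placeholders):

```sql
-- Create the table as ORC explicitly rather than relying on the default format.
CREATE TABLE my_table (id INT, val STRING) STORED AS ORC;
-- Make sure vectorization is on; it helps performance in general.
SET hive.vectorized.execution.enabled=true;
```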
07-10-2018
09:55 PM
It could be counters not being propagated correctly; usually, when LLAP IO is not used, this counter section is not displayed at all. You might want to check LLAP logs to see if there are threads with names beginning with "IO-Elevator"; these are IO threads. Also check that the LLAP daemon logs show IO being initialized correctly (at the beginning, when the daemon starts, there should be lines about cache size, eviction policy, etc.). Then, HS2 and LLAP logs may have lines from the "wrapForLlap" method in HiveInputFormat indicating errors when trying to use LLAP IO. Also, is this an ACID table? ACID tables are not able to use the LLAP cache until Hive 3.1 in Apache, or HDP 3.X.
05-29-2018
08:03 PM
Yes, I think that should be sufficient. You can set it higher if you see issues (like the container getting killed by YARN), but 12G definitely seems excessive for a 32Gb Xmx. I'll try to find out why the engine would recommend such high values.
01-25-2018
10:32 PM
I think there were some classpath problems in either case. Unfortunately, the servicedriver currently only packages the config (hive-site.xml) for the LLAP daemon if it's present in the classpath. So, it found a config outside of the classpath for its own use, but didn't package it, initially. The 2nd error seems to indicate that the LlapDaemon class, and the LLAP jar itself, is not present when the daemon is started... not sure how such a package was created without this jar. Copying to another directory probably added the config to the classpath so the servicedriver could find it.
01-24-2018
10:27 PM
It may be a bug in the way the tool looks for config files. Can you try adding the /opt/hive/conf directory to the classpath in the script, to see if that works as a workaround?
01-19-2018
09:00 PM
The general recommended heap allocation for LLAP is ~4Gb of memory per executor, so it depends on how many executors you have. This is covered in a lot of detail at https://community.hortonworks.com/articles/149486/llap-sizing-and-setup.html Sometimes, depending on query size and type, it's possible to get away with as little as 1-3Gb per executor; I'm not sure how many executors you are using, so it all depends. If the values are somewhere in this range, it would be good to get a heap dump on OOM and see what's taking up all the memory. There are two main possible causes - something in the processing pipeline, or something in ORC writing (ORC may keep lots of buffers open when writing many files in parallel). The former is supposed to run in 4Gb/executor, so if that's the case it would be a bug and we'd like to see the top users from the heap dump 🙂 For ORC, there is unfortunately no good formula that I know of (cc @owen), but if you write more files (lots of partitions, lots of buckets), you'd need more memory. Hard to tell without knowing the details.
01-19-2018
08:54 PM
I don't think classpath modification for LLAP to add local paths is currently supported, although I can see how it could be a useful feature. Is it possible in your case to package the native library as part of the LLAP package, by adding it to the aux jars? Aux jars don't actually have to be jars - they can be any files; the name is a legacy one.
12-12-2017
01:11 AM
Yeah, I suspect this is where the cluster is reaching capacity. Killed task attempts are probably composed of two types: task attempts rejected because the LLAP daemon is full and won't accept new work, and killed opportunistic non-finishable tasks (preemption). The latter happen because Hive starts some tasks (esp. reducers) before all of their inputs are ready, so they can download the inputs from some upstream tasks while waiting for other upstream tasks to finish. When parallel queries want to run a task that can run and finish immediately, they will preempt non-finishable tasks (otherwise a task that is potentially doing nothing, waiting for something else to finish, could take resources from tasks that are ready). It is normal for the amount of preemption to increase with high-volume concurrent queries. The only way to check whether there are any other (potentially problematic) kills is to check the logs. If the cache is not as important for these queries, you can try to reduce hive.llap.task.scheduler.locality.delay, which may cause faster scheduling for tasks (-1 means infinite; the minimum otherwise is 0). However, once the cluster is at capacity, it's not realistic to expect sub-linear scaling... individual query runtime improvements would also improve the aggregate runtime in this case.
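If you want to try the locality knob, a minimal sketch (depending on the version, this may need to go into the Hive config in Ambari rather than a session):

```sql
-- 0 = schedule fragments as soon as possible (less cache locality, faster scheduling);
-- -1 = wait indefinitely for a local node (better cache locality).
SET hive.llap.task.scheduler.locality.delay=0;
```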
12-08-2017
07:56 PM
If 6 queries take 4 seconds, scaling linearly would mean 50 queries should take ~33 seconds if the cluster were at capacity. I'm assuming there's also some slack; however, unless the queries are literally 2-3 tasks, at some point the cluster reaches capacity and the tasks from different queries start queueing up (even if they do run faster due to cache, JIT, etc.). Are the 54 executors per machine, or total? By 50% capacity, do you mean 4 LLAPs with 50% of a node each, or 2 LLAPs with a full node each? In the absence of other information, 13 sec (which is much better than linear scaling) actually looks pretty good. One would need to look at query details, and especially cluster utilization with different numbers of queries, to determine whether one needs parallel-query improvements or (more likely) single-query improvements, because the tasks are simply waiting for other tasks with the cluster at capacity. Also, 4 queries by default is just a reasonable default for most people starting out; parallel query capacity actually requires some resources (to have AMs present; otherwise AM startup will affect perf a lot), so we chose a value on the low end. As for memory usage per query, the counters at the end of the query and in the Tez UI output the cache/IO memory used; however, Java memory use by execution is not tracked well right now.
12-07-2017
07:37 PM
In the above log, it looks like there are two Xmx values on the command line: -Xmx22251m and -Xmx33251m. Do you know where the 2nd value comes from? Was one of them specified via args? I'm not sure which one would apply (it would be logged in the jmx view of the LLAP daemon), but if the limit is 27Gb and the 2nd value applies, then this is the reason for the container to exceed memory.
12-05-2017
09:35 PM
1 Kudo
LLAP uses a file identifier as the cache key when storing data. For HDFS, that is by default the file inode ID; for other filesystems (e.g. S3), it's the combination of name, size, and modification timestamp. The HDFS inode ID (and, on other filesystems, the time and size) changes on append, so the cached data gets invalidated and the new data is cached under a different ID. The old data is currently not proactively removed, but it is no longer used by queries and will eventually be evicted. However, for ORC/Parquet/etc. files, the append pattern is generally not used - the file is sealed once it is written. For example, Hive ACID writes new files (deltas) and then compacts the old base file and the deltas into a new base file. In this case, the new files' data is cached with a new ID.
12-05-2017
09:31 PM
Hi, this version should already have the fix for this issue (HIVE-15779). As of 2.6.X, there is no workload management and balancing functionality in LLAP, so there are indeed usually a lot of cancellations reported for parallel queries. Some of them are a reporting issue (rejected tasks are expected when the AM is probing LLAPs for empty space - this is planned to be fixed in 3.0). Still, if I understand correctly, going from 78s (one query) to 129s (two parallel queries) means running 2 queries in parallel is faster than running them sequentially; assuming the cluster is fully utilized, that is a good result - both queries get ~half the allocation, and thus effectively run with half the cluster size, hence they take longer. Is this the case? In 3.0, we are implementing workload management for LLAP for better resource sharing, and the ability to specify policies for queries (sharing the cluster fairly, or in some proportions, etc.).
12-05-2017
09:19 PM
Note that the command line has two Xmx values... Ambari has a config value for the LLAP Xmx that it adds to the command line; that is the recommended way to set Xmx. The custom value(s) added to the args seem to be conflicting with what Ambari adds.
12-05-2017
09:16 PM
Just a note - on older versions of HDP (2.6.1 and below iirc) it is possible to receive InvalidACL at start time because the LLAP application has failed to start and thus failed to create the path entirely. So, it might be worth checking the LLAP app log if the path does not exist.
12-04-2017
08:16 PM
The LLAP app has failed to start; the application_1512395880314_0027 app logs might have the real error, if any. It's also possible that the containers failed to start because either there isn't enough memory physically on the cluster, or there isn't enough space configured in the YARN queue being used.
12-02-2017
03:33 AM
7 Kudos
Finding out the hit rate

It is possible to determine the cache hit rate per node or per query. Per node, you can see the hit rate by looking at the LLAP metrics view (<llap node>:15002, see General debugging). Per query, it is possible to see the LLAP IO counters (including hit rate) after running the query by setting hive.tez.exec.print.summary=true, which produces the counters output at the end of the query, e.g. with an empty cache vs. with some data already in the cache.

Why is the cache hit rate low

Consider the data size and the cache size across all nodes. E.g. with a 10 Gb cache on each of 4 LLAP nodes, reading 1 Tb of data cannot achieve more than ~4% cache hit rate even in the perfect case. In practice the rate will be lower due to the effects of compression (the cache is not snappy/gzip compressed, only encoded), interference from other queries, locality misses, etc.

In HDP 2.X/Hive 2.X, the cache has coarser granularity to avoid some fragmentation issues that are resolved in HDP 3.0/Hive 3.0. This can cause considerable wasted memory in the cache on some workloads, esp. if the table has a lot of strings with a small range of values, and/or is written with compression buffer sizes smaller than 256Kb. When writing data, consider ensuring that the ORC compression buffer size is set to 256Kb, and set hive.exec.orc.buffer.size.enforce=true (on HDP 2.6, it requires a backport) to disable writing smaller CBs. This issue doesn't result in errors, but it can make the cache less efficient.

If the cache size seems sufficient, check the relative data and metadata hit rates (in the counters and metrics above). If there are both data and metadata misses, it can be due to other queries caching different data in place of this query's data, or it could be a locality issue. Check hive.llap.task.scheduler.locality.delay; it can be increased (or set to -1 for an infinite delay) to get better locality, at the cost of waiting longer to launch tasks, if IO is a bottleneck. If the metadata hit rate is very high but the data hit rate is lower, it is likely that the cache doesn't fit all the data; so some data gets evicted, but metadata, which is cached with high priority, stays in the cache.
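A short sketch of the settings mentioned above (the values are the ones discussed in the article; the enforce flag may not exist on your version without the backport):

```sql
-- Print per-query LLAP IO counters (including cache hit rate) at the end of the query.
SET hive.tez.exec.print.summary=true;

-- When writing data, keep ORC compression buffers at 256Kb so the cache stays efficient.
SET hive.exec.orc.default.buffer.size=262144;
SET hive.exec.orc.buffer.size.enforce=true;
```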
Tags: Data Processing, Debugging, help, Hive, How-ToTutorial, llap, performance
12-02-2017
03:21 AM
3 Kudos
This article is a short summary of LLAP-specific causes of query slowness. Given that a lot of Hive query execution (compilation, almost all operator logic) stays the same when LLAP is used, queries can be slow for non-LLAP-related reasons; general Hive query performance issues (e.g. bad plan, skew, poor partitioning, slow fs, ...) should also be considered when investigating. Those issues are outside the scope of this article.

Queries are slow

On HDP 2.5.3/Hive 2.2 and below, there's a known issue where queries on LLAP can get slower over time on some workloads. Upgrade to HDP 2.6.X/Hive 2.3 or higher. If you are comparing LLAP against containers on the same cluster, check the LLAP cluster size compared to the capacity used for containers. If there are 27 nodes for containers and a 3-node LLAP cluster, it's possible containers will be faster because they can use many more CPUs and much more memory. Generally, queries on LLAP run just like in Hive (see Architecture), so standard Hive performance debugging should be performed, starting with explain, looking at the Tez view DAG timeline, etc. to narrow it down.

Query is stuck (not just slow)

Make sure hive.llap.daemon.task.scheduler.enable.preemption is set to true. Look at the Tez view to see which tasks are running. Choose one running task (esp. if there's only one), go to the <its llap node>:15002/jmx view, and search for the ExecutorsStatus for this task. If the task is missing or says "queued", it may be a priority inversion bug; a number of those were fixed over time in HDP 2.6.X. If the task is running, it is possible that this is a general query performance issue (e.g. skew, bad plan, etc.). You can double-check <llap node>:15002/stacks to see if it's running code.
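A quick sanity-check sketch for the steps above (the preemption setting is normally a cluster-level Hive config; the query is a placeholder):

```sql
-- Preemption must be on so LLAP can kick out non-finishable fragments.
SET hive.llap.daemon.task.scheduler.enable.preemption=true;
-- Standard Hive performance debugging starts with the plan.
EXPLAIN SELECT count(*) FROM my_table;  -- replace with the slow or stuck query
```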
Tags: Data Processing, Debugging, Hive, How-ToTutorial, llap, performance
12-02-2017
02:57 AM
3 Kudos
This article will (hopefully) be helpful if you are trying to use LLAP/Hive Interactive, but it fails or gets stuck during startup. This is often caused by misconfiguration, esp. w.r.t. sizing, but could also be a bug. Ambari performs 2 major activities when (re)starting Hive Interactive - first it starts LLAP itself on the YARN cluster, then it starts HiveServer2 for Hive Interactive. Usually the problems happen during LLAP startup. This article will help you pinpoint the problem if there's some bug or non-sizing-related misconfiguration. It will also point at the sizing document where appropriate; the sizing and setup document can be found at https://community.hortonworks.com/articles/149486/llap-sizing-and-setup.html. For general information (e.g. where the logs are), see the parent article.

Note: for all the logs, it is sometimes the case that the last error in the log is InterruptedException or some other similar error during component shutdown after something else has already failed. It might help to look at all the errors at the end of the log if there are several, esp. the initial exception. Also, please capture all the call stacks (or attach the entire log) if opening a bug.

1. Look at the Ambari operation logs for HiveServer Interactive startup. If the error is basically a timeout waiting for the LLAP cluster, go to step 2 to determine why LLAP wasn't able to start. This may also manifest as an InvalidACL error (on older HDP versions). If the issue is after LLAP has started (i.e. in HiveServer2 startup) and there's no useful error message, see the hiveServer2Interactive logs on the HiveServer2 Interactive service machine; try to make sense of the error, and/or file a bug or contact someone in dev (Hive). Otherwise, try to make sense of the error, and/or file a bug or contact someone in dev (Ambari or Hive depending on the error).
2. Check that the LLAP YARN app has enough capacity via the ResourceManager UI, assuming it hasn't just failed (skip this step if the app is in failed state). If the app is neither running nor complete (i.e. it's queued, accepted, etc.), there is not enough capacity in the cluster to start the app; check what takes up space on the cluster, and if nothing looks unexpected, consult the sizing doc to ensure LLAP is set up properly. If the app is running (not failed), go to the "Tracking URL: Application Master" link and check whether it was able to start enough LLAP daemons on the cluster (Desired vs. Actual containers for the LLAP component). If not, check if there's enough capacity to start this number of containers - something else might be taking up the space; if nothing looks unexpected, consult the sizing doc to ensure LLAP is set up properly.
3. If the app has failed, or there is enough capacity but the app doesn't have enough containers, view the logs in the UI (or kill the app and download the YARN logs). If the Slider app status says "too many component failures" or a similar error, go directly to step 5; otherwise, go to step 4.
4. Check slider.log for errors (in the UI, it would be in the Slider AM - usually the lowest-numbered container). If the errors are container failures, or there are no errors, go to step 5. Otherwise, try to make sense of the error, and/or file a bug or contact someone in dev (YARN or Hive depending on the error).
5. If present, check the daemon logs in one of the failed containers for errors. Start with llap-daemon*.log and llap-daemon*.out; they would usually have the error at the end. Try to make sense of the error, and/or contact the Hive team. If those are not present, check slider-agent.log instead; try to make sense of the error, and/or file a bug or contact someone in dev (YARN or Hive depending on the error).
Tags: Data Processing, Debugging, Hive, How-ToTutorial, Issue Resolution, llap, startup
12-01-2017
10:42 PM
8 Kudos
This article provides an overview of various debugging aspects of the system when using LLAP - how things are run, where logs are located, what monitoring options are available, etc. See the parent article for an architecture overview and specific issue resolution.

HS2 when using LLAP

When using LLAP ("Hive Interactive"), Ambari starts a separate HS2 (HS2 Interactive). This HS2 is not connected to the other HS2, and has a separate connection string, machine (or ports), configuration, and logs. HS2 Interactive and LLAP configs in Ambari are split similarly to regular Hive configs, but have "-interactive-" in their names. HS2 Interactive logs are called "hiveServer2Interactive" and are separate from regular HS2 logs.

YARN apps when using LLAP; getting query logs

Ambari starts LLAP as a YARN app that is shared by everyone using the cluster. It should have N+1 containers, where N is the number of LLAP daemons, plus the Slider AM. The LLAP YARN app is called "llap0" by default. Tez session YARN apps will always have one container (the query coordinator) when LLAP is used. For example, if 2 concurrent queries were configured in Ambari (see Cluster sizing), the ResourceManager UI would show the LLAP cluster app and 2 single-container Tez sessions.

Because of the YARN deployment model, LLAP does not use jars/configs/etc. on the local machine. They are packaged by Ambari at LLAP startup time, and are deployed in the container directories of the LLAP YARN app. In 2.6, the package with the jars and configs is rebuilt on every LLAP restart.

Logs

The LLAP YARN app contains task logs for all the queries; each LLAP daemon writes a separate log file for every query. The name format of the log file containing the task logs for a single query on a single daemon is "<hive query Id>-<DAG name>.log(.done)", e.g. hive_20171110190337_9cd58b54-62e0-4940-b0b8-e8b62f877cbb-dag_1505982789269_1038_1.log. Every daemon would have one log file for each (or almost every) query. Over time, the log files for completed queries (with the .done extension) are picked up by YARN log aggregation and removed from the daemons' YARN directories.

The DAG name in the log file name corresponds to a Tez session app - e.g. in the above case, dag_1505982789269_1038_1 corresponds to query #1 in the session represented by application_1505982789269_1038. This latter app (which is separate from the LLAP app) is the query coordinator (Tez AM) for the query and contains the Tez AM logs for it.

Slider commands and views; LLAP app status overview

Ambari uses Slider to launch LLAP. When debugging, it's possible to use Slider commands to affect the app without involving Ambari, e.g. "slider stop <name>" to stop the LLAP app by name (slider stop llap0). See "slider help" for more commands.

The Slider AM also exposes a status view, accessed from the YARN application page. The status view shows allocated and running containers, Slider debugging info, and also the container logging directories on each machine, which can be used for manual debugging if there's a problem with the "yarn logs" command.

LLAP daemon status and metrics

On each machine where LLAP is running, it exposes a number of endpoints on port 15002 (by default), i.e. http://(machine name):15002/. The default view shows some metrics; other useful views are:
- /jmx - shows various useful metrics, discussed below
- /stacks - shows daemon stacks; threads that process query fragments are TezTR-* and IO-Elevator-Thread*
- /conf - shows the LLAP configuration (the values actually in use)
- /iomem (only in 3.0) - shows debug information about cache contents and usage
- /peers - shows all LLAP nodes in the cluster
- /conflog (only in 3.0) - allows one to see and add/change loggers and log levels for this daemon without a restart

LLAP daemon metrics (JMX)

At <host>:15002/jmx, one can see a lot of LLAP daemon metrics. Things to pay attention to:
- ExecutorsStatus contains the information about tasks queued and executing on LLAP. This is the most useful section when debugging stuck queries, to see which tasks are running and which are in the queue.
- ExecutorMemoryPerInstance, IoMemoryPerInstance, and MaxJvmMemory contain the memory configuration in use (see the sizing section).
- java.class.path is the LLAP daemon classpath; useful for debugging class-loading issues.
- llap.daemon.log.dir is the LLAP daemon logging directory.
- LastGcInfo contains information about the last GC (more is available in the GC log in the logging directory).
- LlapDaemonIOMetrics, LlapDaemonCacheMetrics, LlapDaemonExecutorMetrics, and LlapDaemonJvmMetrics contain LLAP performance metrics. The latter is especially useful when debugging limit issues.

Tez view and performance debugging

The Tez view can be used to view Hive queries with LLAP, just like regular Hive queries; it has tabs with the DAG representation, vertex swimlanes, etc. for each query. In particular, one can download debugging and performance data for the query.

Grafana views

LLAP exposes metrics to the Ambari Timeline Server, which can be viewed via Grafana dashboards. Grafana dashboards can be opened from Ambari via Hive -> Quick Links -> Hive Dashboard (Grafana). This should open the Grafana dashboard for all HDP components. Click on the "Home" dropdown to navigate to any of the 3 LLAP dashboards: the LLAP Daemon Dashboard shows host-level LLAP executor metrics; the LLAP Overview Dashboard shows an aggregated overview of memory from all nodes of the LLAP cluster; and the LLAP Heatmaps Dashboard shows host-level cache and executor heatmaps. There are several other metrics exposed via Grafana that can be useful for debugging. The full list of metrics and their descriptions is available here: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-operations/content/grafana_hive_llap_dashboards.html
Tags: Data Processing, Debugging, Hive, How-ToTutorial, llap
12-01-2017
10:01 PM
11 Kudos
Introduction: how does LLAP fit into Hive

LLAP is a set of persistent daemons that execute fragments of Hive queries. Query execution on LLAP is very similar to Hive without LLAP, except that worker tasks run inside LLAP daemons, and not in containers.

High-level lifetime of a JDBC query:

| Without LLAP | With LLAP |
|---|---|
| Query arrives at HS2; it is parsed and compiled into "tasks" | Query arrives at HS2; it is parsed and compiled into "tasks" |
| Tasks are handed over to the Tez AM (query coordinator) | Tasks are handed over to the Tez AM (query coordinator) |
| Coordinator (AM) asks YARN for containers | Coordinator (AM) locates LLAP instances via ZK |
| Coordinator (AM) pushes task attempts into containers | Coordinator (AM) pushes task attempts as fragments into LLAP |
| RecordReader is used to read data | LLAP IO/cache is used to read data, or RecordReader is used to read data |
| Hive operators are used to process data | Hive operators are used to process data* |
| Final tasks write out results into HDFS | Final tasks write out results into HDFS |
| HS2 forwards rows to JDBC | HS2 forwards rows to JDBC |

* Sometimes, minor LLAP-specific optimizations are possible - e.g. sharing a hash table for map join.
Theoretically, a hybrid (LLAP+containers) mode is possible, but it doesn't have advantages in most cases, so it's rarely used (e.g. Ambari doesn't expose any knobs to enable this mode). In both cases, a query uses a Tez session (a YARN app with a Tez AM serving as the query coordinator). In the container case, the AM will start more containers in the same YARN app; in the LLAP case, LLAP itself runs as an external, shared YARN app, so the Tez session will only have one container (the query coordinator).

Inside LLAP daemon: execution

The LLAP daemon runs work fragments using executors. Each daemon has a number of executors to run several fragments in parallel, and a local work queue. For the Hive case, fragments are similar to task attempts - mappers and reducers. Executors essentially "replace" containers - each is used by one task at a time; the sizing should be very similar for both.

Inside LLAP daemon: IO

Optionally, fragments may make use of the LLAP cache and IO elevator (background IO threads). In HDP 2.6, it's only supported for the ORC format and isn't supported for most ACID tables. In 3.0, support is added for text, Parquet, and ACID tables. In HDInsight, the text format is also added in 2.6. Note that queries can still run in LLAP even if they cannot use the IO layer. Each fragment would only use one IO thread at a time. The cache stores metadata (on heap in 2.6, off heap in 3.0) and encoded data (off-heap); an SSD cache option is also added in 3.0 (2.6 on HDInsight).
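As a hedged illustration, these are the main knobs this section refers to (normally set cluster-wide via Ambari rather than per session; hive.llap.execution.mode is an additional, commonly used setting not named above):

```sql
SET hive.llap.io.enabled=true;      -- enable the LLAP IO layer and cache
SET hive.llap.execution.mode=all;   -- run eligible fragments inside LLAP daemons
```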
Tags: architecture, Debugging, FAQ, Hive, llap, Sandbox & Learning
12-01-2017
09:44 PM
8 Kudos
This is the central page that links to other LLAP debugging articles. You can find articles dealing with specific problems below; to provide some background and to help resolve other issues, there are also some helpful general-purpose articles:
- A one-page overview of LLAP architecture: https://community.hortonworks.com/articles/149894/llap-a-one-page-architecture-overview.html
- General debugging - where to find logs, UIs, metrics, etc.: https://community.hortonworks.com/articles/149896/llap-debugging-overview-logs-uis-etc.html
- You might also find this non-debugging LLAP sizing and setup guide interesting: https://community.hortonworks.com/articles/149486/llap-sizing-and-setup.html

Before getting to specific issues, there are some limitations with LLAP that you should be aware of; the following significant features are not supported:
- HA (work in progress).
- doAs=true (no current plans to support it).
- Temporary functions (will not be supported; use permanent functions).
- CLI access (use beeline).

Then, there are some articles about specific issues:
- LLAP doesn't start - https://community.hortonworks.com/articles/149899/investigating-when-llap-doesnt-start.html
- Queries are slow or stuck on LLAP - https://community.hortonworks.com/articles/149900/investigating-when-the-queries-on-llap-are-slow-or.html
- Investigating cache usage - https://community.hortonworks.com/articles/149901/investigating-llap-cache-hit-rate.html

This list will be expanded in the future with issues that people are facing with LLAP.
Tags: Data Processing, Debugging, Hive, How-ToTutorial, llap
11-29-2017
09:05 PM
24 Kudos
I'd like to thank @Jean-Philippe Player, @bpreachuk, @ghagleitner, @gopal, @ndembla and @Prasanth Jayachandran for providing input and content for this article.

Introduction

This document describes LLAP setup for reasonable performance with a typical workload. It is intended as a starting point, not as the definitive answer to all tuning questions. However, the recommendations in this document have been applied both to our own instances and to instances we help other teams run.

The context for the examples

Each step below is accompanied by an example based on allocating ~50% of a cluster to LLAP, out of 12 nodes with 16 cores and 96Gb dedicated to the YARN NodeManager each.

Basic cluster setup

Pick the fraction of the cluster to be used by LLAP

This is determined by the workload (amount of data, concurrency, types of queries, desired latencies); typical values are in the 15-50% range on an existing cluster, or it is possible to give the entire cluster to LLAP, depending on the use case.

LLAP runs 3 types of YARN containers in the cluster: a) execution daemons, which process the data; b) query coordinators (Tez AMs), which orchestrate the execution of the queries; and c) a single Slider AM that starts and monitors the daemons.

The bulk of the capacity allotted to LLAP will be used by the execution daemons. For best results, whole YARN nodes should be allocated to these daemons; therefore, you should pick the number of whole nodes to use based on the fraction of the cluster to be dedicated to LLAP.

Example: we want to use 50% of a 12-node cluster, so we will have 6 LLAP daemons.

Pick the query parallelism and number of coordinators

This will depend heavily on the application; one thing to be aware of is that with BI tools, the number of running queries at any moment in time is typically much smaller than the number of sessions - users don't run queries all the time, because they take time to look at the results, and some tools avoid queries by using caching.

The number of parallel queries determines the number of query coordinators (Tez AMs): one per query, plus some constant slack in case sessions are temporarily unavailable due to maintenance or cluster issues.

Example: we have determined we need to run 9 queries in parallel; use 10 AMs.
Determine LLAP daemon size

As described above, we will give whole NM nodes to LLAP daemons; however, we also need to leave some space on the cluster for the query coordinators and the Slider AM. There are two options.

First, it is possible to run these on non-LLAP nodes. In this case, each LLAP daemon can take the entire memory of a NM (see the YARN config tab in Ambari, as "Memory allocated for all YARN containers on a node"). If that is acceptable, it's the optimal approach (it maximizes daemon memory and avoids noisy neighbors). However, this will use more combined resources than just (number of LLAP nodes)/(total number of nodes), because the AMs will take resources on other nodes. It is also impossible if all NMs are running LLAP daemons (a ~100% LLAP cluster), and risky even when there are only 1-2 non-LLAP NMs (because all the query coordinators would be concentrated on one node).

The alternative approach is to run the AMs on the nodes with LLAP daemons. In this case, the LLAP daemon container size needs to be adjusted down to leave space. There are ceiling((N_Tez_AMs + 1)/N_LLAP_nodes) extra processes on each node, and we recommend 2Gb per AM process. So, for this case, the LLAP daemon will use (NM_memory - 2Gb * ceiling((N_Tez_AMs + 1)/N_LLAP_nodes)).

Example: We will use the 2nd approach; we need ceiling((10+1)/6) = 2 AM processes per LLAP node, reserving 2 * 2Gb per node. Therefore, the LLAP daemon size will be 96Gb - 4Gb = 92Gb.
Set up YARN and the LLAP queue

YARN settings for LLAP, as well as the LLAP queue, are usually set up by Ambari; if setting up manually, or when encountering problems, there are a few things to ensure:
- YARN preemption should be enabled. Even if there is enough memory in the cluster for execution daemons, it might be too fragmented for YARN to allocate whole nodes to LLAP. Preemption ensures that YARN can free up memory and start LLAP; without it, startup might not succeed.
- The minimum YARN container size should be 1024Mb, and the maximum should be the total memory per NM. The LLAP daemon is a container, so it will not be able to use more memory than that.
- When setting up a custom LLAP queue:
  - Set the queue size >= (2Gb * N_Tez_AMs) + (1Gb [Slider AM]) + (daemon_size * N_LLAP_nodes).
  - Ensure the queue has higher priority than other YARN queues (for preemption to work; see the YARN setup above).
  - Make sure the queue min and max capacity is the same (full capacity), and there are no user limits on the queue.
Determine the LLAP daemon CPU & memory configuration

Overview

There are three numbers that determine the memory configuration for LLAP daemons: the total memory (the container size determined above), the Xmx (memory used by executors), and the off-heap cache memory. Total memory = off-heap cache memory + Xmx (used by executors and daemon-specific overhead) + JVM overhead. The Xmx overhead (shared Java objects and internal structures) is accounted for by LLAP internally, so it can be ignored in the calculations.

In most cases, you can just follow the standard steps below; however, sometimes the process needs to be tweaked. The following conflicting criteria apply:
- It's good to maximize task parallelism (up to one executor per core).
- It's good to have at least 4Gb of RAM per executor. More memory can provide some diminishing returns; less can sometimes be acceptable, but can lead to problems (OOMs, etc.).
- It's good to have the IO layer enabled, which requires some cache space; more cache may be useful depending on the workload, esp. on a cloud FS for BI queries.

So, the number of executors, the cache size, and the memory per executor are a balance of these 3.

Pick the number of executors per node

Usually the same as the number of cores.

Example: there are 16 cores, so we will use 16 executors.

Determine the Xmx and minimum cache memory needed by executors

The daemon uses almost all of its heap memory (Xmx) for processing query work; the shared overheads are accounted for by LLAP. Thus, we determine the daemon Xmx based on executor memory; you should aim to give a single executor around 4Gb. So, Xmx = n_executors * 4Gb. LLAP executors use memory for things like group-by tables, sort buffers, shuffle buffers, map join hashtables, PTF, etc.; generally, the more memory, the less likely they are to spill, or to go with a less efficient plan, e.g. switching from a map join to a shuffle join - for some background, see https://community.hortonworks.com/articles/149998/map-join-memory-sizing-for-llap.html

If there isn't enough memory on the machine, by default, adjust the number of executors down. It is possible to give executors less memory (e.g. 3Gb) to keep their number higher; however, that must always be backed by testing with the target workload. Insufficient memory can cause much bigger perf problems than reduced task parallelism.

Additionally, if the IO layer is enabled, each executor needs about ~200Mb of cache (cache buffers are also used as the IO buffer pool) for reasonable operation. The cache size will be covered in the next section.

Example: Xmx = 16 executors * 4Gb = 64Gb. Additionally, based on 16 executors, we know that it's good to have (0.2Gb * 16) = 3.2Gb of cache if we are going to enable it.

Determine headroom and cache size

We need to leave some memory in the daemon for Java VM overhead (GC structures, etc.), as well as to give the container a little safety margin to prevent it from being killed by YARN. This headroom should be about 6% of Xmx, but no more than 6Gb. It is not set directly; instead, it is used to calculate the container memory remaining for use by the off-heap cache.

Take out the Xmx and the headroom from the total container size to determine the cache size. If the cache size is below the viable size above, reduce Xmx to allow for more memory here (typically by reducing the number of executors), or disable LLAP IO (hive.llap.io.enabled = false).

Example: Xmx is 64Gb, so headroom = 0.06 * 64Gb = ~4Gb (which is lower than 6Gb). Cache size = 92Gb total - 64Gb Xmx - 4Gb headroom = 24Gb. This is above the minimum of 3.2Gb.

Configure interactive query

You can specify the above settings on the main interactive query panel, as well as in the advanced config (or validate the values predetermined by Ambari). Some other settings in the advanced config may be of interest; for example, the AM size can be modified in the custom Hive config by changing tez.am.resource.memory.mb.
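Purely as an illustration of the worked example above (6 daemon nodes, 16 executors, 92Gb containers), the corresponding LLAP properties would look roughly like the following. These are cluster-level settings that Ambari normally derives for you in the interactive Hive config; they are shown as SET statements only for readability, and the values should come from your own sizing.

```sql
SET hive.llap.daemon.num.executors=16;               -- one executor per core
SET hive.llap.daemon.yarn.container.mb=94208;        -- 92Gb daemon container
SET hive.llap.daemon.memory.per.instance.mb=65536;   -- 64Gb Xmx (16 executors * 4Gb)
SET hive.llap.io.enabled=true;
SET hive.llap.io.memory.size=25769803776;            -- 24Gb cache (92 - 64 - 4 headroom)
-- LLAP queue capacity for this example: 2Gb*10 AMs + 1Gb Slider AM + 92Gb*6 daemons = 573Gb
```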
Tags: configuration, Data Processing, Hive, How-ToTutorial, llap
05-30-2017
11:26 PM
2 Kudos
Looks like hive.llap.io.threadpool.size is set to an invalid value. Are you using Ambari? Sometimes, depending on the cluster machine config, this may happen. You may need to set it manually in the advanced LLAP config in the Hive configuration; the recommended value is the same as the number of executors. Is that also set to 0? If so, it would also need to be adjusted (typically to the number of CPUs, or based on memory, allowing roughly 4Gb (3-8Gb) per executor). What version of Ambari is this?
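As a sketch of the fix (these are daemon-level settings, normally edited in the advanced LLAP section of the Hive config in Ambari rather than per session; 16 is just an example that should match your executor count):

```sql
SET hive.llap.io.threadpool.size=16;     -- recommended: same as the number of executors
SET hive.llap.daemon.num.executors=16;   -- if this is 0, it needs to be fixed as well
```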
05-25-2017
09:19 PM
Looks like some code is trying to retrieve foreign keys from Hive (GetCrossReference) while supplying a null database name. I'm not sure why it happens when using LLAP in particular. Are you sure there isn't something special about the tool, or the pattern of use, that could cause this behavior?
03-09-2017
09:21 PM
2 Kudos
The directory for the database is mostly used to store table data, so there isn't a lot of difference if you are only going to use external tables. Is there any reason you want to use a different directory?
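For illustration only (names and paths are made up): an external table's LOCATION is independent of the database directory, so the database can keep its default location.

```sql
CREATE DATABASE analytics_db;
-- The external table's data stays wherever LOCATION points, not under the database directory.
CREATE EXTERNAL TABLE analytics_db.events (id INT, payload STRING)
STORED AS ORC
LOCATION '/data/external/events';
```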
03-09-2017
08:28 PM
5 Kudos
In HDP 2.5 and 2.6, HiveServer Interactive and HiveServer2 are configured as two separate services, where HS2 retains all the same functionality as before, but HSI has some limitations (e.g. it doesn't support HA). This may change in HDP 3.0, esp. limitation-wise. HSI requires one to allocate some fraction of the cluster to interactive queries, for LLAP, i.e. the permanently active daemons that will execute interactive workloads. This knob controls the balance between interactive and other queries. In HDP 2.5 and 2.6.0, we do not enforce the usage of HSI, so one can in theory use it for ETL queries too, but that is not recommended; we may also block non-LLAP Tez queries from it in the future. The general recommendation is thus to divide the cluster into an interactive and a non-interactive portion, run interactive queries via the single HSI, and run other queries via HS2 using the old recommendations. Many queries that are not interactive per se (even ETL queries), however, can benefit from LLAP; so it is also recommended to try running queries on LLAP first, before consigning them to the non-interactive cluster.
03-09-2017
07:51 PM
5 Kudos
Hi, at this point there are no plans to support storage-based authorization with LLAP; the suggested approach is to use Ranger for auth. Please file an Apache JIRA; if there's enough support, perhaps there will be a discussion in the community.