Member since: 10-09-2015
Posts: 36
Kudos Received: 77
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1610 | 08-27-2018 07:02 PM |
| | 2183 | 01-19-2018 09:00 PM |
| | 2962 | 12-12-2017 01:11 AM |
| | 6678 | 05-30-2017 11:26 PM |
12-12-2017
01:11 AM
Yeah, I suspect this is where the cluster is reaching capacity. Killed task attempts are probably composed of two types: task attempts rejected because the LLAP daemon is full and won't accept new work, and killed opportunistic non-finishable tasks (preemption). The latter happen because Hive starts some tasks (esp. reducers) before all of their inputs are ready, so they can download inputs from some upstream tasks while waiting for other upstream tasks to finish. When parallel queries want to run a task that can run and finish immediately, they will preempt non-finishable tasks (otherwise a task that is potentially doing nothing, waiting for something else to finish, could take resources away from tasks that are ready). It is normal for the amount of preemption to increase with a high volume of concurrent queries. The only way to check whether there are any other (potentially problematic) kills is to check the logs... If the cache is not as important for these queries, you can try to reduce hive.llap.task.scheduler.locality.delay, which may result in faster task scheduling (-1 means infinite; otherwise the minimum is 0). However, once the cluster is at capacity, it's not realistic to expect sub-linear scaling... individual query runtime improvements would also improve aggregate runtime in this case.
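If you want to experiment with that setting, here is a minimal per-session sketch (value semantics as described above; whether reducing it actually helps depends on your workload):

```sql
-- Reduce the locality delay to schedule tasks faster at the cost of cache locality.
-- -1 means wait indefinitely for a local node; 0 is the minimum otherwise.
SET hive.llap.task.scheduler.locality.delay=0;
```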
12-02-2017
03:33 AM
7 Kudos
Finding out the hit rate
It is possible to determine the cache hit rate per node or per query. Per node, you can see the hit rate in the LLAP metrics view (<llap node>:15002, see General debugging). Per query, you can see LLAP IO counters (including hit rate) after running the query by setting hive.tez.exec.print.summary=true, which produces the counters output at the end of the query; the counters look different with an empty cache than once some of the data is in the cache.
Why is the cache hit rate low
Consider the data size and the cache size across all nodes. E.g. with a 10 GB cache on each of 4 LLAP nodes, reading 1 TB of data cannot achieve more than ~4% cache hit rate even in the perfect case. In practice the rate will be lower due to the effects of compression (the cache is not snappy/gzip compressed, only encoded), interference from other queries, locality misses, etc. In HDP 2.X/Hive 2.X, the cache has coarser granularity to avoid some fragmentation issues that are resolved in HDP 3.0/Hive 3.0. This can cause considerable wasted memory in the cache on some workloads, esp. if the table has a lot of strings with a small range of values, and/or is written with compression buffer sizes smaller than 256 KB. When writing data, consider ensuring that the ORC compression buffer size is set to 256 KB, and set hive.exec.orc.buffer.size.enforce=true (on HDP 2.6, this requires a backport) to disable writing smaller CBs. This issue doesn't result in errors but can make the cache less efficient. If the cache size seems sufficient, check the relative data and metadata hit rates (see the metrics and counters above). If there are both data and metadata misses, it can be because other queries cached different data in place of this query's data, or it could be a locality issue. Check hive.llap.task.scheduler.locality.delay; it can be increased (or set to -1 for infinite delay) to get better locality at the cost of waiting longer to launch tasks, if IO is a bottleneck. If the metadata hit rate is very high but the data hit rate is lower, it is likely that the cache doesn't fit all the data; some data gets evicted, but metadata, which is cached with high priority, stays in the cache.
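For illustration, the settings mentioned above could be applied per session like this (hive.exec.orc.default.buffer.size is my assumption for the buffer-size knob; verify the name on your Hive version):

```sql
-- Print a per-query summary at the end of execution, including LLAP IO counters
-- such as cache hit rate.
SET hive.tez.exec.print.summary=true;

-- When writing data: keep ORC compression buffers at 256 KB and refuse to write
-- smaller ones (the buffer-size property name is an assumption; the enforce flag
-- needs a backport on HDP 2.6 as noted above).
SET hive.exec.orc.default.buffer.size=262144;
SET hive.exec.orc.buffer.size.enforce=true;
```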
12-02-2017
03:21 AM
3 Kudos
This article is a short summary of LLAP-specific causes of query slowness. Given that much of Hive query execution (compilation, almost all operator logic) stays the same when LLAP is used, queries can be slow for non-LLAP-related reasons; general Hive query performance issues (e.g. bad plan, skew, poor partitioning, slow filesystem, ...) should also be considered when investigating. Those issues are outside the scope of this article.
Queries are slow
On HDP 2.5.3/Hive 2.2 and below, there’s a known issue where queries on LLAP can get slower over time on some workloads. Upgrade to HDP 2.6.X/Hive 2.3 or higher. If you are comparing LLAP against containers on the same cluster, check the LLAP cluster size compared to the capacity used for containers. If there are 27 nodes for containers and a 3-node LLAP cluster, it’s possible containers will be faster because they can use many more CPUs and much more memory. Generally, queries on LLAP run just like in Hive (see Architecture). So, the standard Hive performance debugging should be performed, starting with explain, looking at the Tez view DAG timeline, etc. to narrow it down.
Query is stuck (not just slow)
Make sure hive.llap.daemon.task.scheduler.enable.preemption is set to true. Look at the Tez view to see which tasks are running. Choose one running task (esp. if there’s only one), go to the <its llap node>:15002/jmx view, and search for ExecutorsStatus for this task.
If the task is missing or says “queued”, it may be a priority inversion bug; a number of these were fixed over time in HDP 2.6.X. If the task is running, it is possible that this is a general query performance issue (e.g. skew, bad plan, etc.). You can double-check <llap node>:15002/stacks to see if it’s running code.
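As a quick sketch, the preemption setting can be checked and set from a beeline session (issuing SET with no value prints the current value):

```sql
-- Show the current value; it should be true so that tasks that can finish
-- are able to preempt non-finishable ones.
SET hive.llap.daemon.task.scheduler.enable.preemption;
SET hive.llap.daemon.task.scheduler.enable.preemption=true;
```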
09-18-2018
09:58 PM
HDP 3.0 is causing issues. I have the same Hive and LLAP config in HDP 2.6.x and everything works fine. But on HDP 3.0, I am getting the following error when I try to start the LLAP service. I have checked the YARN UI and verified that LLAP is running, but it seems like the service is not able to connect or verify that the LLAP app is running. WARN cli.LlapStatusServiceDriver: Watch mode enabled and got YARN error. Retrying..
LLAP status unknown
12-01-2017
10:01 PM
11 Kudos
Introduction: how does LLAP fit into Hive
LLAP is a set of persistent daemons that execute fragments of
Hive queries. Query execution on LLAP is very similar to Hive without LLAP,
except that worker tasks run inside LLAP daemons, and not in containers.
High-level lifetime of a JDBC query:
| Without LLAP | With LLAP |
|---|---|
| Query arrives to HS2; it is parsed and compiled into “tasks” | Query arrives to HS2; it is parsed and compiled into “tasks” |
| Tasks are handed over to Tez AM (query coordinator) | Tasks are handed over to Tez AM (query coordinator) |
| Coordinator (AM) asks YARN for containers | Coordinator (AM) locates LLAP instances via ZK |
| Coordinator (AM) pushes task attempts into containers | Coordinator (AM) pushes task attempts as fragments into LLAP |
| RecordReader used to read data | LLAP IO/cache used to read data, or RecordReader used to read data |
| Hive operators are used to process data | Hive operators are used to process data* |
| Final tasks write out results into HDFS | Final tasks write out results into HDFS |
| HS2 forwards rows to JDBC | HS2 forwards rows to JDBC |
* sometimes, minor LLAP-specific optimizations are possible - e.g. sharing a hash table for map join
Theoretically, a hybrid (LLAP+containers) mode is possible, but
it doesn’t have advantages in most cases, so it’s rarely used (e.g.: Ambari
doesn’t expose any knobs to enable this mode).
In both cases, a query uses a Tez session (a YARN app with a Tez AM serving as the query coordinator). In the container case, the AM will start more containers in the same YARN app; in the LLAP case, LLAP itself runs as an external, shared YARN app, so the Tez session will only have one container (the query coordinator).
Inside LLAP daemon: execution
An LLAP daemon runs work fragments using executors. Each daemon has a number of executors to run several fragments in parallel, and a local work queue. For the Hive case, fragments are similar to task attempts – mappers and reducers. Executors essentially “replace” containers – each is used by one task at a time; the sizing should be very similar for both.
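For reference, a couple of the sizing knobs can be inspected from a session as below. These are daemon-level settings, normally configured through Ambari/hive-site rather than per query; the names are as in HDP 2.x LLAP configs, and SET with no value just prints whatever value HS2 knows about (the authoritative values live in the LLAP daemon configuration):

```sql
-- Inspect daemon-level executor sizing (change these via Ambari / hive-site.xml,
-- not per session):
SET hive.llap.daemon.num.executors;           -- fragments a daemon can run in parallel
SET hive.llap.daemon.memory.per.instance.mb;  -- executor heap per daemon, in MB
```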
Inside LLAP daemon: IO
Optionally, fragments may make use of the LLAP cache and IO elevator (background IO threads). In HDP 2.6, this is only supported for the ORC format and isn’t supported for most ACID tables. In 3.0, support is added for text, Parquet, and ACID tables. In HDInsight, the text format is also added in 2.6. Note that queries can still run in LLAP even if they cannot use the IO layer. Each fragment only uses one IO thread at a time. The cache stores metadata (on heap in 2.6, off heap in 3.0) and encoded data (off-heap); an SSD cache option is also added in 3.0 (2.6 on HDInsight).
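A rough sketch of how these pieces are toggled, assuming Hive 2.x/HDP 2.6 property names (execution in LLAP and use of the IO layer are controlled separately, which is why a query can run in LLAP without IO):

```sql
SET hive.execution.mode=llap;   -- run query fragments in LLAP daemons instead of containers
SET hive.llap.io.enabled=true;  -- allow eligible fragments to use the LLAP cache / IO elevator
```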
12-01-2017
09:44 PM
8 Kudos
This is the central page that links to other LLAP debugging articles. You can find articles dealing with specific problems below; to provide some background and to help resolve other issues, there are also some helpful general-purpose articles:
A one-page overview of LLAP architecture: https://community.hortonworks.com/articles/149894/llap-a-one-page-architecture-overview.html
General debugging - where to find logs, UIs, metrics, etc.: https://community.hortonworks.com/articles/149896/llap-debugging-overview-logs-uis-etc.html
You might also find this non-debugging LLAP sizing and setup guide interesting: https://community.hortonworks.com/articles/149486/llap-sizing-and-setup.html
Before getting to specific issues, there are some limitations with LLAP that you should be aware of; the following significant features are not supported:
HA (work in progress).
doAs=true (no current plans to support it).
Temporary functions (will not be supported; use permanent functions).
CLI access (use beeline).
Then, there are some articles about specific issues:
LLAP doesn't start - https://community.hortonworks.com/articles/149899/investigating-when-llap-doesnt-start.html
Queries are slow or stuck on LLAP - https://community.hortonworks.com/articles/149900/investigating-when-the-queries-on-llap-are-slow-or.html
Investigating cache usage - https://community.hortonworks.com/articles/149901/investigating-llap-cache-hit-rate.html
This list will be expanded in the future with issues that people are facing with LLAP.
01-07-2019
12:00 PM
Brilliant article. One suggestion: the example you gave in "Configure interactive query" doesn't match the example you discussed. Could you please modify it?
12-05-2017
09:16 PM
Just a note - on older versions of HDP (2.6.1 and below, iirc) it is possible to receive InvalidACL at start time because the LLAP application has failed to start and thus never created the path at all. So, if the path does not exist, it might be worth checking the LLAP app log.
02-10-2017
04:53 AM
Yes. DataNodes and NodeManagers are usually colocated. So, if you have 40 DataNodes, then deploy 40 NodeManagers on those 40 DataNodes. If some data sits on a node that does not have a NodeManager, that data has to be transferred to another node, which increases the running time.