This article is a short summary of LLAP-specific causes of query slowness. Because most of Hive query execution (compilation and almost all operator logic) stays the same when LLAP is used, queries can also be slow for reasons unrelated to LLAP. General Hive query performance issues (e.g. bad plan, skew, poor partitioning, slow file system) should also be considered when investigating, but they are outside the scope of this article.
Queries are slow
On HDP 2.5.3/Hive 2.2 and below, there’s a known issue where queries on LLAP can get slower over time on some workloads. Upgrade to HDP 2.6.X/Hive 2.3 or higher.
If you are comparing LLAP against containers on the same cluster, check the LLAP cluster size against the capacity used for containers. For example, with 27 nodes for containers and a 3-node LLAP cluster, containers may well be faster because they can use far more CPU and memory.
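As a rough way to quantify the comparison above, the following minimal sketch computes how much more raw capacity one side has than the other. It assumes identically sized nodes and that each side gets the full resources of its nodes; the function name and parameters are hypothetical and only serve the illustration.

```python
def relative_capacity(container_nodes, llap_nodes):
    """Return how many times more nodes the container side has than LLAP.

    Assumption: all nodes have the same CPU and memory, so the node ratio
    is also the ratio of available cores and memory.
    """
    if llap_nodes <= 0:
        raise ValueError("llap_nodes must be positive")
    return container_nodes / llap_nodes


# The example from the text: 27 container nodes versus a 3-node LLAP cluster.
print(relative_capacity(27, 3))  # 9.0
```

A ratio well above 1 means the container runs have more hardware available, so a fair comparison needs either a larger LLAP cluster or a smaller container queue.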
Generally, queries on LLAP run just like in Hive (see Architecture). So perform the standard Hive performance debugging to narrow the problem down: start with EXPLAIN, look at the DAG timeline in Tez view, and so on.
Query is stuck (not just slow)
Make sure hive.llap.daemon.task.scheduler.enable.preemption is set to true.
Look at Tez view to see which tasks are running.
Choose one running task (especially if there is only one), open <its llap node>:15002/jmx, and search for this task under ExecutorsStatus.
If the task is missing or says “queued”, it may be a priority inversion bug. A number of these were fixed over time in HDP 2.6.X.
If the task is running, it is possible that this is a general query performance issue (e.g. skew, bad plan, etc.). You can double-check <llap node>:15002/stacks to see whether it is actually running code.
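The JMX lookup described above can be scripted. The sketch below is a minimal example, assuming the /jmx endpoint returns JSON (as Hadoop-style JMX servlets typically do). Because the exact layout of the ExecutorsStatus attribute is not described here, the code does not rely on it: it walks the whole document and reports every key or string value that mentions the task. The function names are hypothetical, and the host and task attempt ID in the usage comment are placeholders.

```python
import json
from urllib.request import urlopen


def fetch_jmx(host, port=15002, timeout=10):
    """Fetch and parse the JSON served at http://<host>:<port>/jmx."""
    with urlopen(f"http://{host}:{port}/jmx", timeout=timeout) as resp:
        return json.load(resp)


def find_task_mentions(node, task_id, path="$"):
    """Recursively collect (path, value) pairs that mention task_id.

    A hit is either a dict key containing task_id (the value is returned
    as-is) or a string value containing task_id.
    """
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if task_id in str(key):
                hits.append((child, value))
            else:
                hits.extend(find_task_mentions(value, task_id, child))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            hits.extend(find_task_mentions(value, task_id, f"{path}[{i}]"))
    elif isinstance(node, str) and task_id in node:
        hits.append((path, node))
    return hits


# Usage (placeholders, not real values):
#   jmx = fetch_jmx("llap-node.example.com")
#   for where, what in find_task_mentions(jmx, "<task attempt id>"):
#       print(where, what)
```

An empty result corresponds to the "task is missing" case; a hit whose text says "queued" corresponds to the queued case. Both point toward the priority inversion scenario rather than a general performance issue.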