Hive2 Interactive is refusing to launch because LLAP can't acquire its resources. The exact status, found under Yarn > Quick Links > Resource Manager UI > Running > llap0 > Application Master, is:
Component         Desired  Actual  Outstanding Requests  Failed  Failed to start  Placement
LLAP              6        1       5                     0       0                Anti-affinity: 5 pending requests
slider-appmaster  1        1       0                     0       0
It stays stuck like this indefinitely. I lowered "Maximum Total Concurrent Queries" from 32 to 16, but it didn't help.
Hive2 Interactive complains:
LLAP app 'llap0' current state is LAUNCHING.
It stays stuck in that state. What do I need to do to successfully launch LLAP?
We just overcame this same issue ourselves.
I assume you have already enabled preemption on YARN.
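If preemption is not yet enabled, these are the usual yarn-site.xml settings for turning on the capacity scheduler's preemption monitor (shown here as a reference; adjust through Ambari rather than editing files by hand):

```properties
# Enable the scheduler monitor and the standard preemption policy
yarn.resourcemanager.scheduler.monitor.enable=true
yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
```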
Beyond that, a bit of Hive tuning is practically a prerequisite for anyone enabling LLAP (the Interactive Query component). The most common problem when starting the interactive server is a resource conflict between YARN and Hive. A few parameters have the biggest effect on starting (and stably running) LLAP:
hive.llap.daemon.yarn.container.mb: The suggestion in Ambari is quite clear:
Total memory used by individual LLAP daemons (YARN Container size). This includes memory for the cache as well as for the query execution. Should be larger than the sum of the Daemon cache size and the daemon heap size, and should leave some headroom after this (In most cases: cache size + heap size + headroom = Memory Per Daemon)
We also kept it slightly below the yarn.scheduler.maximum-allocation-mb value in the YARN config. This prevents a resource conflict with YARN at the most basic level.
llap_headroom_space: Maximum headroom reserved from the YARN container running LLAP daemons. This is an upper limit used during automatic size calculations, and the actual value may be lower.
We left this at the recommended value, which is very low compared to the others: about 6% of llap_heap_size.
llap_heap_size: LLAP Daemon Heap Size in MB.
hive.llap.io.memory.size: The amount of memory reserved for Hive's optimized in-memory cache.
The in-memory cache size should be about 20% of the LLAP heap size.
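To make the memory relationships above concrete, here is a small sketch (the input numbers are illustrative, not from this thread) that derives cache and headroom from the heap size using the ratios suggested above (~20% cache, ~6% headroom) and checks the resulting container against the YARN maximum:

```python
# Sketch of the LLAP memory sizing rule described above:
# cache + heap + headroom = memory per daemon (the YARN container size),
# which should stay at or below yarn.scheduler.maximum-allocation-mb.

def llap_memory_plan(heap_mb, yarn_max_allocation_mb):
    """Derive cache/headroom/container sizes from the heap size."""
    cache_mb = int(heap_mb * 0.20)      # in-memory cache ~= 20% of heap
    headroom_mb = int(heap_mb * 0.06)   # headroom ~= 6% of heap
    container_mb = heap_mb + cache_mb + headroom_mb
    if container_mb > yarn_max_allocation_mb:
        raise ValueError("daemon container exceeds YARN max allocation")
    return {"heap": heap_mb, "cache": cache_mb,
            "headroom": headroom_mb, "container": container_mb}

plan = llap_memory_plan(heap_mb=4096, yarn_max_allocation_mb=8192)
print(plan)
```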
The parameters below control the number of executors, which by default is set according to the number of CPUs:
num_llap_nodes: Number of node(s) on which the Hive LLAP daemon runs.
We set this to the number of our nodes (NodeManagers / DataNodes).
hive.llap.daemon.num.executors: The number of fragments that a single LLAP daemon will run concurrently. Usually this is the same as the number of available CPUs.
We set this to 40% of the number of CPUs.
hive.llap.io.threadpool.size: The number of threads to use for the low-level IO thread pool.
We set this to the same value as the number of executors.
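The executor sizing above can be sketched the same way (example numbers only; the 40% ratio is the rule of thumb from this answer, not a Hive default):

```python
# Sketch of the executor/thread sizing described above:
# executors ~= 40% of CPUs per node, one daemon per NodeManager,
# and the IO thread pool sized to match the executor count.

def llap_executor_plan(cpus_per_node, node_count):
    executors = max(1, int(cpus_per_node * 0.40))  # ~40% of CPUs
    return {
        "num_llap_nodes": node_count,     # one daemon per NodeManager
        "num_executors": executors,       # concurrent fragments per daemon
        "io_threadpool_size": executors,  # IO threads = executor count
    }

print(llap_executor_plan(cpus_per_node=38, node_count=6))
```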
After updating these values, you can save configs and start interactive server.
Afterwards, you may need to update queue capacities on YARN. If you are using Ambari, the YARN Queue Manager view is ideal, as it lets you view and adjust the values visually. Here you should check whether LLAP is taking too much resource away from the default queue. If you don't have YARN Queue Manager, you can adjust the capacity-scheduler queue parameters directly.
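For reference, these are the kinds of capacity-scheduler entries involved (queue names and percentages here are purely illustrative; your cluster's queue layout will differ):

```properties
# Example capacity-scheduler entries: give the llap queue enough
# capacity to hold all its daemons while leaving room for default.
yarn.scheduler.capacity.root.queues=default,llap
yarn.scheduler.capacity.root.llap.capacity=50
yarn.scheduler.capacity.root.llap.maximum-capacity=50
yarn.scheduler.capacity.root.default.capacity=50
yarn.scheduler.capacity.root.default.maximum-capacity=100
```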
Thanks for the info; my reply is below. Can you recommend a resource for learning YARN administration? There are a lot of moving parts, and trying to figure it out piecemeal on my own probably isn't going to work.
For complete YARN administration I would recommend the well-known book by Sam R. Alapati, Expert Hadoop Administration. Chapter 13 describes how YARN works, and Chapter 18 explains tuning.
The page below covers not only the Hive parameters for Hive tuning but also what to do on YARN.
I would also suggest reading this to understand how the capacity scheduler works, and therefore how to share resources:
@Sedat Kestepe Thanks for the info. Each of our NodeManagers has 240GB of physical RAM, and our container minimum and maximum are set to 3584MB and 217GB respectively. So I took your suggestions and was able to get LLAP launched with the following:
llap heap size = 3750 (3.75G)
llap cache size = 3000 (3.0G)
llap memory per daemon = 8000 (8.0G)
io.threadpool.size = 16 (we have 38 CPUs total per node)
I wasn't able to find a num.executors or llap_nodes_per_daemon setting.
With the above configuration, each daemon occupies a container that is 3x the minimum container size.
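That 3x figure follows from YARN's allocation rounding: each request is rounded up to a multiple of yarn.scheduler.minimum-allocation-mb, so an 8000MB daemon with a 3584MB minimum lands in a 10752MB container:

```python
# YARN rounds each container request up to a multiple of
# yarn.scheduler.minimum-allocation-mb.
import math

def yarn_container_size(requested_mb, min_allocation_mb):
    return math.ceil(requested_mb / min_allocation_mb) * min_allocation_mb

size = yarn_container_size(8000, 3584)
print(size, size // 3584)  # 10752 MB, i.e. 3 x 3584
```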
The property that sets the number of LLAP nodes for the LLAP daemons is "num_llap_nodes_for_llap_daemons".