YARN is essentially a system for managing distributed applications. It consists of a central Resource Manager, which arbitrates all available cluster resources, and a per-node Node Manager, which takes direction from the Resource Manager; the Resource Manager and Node Manager follow a master-slave relationship. The Node Manager is responsible for managing the resources available on a single node. YARN defines its unit of work as a container, which can be allocated on any node. The Application Master negotiates containers with the Scheduler (a component of the Resource Manager), and the containers themselves are launched by the Node Manager.
Understanding YARN memory configuration
Memory allocated for all YARN containers on a node : the total amount of memory that the Node Manager on each node can use for allocating containers.
Minimum container size : the minimum amount of RAM that will be allocated to a requested container. Any container request is rounded up to a multiple of the minimum container size.
Maximum container size : the maximum amount of RAM that can be allocated to a single container. Maximum container size <= memory allocated for all YARN containers on a node.
LLAP daemons run as YARN containers, hence the LLAP daemon size should be >= minimum container size and <= maximum container size.
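These three memory settings correspond to standard YARN properties. A minimal yarn-site.xml sketch follows; the 64 GB / 4 GB values are illustrative assumptions for a single node, not recommendations:

```xml
<!-- yarn-site.xml (illustrative values) -->
<property>
  <!-- Memory allocated for all YARN containers on a node -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>65536</value>
</property>
<property>
  <!-- Minimum container size; requests are rounded up to a multiple of this -->
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>4096</value>
</property>
<property>
  <!-- Maximum container size; must be <= yarn.nodemanager.resource.memory-mb -->
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>65536</value>
</property>
```

With these values, a request for 5 GB would be rounded up to 8 GB (the next multiple of the 4 GB minimum), and the LLAP daemon container would have to fit between 4 GB and 64 GB.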
Percentage of physical CPU allocated for all containers on a node : X% of the total CPU that can be used by containers. The value should never be 100%, since CPU is also needed by the DataNodes, the Node Manager, and the OS.
Minimum container vcores : the minimum number of vcores that will be allocated to a given container.
Maximum container vcores : the maximum number of vcores that can be allocated to a container.
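The CPU-side settings map onto YARN properties in the same way. A hedged sketch, with illustrative values (the 80% limit and vcore counts are assumptions, not recommendations):

```xml
<!-- yarn-site.xml (illustrative values) -->
<property>
  <!-- Percentage of physical CPU allocated for all containers on a node -->
  <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
  <value>80</value>
</property>
<property>
  <!-- Minimum container vcores -->
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
</property>
<property>
  <!-- Maximum container vcores -->
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>12</value>
</property>
```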
CPU isolation : this enables cgroups, forcing containers to use exactly the number of vcores allocated to them. If this option is disabled, a container is free to occupy all the CPUs available on the machine.
The LLAP daemon runs as one big YARN container, hence always ensure that the maximum container vcores is set equal to the number of vcores available for YARN containers (e.g., 80% of the total number of CPUs available on that host).
When CPU isolation is enabled, it becomes even more important to set the maximum container vcores to an appropriate value.
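On HDP, enabling CPU isolation via cgroups typically involves switching the Node Manager to the Linux container executor and turning on strict resource usage; a sketch, assuming the cgroup mount is already set up on the hosts:

```xml
<!-- yarn-site.xml (sketch; cgroup hierarchy/mount settings omitted) -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <!-- Hard-cap each container at its allocated vcores -->
  <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
  <value>true</value>
</property>
```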
LLAP, known as Live Long and Process, provides a hybrid execution model. It consists of a long-lived daemon, which replaces direct interactions with the HDFS DataNode, and a tightly integrated DAG-based framework.
Functionality such as caching, pre-fetching, some query processing, and access control is moved into the daemon. Small/short queries are largely processed by the daemon directly, while any heavy lifting is performed in standard YARN containers.
Similar to the DataNode, LLAP daemons can be used by other applications as well, especially if a relational view of the data is preferred over file-centric processing. The daemon is also open through optional APIs (e.g., InputFormat) that can be leveraged by other data processing frameworks as a building block.
Hive LLAP consists of the following components:
Hive Interactive Server : a Thrift server which provides a JDBC interface to connect to Hive LLAP.
Slider AM : the Slider application which spawns, monitors, and maintains the LLAP daemons.
Tez AM query coordinator : the Tez AM which accepts incoming user requests and executes them in the executors available inside the LLAP daemons (JVMs).
LLAP daemons : to facilitate caching and JIT optimization, and to eliminate most of the startup costs, a daemon runs on the worker nodes of the cluster. The daemon handles I/O, caching, and query fragment execution.
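The daemon's sizing is itself configurable. A hedged hive-site.xml sketch with illustrative values (the executor count and memory sizes are assumptions that must be tuned to the container sizes discussed above):

```xml
<!-- hive-site.xml (illustrative values) -->
<property>
  <!-- Number of executors (query fragments processed in parallel) per daemon -->
  <name>hive.llap.daemon.num.executors</name>
  <value>12</value>
</property>
<property>
  <!-- Heap memory per LLAP daemon, in MB -->
  <name>hive.llap.daemon.memory.per.instance.mb</name>
  <value>32768</value>
</property>
<property>
  <!-- Size of the LLAP in-memory I/O cache, in bytes -->
  <name>hive.llap.io.memory.size</name>
  <value>17179869184</value>
</property>
```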
LLAP configuration in detail
Conf (section of Hive) — Rule and comments
Tez AM coordinator size :
Number of coordinators : the number of concurrent queries that LLAP supports. This results in spawning an equal number of Tez AMs.
1. In HDP 2.6.4, preemption of queries is not supported.
2. If multiple concurrent queries have exhausted the queue, then any incoming query will be in a waiting state.
3. All queries running on Hive LLAP can be seen in the Tez UI.
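The two knobs above typically correspond to the following properties; a sketch with illustrative values, assuming HiveServer2 Interactive on HDP:

```xml
<!-- Hive/Tez settings (illustrative values) -->
<property>
  <!-- Tez AM coordinator size: memory for each Tez AM container, in MB -->
  <name>tez.am.resource.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <!-- Number of coordinators: one Tez AM is spawned per concurrent session -->
  <name>hive.server2.tez.sessions.per.default.queue</name>
  <value>4</value>
</property>
```

With these values, four queries can run concurrently; a fifth query submitted while all four Tez AMs are busy waits until a session frees up, since preemption is not supported in HDP 2.6.4.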