Introduction: how does LLAP fit into Hive
LLAP is a set of persistent daemons that execute fragments of
Hive queries. Query execution on LLAP is very similar to Hive without LLAP,
except that worker tasks run inside LLAP daemons, and not in containers.
High-level lifetime of a JDBC query:
| Without LLAP | With LLAP |
|---|---|
| Query arrives at HS2; it is parsed and compiled into “tasks” | Query arrives at HS2; it is parsed and compiled into “tasks” |
| Tasks are handed over to the Tez AM (query coordinator) | Tasks are handed over to the Tez AM (query coordinator) |
| Coordinator (AM) asks YARN for containers | Coordinator (AM) locates LLAP instances via ZK |
| Coordinator (AM) pushes task attempts into containers | Coordinator (AM) pushes task attempts as fragments into LLAP |
| RecordReader used to read data | LLAP IO/cache used to read data, or RecordReader used to read data |
| Hive operators are used to process data | Hive operators are used to process data* |
| Final tasks write out results into HDFS | Final tasks write out results into HDFS |
| HS2 forwards rows to JDBC | HS2 forwards rows to JDBC |

* Sometimes, minor LLAP-specific optimizations are possible, e.g. sharing a hash table for a map join.
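From the client’s point of view, the flow is identical in both cases because the query is always submitted to HiveServer2; whether the resulting fragments run in containers or in LLAP daemons is decided on the server side. Below is a minimal JDBC sketch, assuming a HiveServer2 endpoint at a placeholder host/port and a hypothetical sample_table:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcSketch {
    public static void main(String[] args) throws Exception {
        // Optional with JDBC 4+ drivers; shown for clarity.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Placeholder host/port/database; on LLAP clusters this is typically
        // the "interactive" HiveServer2 endpoint.
        String url = "jdbc:hive2://hs2-host:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             // sample_table is a hypothetical table used for illustration only.
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM sample_table")) {
            while (rs.next()) {
                System.out.println("row count: " + rs.getLong(1));
            }
        }
    }
}
```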
Theoretically, a hybrid (LLAP + containers) mode is possible, but it has no advantages in most cases, so it is rarely used (e.g., Ambari does not expose any knobs to enable this mode).
In both cases, a query uses a Tez session (a YARN application with a Tez AM serving as the query coordinator). In the container case, the AM starts more containers in the same YARN application; in the LLAP case, LLAP itself runs as an external, shared YARN application, so the Tez session only has one container (the query coordinator).
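As a rough illustration of how the LLAP vs. container choice (including the hybrid mode mentioned above) is usually exposed, the sketch below sets hive.llap.execution.mode for a JDBC session; the exact property names and allowed values depend on the Hive/HDP version, so treat them as assumptions to verify against your distribution.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LlapModeSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 URL.
        String url = "jdbc:hive2://hs2-host:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement()) {
            // "only": run all fragments in LLAP; "map"/"auto": hybrid LLAP + containers;
            // "none": containers only. (Values assumed from HDP 2.6-era Hive.)
            stmt.execute("SET hive.llap.execution.mode=only");

            // Subsequent queries in this session are still coordinated by the Tez AM;
            // their fragments are pushed into LLAP daemons.
            stmt.execute("SELECT 1");
        }
    }
}
```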
Inside LLAP daemon: execution
An LLAP daemon runs work fragments using executors. Each daemon has a number of executors, so it can run several fragments in parallel, and a local work queue for fragments waiting to run. For the Hive case, fragments are similar to task attempts (mappers and reducers). Executors essentially “replace” containers: each is used by one task at a time, and the sizing should be very similar for both.
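The snippet below is a deliberately simplified, conceptual sketch of this model (it is not LLAP’s implementation): a fixed pool of executor threads plus a bounded local work queue, where each submitted fragment occupies one executor for its whole duration, much like a task attempt occupies one container without LLAP. The executor and queue sizes are made-up numbers; in a real deployment they come from configuration (e.g. hive.llap.daemon.num.executors).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ExecutorModelSketch {
    public static void main(String[] args) {
        int numExecutors = 12;  // stand-in for hive.llap.daemon.num.executors
        int queueSize = 24;     // local wait queue for fragments that cannot start yet

        // Fixed pool of "executors" with a bounded work queue: at most numExecutors
        // fragments run in parallel; the rest wait in the queue.
        ThreadPoolExecutor executors = new ThreadPoolExecutor(
                numExecutors, numExecutors, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize));

        for (int i = 0; i < 30; i++) {
            final int fragmentId = i;
            executors.submit(() -> {
                // Stand-in for running Hive operators over the fragment's input.
                System.out.println("running fragment " + fragmentId
                        + " on " + Thread.currentThread().getName());
            });
        }
        executors.shutdown();
    }
}
```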
Inside LLAP daemon: IO
Optionally, fragments may make use of the LLAP cache and IO elevator (background IO threads). In HDP 2.6, this is only supported for the ORC format and is not supported for most ACID tables. In 3.0, support is added for text, Parquet, and ACID tables. On HDInsight, the text format is also supported in 2.6. Note that queries can still run in LLAP even if they cannot use the IO layer. Each fragment only uses one IO thread at a time. The cache stores metadata (on-heap in 2.6, off-heap in 3.0) and encoded data (off-heap); an SSD cache option is also added in 3.0 (2.6 on HDInsight).
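To make the on-heap vs. off-heap split concrete, here is a tiny conceptual sketch (again, not LLAP code): cache metadata lives in an ordinary on-heap map, while encoded column data sits in direct ByteBuffers allocated outside the Java heap, which keeps the bulk of the cache away from the garbage collector. The key format and buffer sizes are arbitrary examples.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class CacheLayoutSketch {
    // On-heap index describing what is cached (file, stripe, column).
    // In HDP 2.6 this kind of metadata is on-heap; in 3.0 it moves off-heap.
    static final Map<String, Integer> cachedChunkIndex = new HashMap<>();

    // Off-heap buffers holding encoded data, outside the Java heap and GC.
    static final ByteBuffer[] dataBuffers = new ByteBuffer[4];

    public static void main(String[] args) {
        for (int i = 0; i < dataBuffers.length; i++) {
            dataBuffers[i] = ByteBuffer.allocateDirect(4 * 1024 * 1024); // 4 MB chunk
        }
        // Record that one file/column chunk is cached in buffer 0 (illustrative key).
        cachedChunkIndex.put("warehouse/sample_table/000000_0:col1:stripe0", 0);
        System.out.println("cached chunks: " + cachedChunkIndex.size());
    }
}
```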