Community Articles

Find and share helpful community-sourced technical articles.
Labels (1)
avatar
Rising Star

Introduction: how does LLAP fit into Hive

LLAP is a set of persistent daemons that execute fragments of Hive queries. Query execution on LLAP is very similar to Hive without LLAP, except that worker tasks run inside LLAP daemons, and not in containers.

High-level lifetime of a JDBC query:

Without LLAP With LLAP
Query arrives to HS2; it is parsed and compiled into “tasks” Query arrives to HS2; it is parsed and compiled into “tasks”
Tasks are handed over to Tez AM (query coordinator) Tasks are handed over to Tez AM (query coordinator)
Coordinator (AM) asks YARN for containers Coordinator (AM) locates LLAP instances via ZK
Coordinator (AM) pushes task attempts into containers Coordinator (AM) pushes task attempts as fragment into LLAP
RecordReader used to read data LLAP IO/cache used to read data

or RecordReader used to read data

Hive operators are used to process data Hive operators are used to process data*
Final tasks write out results into HDFS Final tasks write out results into HDFS
HS2 forwards rows to JDBC HS2 forwards rows to JDBC

* sometimes, minor LLAP-specific optimizations are possible - e.g. sharing a hash table for map join

Theoretically, a hybrid (LLAP+containers) mode is possible, but it doesn’t have advantages in most cases, so it’s rarely used (e.g.: Ambari doesn’t expose any knobs to enable this mode).

42844-image6.png

In both cases, query uses a Tez session (YARN app with aTez AM serving as a query coordinator). In container case, AM will start more containers in the same YARN app; in LLAP case, LLAP itself runs as an external, shared YARN app, so Tez session will only have one container (the query coordinator).

Inside LLAP daemon: execution

LAP daemon runs work fragments using executors. Each daemon has a number of executors to run several fragments in parallel, and a local work queue. For the Hive case, fragments are similar to task attempts – mappers and reducers. Executors essentially “replace” containers – each is used by one task at a time; the sizing should be very similar for both.

42846-image9.png

Inside LLAP daemon: IO

Optionally, fragments may make use of LLAP cache and IO elevator (background IO threads). In HDP 2.6, it’s only supported for ORC format and isn’t supported for most ACID tables. In 3.0, support is added for text, Parquet, and ACID tables. In HDInsight, text format is also added in 2.6.

Note that queries can still run in LLAP even if they cannot use the IO layer. Each fragment would only use one IO thread at a time. Cache stores metadata (on heap in 2.6, off heap in 3.0) and encoded data (off-heap); SSD cache option is also added in 3.0 (2.6 on HDInsight).

42850-image14.png

33,463 Views