Introduction: how does LLAP fit into Hive?
LLAP is a set of persistent daemons that execute fragments of Hive queries. Query execution with LLAP is very similar to query execution without it, except that worker tasks run inside LLAP daemons rather than in YARN containers.
High-level lifetime of a JDBC query:
| Without LLAP | With LLAP |
| --- | --- |
| Query arrives at HS2; it is parsed and compiled into “tasks” | Query arrives at HS2; it is parsed and compiled into “tasks” |
| Tasks are handed over to the Tez AM (query coordinator) | Tasks are handed over to the Tez AM (query coordinator) |
| Coordinator (AM) asks YARN for containers | Coordinator (AM) locates LLAP instances via ZK |
| Coordinator (AM) pushes task attempts into containers | Coordinator (AM) pushes task attempts as fragments into LLAP |
| RecordReader used to read data | LLAP IO/cache used to read data, or RecordReader used to read data |
| Hive operators are used to process data | Hive operators are used to process data* |
| Final tasks write out results into HDFS | Final tasks write out results into HDFS |
| HS2 forwards rows to JDBC | HS2 forwards rows to JDBC |
* Sometimes minor LLAP-specific optimizations are possible, e.g. sharing a hash table for a map join.
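
From the client’s point of view the two paths are identical: the same JDBC code works whether or not LLAP is enabled, because HS2 and the query coordinator decide where fragments run. Below is a minimal JDBC sketch; the host, port, credentials, and table name are placeholders for illustration, not values from this article.

```java
// Minimal JDBC sketch: client-side flow is the same with or without LLAP.
// Requires the hive-jdbc driver on the classpath.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LlapQueryExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical HS2 endpoint; a real cluster may use ZooKeeper
        // service discovery or Kerberos parameters in the URL.
        String url = "jdbc:hive2://hs2-host:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT count(*) FROM sample_table")) { // placeholder table
            while (rs.next()) {
                // HS2 forwards rows back over JDBC regardless of execution mode.
                System.out.println("count = " + rs.getLong(1));
            }
        }
    }
}
```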
Theoretically, a hybrid (LLAP+containers) mode is possible, but it doesn’t have advantages in most cases, so it’s rarely used (for example, Ambari doesn’t expose any knobs to enable this mode).
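
When experimenting with modes, the execution mode can in principle be chosen per session via the standard Hive settings hive.execution.mode (container vs. llap) and hive.llap.execution.mode; whether a session may override them depends on the cluster’s configuration whitelist. A small sketch, assuming a Connection obtained as in the example above:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: pin a session to LLAP (or back to containers), assuming the cluster
// allows these properties to be overridden at the session level.
public final class ExecutionModeConfig {
    static void pinSessionToLlap(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("SET hive.execution.mode=llap");       // or "container"
            stmt.execute("SET hive.llap.execution.mode=only");  // e.g. "auto", "all", "only"
        }
    }
}
```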
In both cases, a query uses a Tez session (a YARN app with a Tez AM serving as the query coordinator). In the container case, the AM will start more containers in the same YARN app; in the LLAP case, LLAP itself runs as an external, shared YARN app, so the Tez session will only have one container (the query coordinator).
Inside LLAP daemon: execution
An LLAP daemon runs work fragments using executors. Each daemon has a number of executors to run several fragments in parallel, and a local work queue. For Hive, fragments are similar to task attempts – mappers and reducers. Executors essentially “replace” containers – each is used by one task at a time; the sizing should be very similar for both.
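
A rough sizing sketch follows from that equivalence: with one executor per task attempt, the daemon heap should give each executor roughly what a Tez container would have received. The numbers below are purely illustrative; the property names in the comments are the usual hive.llap.* / hive.tez.* settings.

```java
// Back-of-the-envelope sizing sketch: executors replace containers, so each
// executor's share of the daemon heap should be close to what a Tez container
// would get. All numbers below are placeholders, not recommendations.
public final class ExecutorSizing {
    public static void main(String[] args) {
        int executorsPerDaemon = 16;  // e.g. hive.llap.daemon.num.executors, often ~ vcores
        int containerSizeMb = 4096;   // e.g. hive.tez.container.size
        // Daemon heap (hive.llap.daemon.memory.per.instance.mb) sized so that
        // each executor gets roughly one container's worth of memory.
        int daemonHeapMb = executorsPerDaemon * containerSizeMb;
        System.out.println("Suggested daemon heap (MB): " + daemonHeapMb);
    }
}
```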
Inside LLAP daemon: IO
Optionally, fragments may make use of the LLAP cache and IO elevator (background IO threads). In HDP 2.6, this is only supported for the ORC format and isn’t supported for most ACID tables. In 3.0, support is added for text, Parquet, and ACID tables. On HDInsight, the text format is also added in 2.6.
Note that queries can still run in LLAP even if they cannot use the IO layer. Each fragment only uses one IO thread at a time. The cache stores metadata (on-heap in 2.6, off-heap in 3.0) and encoded data (off-heap); an SSD cache option is also added in 3.0 (2.6 on HDInsight).
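
Putting the pieces together, a daemon’s YARN allocation is typically sized as executor heap plus the off-heap cache plus some headroom for JVM overhead. The sketch below only illustrates that arithmetic; the values are placeholders, not recommendations.

```java
// Illustrative layout of an LLAP daemon's YARN allocation: executor heap plus
// the off-heap IO cache plus some headroom. The property names in the comments
// are standard hive.llap.* settings; the numbers are placeholders.
public final class DaemonMemoryLayout {
    public static void main(String[] args) {
        int executorHeapMb = 65536;  // hive.llap.daemon.memory.per.instance.mb
        int ioCacheMb = 32768;       // hive.llap.io.memory.size (off-heap cache)
        int headroomMb = 6144;       // JVM overhead / direct buffers outside the cache
        int yarnContainerMb = executorHeapMb + ioCacheMb + headroomMb;
        System.out.println("LLAP daemon YARN allocation (MB): " + yarnContainerMb);
    }
}
```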