Member since 10-09-2015
36 Posts
77 Kudos Received
4 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 412 | 08-27-2018 07:02 PM |
| | 476 | 01-19-2018 09:00 PM |
| | 1023 | 12-12-2017 01:11 AM |
| | 2344 | 05-30-2017 11:26 PM |
08-27-2018
07:02 PM
Functionality like that is available in Apache Hive 3.0 (with bug fixes in 3.1), as well as HDP 3.0/3.1; see the "workload management" feature: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/hive-workload/content/hive_workload_management.html and https://issues.apache.org/jira/browse/HIVE-17481
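For reference, a resource plan in Hive 3 workload management is defined via DDL roughly like the sketch below; this is only a hedged illustration (the plan and trigger names and the ELAPSED_TIME threshold are made up), so check the docs and HIVE-17481 linked above for the exact syntax supported by your version.

```sql
-- Illustrative sketch of Hive 3.x workload management DDL; names and values are made up.
CREATE RESOURCE PLAN interactive_plan;
-- Kill queries that run longer than 10 minutes (counter name and threshold are examples only).
CREATE TRIGGER interactive_plan.kill_long_running WHEN ELAPSED_TIME > 600000 DO KILL;
ALTER TRIGGER interactive_plan.kill_long_running ADD TO POOL default;
ALTER RESOURCE PLAN interactive_plan ENABLE;
ALTER RESOURCE PLAN interactive_plan ACTIVATE;
```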
07-10-2018
11:18 PM
Doesn't look like the default format is kicking in. Can you create the table as ORC explicitly (stored as ORC)? Also depending on what version this is, vectorization may need to be enabled; it's good for performance anyway. Or hive.llap.io.row.wrapper.enabled needs to be set to true (if present in your version).
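A minimal sketch of what I mean (table and column names are just placeholders):

```sql
-- Create the table as ORC explicitly rather than relying on the default format.
CREATE TABLE my_table (id INT, val STRING) STORED AS ORC;
-- Make sure vectorization is on; it helps performance in general.
SET hive.vectorized.execution.enabled=true;
```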
07-10-2018
09:55 PM
It could be counters not being propagated correctly; usually, when LLAP IO is not used, this counter section is not displayed at all. You might want to check LLAP logs to see if there are threads with names beginning with "IO-Elevator"; these are IO threads. Also check that the LLAP daemon logs show IO being initialized correctly (at the beginning, when the daemon starts, there should be lines about cache size, eviction policy, etc.). Then, HS2 and LLAP logs may have lines from the "wrapForLlap" method in HiveInputFormat indicating errors when trying to use LLAP IO. Also, is this an ACID table? ACID tables are not able to use the LLAP cache until Hive 3.1 in Apache, or HDP 3.X.
05-29-2018
08:03 PM
Yes, I think that should be sufficient. You can set it higher if you see issues (like the container getting killed by YARN), but 12G definitely seems excessive for a 32Gb Xmx. I'll try to find out why the engine would recommend such high values.
01-25-2018
10:32 PM
I think there were some classpath problems in either case. Unfortunately, the servicedriver currently only packages the config (hive-site.xml) for the LLAP daemon if it's present in the classpath. So, it found a config outside of the classpath for its own use, but didn't package it, initially. The 2nd error seems to indicate that the LlapDaemon class, and the LLAP jar itself, is not present when the daemon is started... not sure how such a package was created without this jar. Copying to another directory probably added the config to the classpath so the servicedriver could find it.
01-24-2018
10:27 PM
It may be a bug in the way the tool looks for config files. Can you try adding the /opt/hive/conf directory to the classpath in the script, to see if that works as a workaround?
01-19-2018
09:00 PM
The general recommended heap allocation for LLAP is ~4Gb of memory per executor, so it depends on how many executors you have. This is covered in a lot of detail at https://community.hortonworks.com/articles/149486/llap-sizing-and-setup.html Sometimes, depending on query size and type, it's possible to get away with as little as 1-3Gb per executor; I'm not sure how many executors you are using, so it all depends. If the values are somewhere in this range, it would be good to get a heap dump on OOM and see what's taking up all the memory. There are two main possible causes - something in the processing pipeline, or something in ORC writing (ORC may keep lots of buffers open when writing many files in parallel). The former is supposed to run in 4Gb/executor, so if that's the case it would be a bug and we'd like to see the top users from the heap dump 🙂 For ORC, there is unfortunately no good formula that I know of (cc @owen), but if you write more files (lots of partitions, lots of buckets), you'd need more memory. Hard to tell without knowing the details.
01-19-2018
08:54 PM
I don't think classpath modification for LLAP to add local paths is currently supported, although I can see how it could be a useful feature. Is it possible in your case to package the native library as part of the LLAP package, by adding it to the aux jars? Aux jars don't actually have to be jars - they can be any files; the name is a legacy one.
12-12-2017
01:11 AM
Yeah, I suspect this is where the cluster is reaching capacity. Killed task attempts are probably composed of two types: task attempts rejected because the LLAP daemon is full and won't accept new work, and killed opportunistic non-finishable tasks (preemption). The latter happen because Hive starts some tasks (esp. reducers) before all of their inputs are ready, so they can download the inputs from some upstream tasks while waiting for other upstream tasks to finish. When parallel queries want to run a task that can run and finish immediately, they will preempt non-finishable tasks (otherwise a task that is potentially doing nothing, waiting for something else to finish, could take resources from tasks that are ready). It is normal for the amount of preemption to increase with high-volume concurrent queries. The only way to check whether there are any other (potentially problematic) kills is to check the logs. If the cache is not as important for these queries, you can try to reduce hive.llap.task.scheduler.locality.delay, which may cause faster scheduling for tasks (-1 means infinite; the minimum otherwise is 0). However, once the cluster is at capacity, it's not realistic to expect sub-linear scaling... individual query runtime improvements would also improve the aggregate runtime in this case.
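If you want to try the locality knob, a minimal sketch (depending on the version, this may need to go into the Hive config in Ambari rather than a session):

```sql
-- 0 = schedule fragments as soon as possible (less cache locality, faster scheduling);
-- -1 = wait indefinitely for a local node (better cache locality).
SET hive.llap.task.scheduler.locality.delay=0;
```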
12-08-2017
07:56 PM
If 6 queries take 4 seconds, scaling linearly would mean 50 queries should take ~33 seconds if the cluster were at capacity. I'm assuming there's also some slack; however, unless the queries are literally 2-3 tasks, at some point the cluster reaches capacity and the tasks from different queries start queueing up (even if they do run faster due to cache, JIT, etc.). Are the 54 executors per machine, or total? By 50% capacity, do you mean 4 LLAPs with 50% of a node each, or 2 LLAPs with a full node each? In the absence of other information, 13 sec (which is much better than linear scaling) actually looks pretty good. One would need to look at query details, and especially cluster utilization with different numbers of queries, to determine whether one needs parallel-query improvements or (more likely) single-query improvements, because the tasks are simply waiting for other tasks with the cluster at capacity. Also, 4 queries by default is just a reasonable default for most people starting out; parallel query capacity actually requires some resources (to have AMs present; otherwise AM startup will affect perf a lot), so we chose a value on the low end. As for memory usage per query, the counters at the end of the query and in the Tez UI output the cache/IO memory used; however, Java memory use by execution is not tracked well right now.
12-07-2017
07:37 PM
In the above log, it looks like there are two Xmx values on the command line: -Xmx22251m and -Xmx33251m. Do you know where the 2nd value comes from? Was one of them specified via args? I'm not sure which one would apply (it would be logged in the jmx view of the LLAP daemon), but if the limit is 27Gb and the 2nd value applies, then this is the reason for the container to exceed memory.
12-05-2017
09:35 PM
1 Kudo
LLAP uses a file identifier as the cache key when storing data. For HDFS, that is by default the file inode ID; for other filesystems (e.g. S3), it's the combination of name, size, and modification timestamp. The HDFS inode ID (and, on other filesystems, the time and size) changes on append, so the cached data gets invalidated and the new data is cached under a different ID. The old data is currently not proactively removed, but it is no longer used by queries and will eventually be evicted. However, for ORC/Parquet/etc. files, the append pattern is generally not used - the file is sealed once it is written. For example, Hive ACID writes new files (deltas) and then compacts the old base file and the deltas into a new base file. In this case, the new files' data is cached with a new ID.
12-05-2017
09:31 PM
Hi, this version should already have the fix for this issue (HIVE-15779). As of 2.6.X, there is no workload management and balancing functionality in LLAP, so there are indeed usually a lot of cancellations reported for parallel queries. Some of them are a reporting issue (rejected tasks are expected when the AM is probing LLAPs for empty space - this is planned to be fixed in 3.0). Still, if I understand correctly, going from 78s (one query) to 129s (two parallel queries) means running 2 queries in parallel is faster than running them sequentially; assuming the cluster is fully utilized, that is a good result - both queries get ~half the allocation, and thus effectively run with half the cluster size, hence they take longer. Is this the case? In 3.0, we are implementing workload management for LLAP for better resource sharing, and the ability to specify policies for queries (sharing the cluster fairly, or in some proportions, etc.).
12-05-2017
09:19 PM
Note that the command line has two Xmx values... Ambari has a config value for the LLAP Xmx that it adds to the command line; that is the recommended way to set Xmx. The custom value(s) added to the args seem to be conflicting with what Ambari adds.
12-05-2017
09:16 PM
Just a note - on older versions of HDP (2.6.1 and below iirc) it is possible to receive InvalidACL at start time because the LLAP application has failed to start and thus failed to create the path entirely. So, it might be worth checking the LLAP app log if the path does not exist.
12-04-2017
08:16 PM
The LLAP app has failed to start; the application_1512395880314_0027 app logs might have the real error, if any. It's also possible that the containers failed to start because either there isn't enough memory physically on the cluster, or there isn't enough space configured in the YARN queue being used.
12-02-2017
03:33 AM
7 Kudos
Finding out the hit rate

It is possible to determine the cache hit rate per node or per query. Per node, you can see the hit rate by looking at the LLAP metrics view (<llap node>:15002, see General debugging). Per query, it is possible to see the LLAP IO counters (including hit rate) after running the query by setting hive.tez.exec.print.summary=true, which produces the counters output at the end of the query, e.g. with an empty cache vs. with some data already in the cache.

Why is the cache hit rate low

Consider the data size and the cache size across all nodes. E.g. with a 10 Gb cache on each of 4 LLAP nodes, reading 1 Tb of data cannot achieve more than ~4% cache hit rate even in the perfect case. In practice the rate will be lower due to the effects of compression (the cache is not snappy/gzip compressed, only encoded), interference from other queries, locality misses, etc.

In HDP 2.X/Hive 2.X, the cache has coarser granularity to avoid some fragmentation issues that are resolved in HDP 3.0/Hive 3.0. This can cause considerable wasted memory in the cache on some workloads, esp. if the table has a lot of strings with a small range of values, and/or is written with compression buffer sizes smaller than 256Kb. When writing data, consider ensuring that the ORC compression buffer size is set to 256Kb, and set hive.exec.orc.buffer.size.enforce=true (on HDP 2.6, it requires a backport) to disable writing smaller CBs. This issue doesn't result in errors, but it can make the cache less efficient.

If the cache size seems sufficient, check the relative data and metadata hit rates (in the counters and metrics above). If there are both data and metadata misses, it can be due to other queries caching different data in place of this query's data, or it could be a locality issue. Check hive.llap.task.scheduler.locality.delay; it can be increased (or set to -1 for an infinite delay) to get better locality, at the cost of waiting longer to launch tasks, if IO is a bottleneck. If the metadata hit rate is very high but the data hit rate is lower, it is likely that the cache doesn't fit all the data; so some data gets evicted, but metadata, which is cached with high priority, stays in the cache.
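A short sketch of the settings mentioned above (the values are the ones discussed in the article; the enforce flag may not exist on your version without the backport):

```sql
-- Print per-query LLAP IO counters (including cache hit rate) at the end of the query.
SET hive.tez.exec.print.summary=true;

-- When writing data, keep ORC compression buffers at 256Kb so the cache stays efficient.
SET hive.exec.orc.default.buffer.size=262144;
SET hive.exec.orc.buffer.size.enforce=true;
```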
Tags: Data Processing, Debugging, help, Hive, How-ToTutorial, llap, performance
12-02-2017
03:21 AM
3 Kudos
This article is a short summary of LLAP-specific causes of query slowness. Given that a lot of Hive query execution (compilation, almost all operator logic) stays the same when LLAP is used, queries can be slow for non-LLAP-related reasons; general Hive query performance issues (e.g. bad plan, skew, poor partitioning, slow fs, ...) should also be considered when investigating. Those issues are outside the scope of this article.

Queries are slow

On HDP 2.5.3/Hive 2.2 and below, there's a known issue where queries on LLAP can get slower over time on some workloads. Upgrade to HDP 2.6.X/Hive 2.3 or higher. If you are comparing LLAP against containers on the same cluster, check the LLAP cluster size compared to the capacity used for containers. If there are 27 nodes for containers and a 3-node LLAP cluster, it's possible containers will be faster because they can use many more CPUs and much more memory. Generally, queries on LLAP run just like in Hive (see Architecture), so standard Hive performance debugging should be performed, starting with explain, looking at the Tez view DAG timeline, etc. to narrow it down.

Query is stuck (not just slow)

Make sure hive.llap.daemon.task.scheduler.enable.preemption is set to true. Look at the Tez view to see which tasks are running. Choose one running task (esp. if there's only one), go to the <its llap node>:15002/jmx view, and search for the ExecutorsStatus for this task. If the task is missing or says "queued", it may be a priority inversion bug; a number of those were fixed over time in HDP 2.6.X. If the task is running, it is possible that this is a general query performance issue (e.g. skew, bad plan, etc.). You can double-check <llap node>:15002/stacks to see if it's running code.
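A quick sanity-check sketch for the steps above (the preemption setting is normally a cluster-level Hive config; the query is a placeholder):

```sql
-- Preemption must be on so LLAP can kick out non-finishable fragments.
SET hive.llap.daemon.task.scheduler.enable.preemption=true;
-- Standard Hive performance debugging starts with the plan.
EXPLAIN SELECT count(*) FROM my_table;  -- replace with the slow or stuck query
```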
Tags: Data Processing, Debugging, Hive, How-ToTutorial, llap, performance
12-02-2017
02:57 AM
3 Kudos
This article will (hopefully) be helpful if you are trying to use LLAP/Hive Interactive, but it fails or gets stuck during startup. This is often caused by misconfiguration, esp. w.r.t. sizing, but could also be a bug. Ambari performs 2 major activities when (re)starting Hive Interactive - first it starts LLAP itself on the YARN cluster, then it starts HiveServer2 for Hive Interactive. Usually the problems happen during LLAP startup. This article will help you pinpoint the problem if there's some bug or non-sizing-related misconfiguration. It will also point at the sizing document where appropriate; the sizing and setup document can be found at https://community.hortonworks.com/articles/149486/llap-sizing-and-setup.html. For general information (e.g. where the logs are), see the parent article.

Note: for all the logs, it is sometimes the case that the last error in the log is InterruptedException or some other similar error during component shutdown after something else has already failed. It might help to look at all the errors at the end of the log if there are several, esp. the initial exception. Also, please capture all the call stacks (or attach the entire log) if opening a bug.

1. Look at the Ambari operation logs for HiveServer Interactive startup. If the error is basically a timeout waiting for the LLAP cluster, go to step 2 to determine why LLAP wasn't able to start. This may also manifest as an InvalidACL error (on older HDP versions). If the issue is after LLAP has started (i.e. in HiveServer2 startup) and there's no useful error message, see the hiveServer2Interactive logs on the HiveServer2 Interactive service machine; try to make sense of the error, and/or file a bug or contact someone in dev (Hive). Otherwise, try to make sense of the error, and/or file a bug or contact someone in dev (Ambari or Hive depending on the error).
2. Check that the LLAP YARN app has enough capacity via the ResourceManager UI, assuming it hasn't just failed (skip this step if the app is in failed state). If the app is neither running nor complete (i.e. it's queued, accepted, etc.), there is not enough capacity in the cluster to start the app; check what takes up space on the cluster, and if nothing looks unexpected, consult the sizing doc to ensure LLAP is set up properly. If the app is running (not failed), go to the "Tracking URL: Application Master" link and check whether it was able to start enough LLAP daemons on the cluster (Desired vs. Actual containers for the LLAP component). If not, check if there's enough capacity to start this number of containers - something else might be taking up the space; if nothing looks unexpected, consult the sizing doc to ensure LLAP is set up properly.
3. If the app has failed, or there is enough capacity but the app doesn't have enough containers, view the logs in the UI (or kill the app and download the YARN logs). If the Slider app status says "too many component failures" or a similar error, go directly to step 5; otherwise, go to step 4.
4. Check slider.log for errors (in the UI, it would be in the Slider AM - usually the lowest-numbered container). If the errors are container failures, or there are no errors, go to step 5. Otherwise, try to make sense of the error, and/or file a bug or contact someone in dev (YARN or Hive depending on the error).
5. If present, check the daemon logs in one of the failed containers for errors. Start with llap-daemon*.log and llap-daemon*.out; they would usually have the error at the end. Try to make sense of the error, and/or contact the Hive team. If those are not present, check slider-agent.log instead; try to make sense of the error, and/or file a bug or contact someone in dev (YARN or Hive depending on the error).
Tags: Data Processing, Debugging, Hive, How-ToTutorial, Issue Resolution, llap, startup
12-01-2017
10:42 PM
8 Kudos
This article provides an overview of various debugging aspects of the system when using LLAP - how things are run, where logs are located, what monitoring options are available, etc. See the parent article for an architecture overview and specific issue resolution.

HS2 when using LLAP

When using LLAP ("Hive Interactive"), Ambari starts a separate HS2 (HS2 Interactive). This HS2 is not connected to the other HS2, and has a separate connection string, machine (or ports), configuration, and logs. HS2 Interactive and LLAP configs in Ambari are split similarly to regular Hive configs, but have "-interactive-" in their names. HS2 Interactive logs are called "hiveServer2Interactive" and are separate from regular HS2 logs.

YARN apps when using LLAP; getting query logs

Ambari starts LLAP as a YARN app that is shared by everyone using the cluster. It should have N+1 containers, where N is the number of LLAP daemons, plus the Slider AM. The LLAP YARN app is called "llap0" by default. Tez session YARN apps will always have one container (the query coordinator) when LLAP is used. For example, if 2 concurrent queries were configured in Ambari (see Cluster sizing), the ResourceManager UI would show the LLAP cluster app and 2 single-container Tez sessions.

Because of the YARN deployment model, LLAP does not use jars/configs/etc. on the local machine. They are packaged by Ambari at LLAP startup time, and are deployed in the container directories of the LLAP YARN app. In 2.6, the package with the jars and configs is rebuilt on every LLAP restart.

Logs

The LLAP YARN app contains task logs for all the queries; each LLAP daemon writes a separate log file for every query. The name format of the log file containing the task logs for a single query on a single daemon is "<hive query Id>-<DAG name>.log(.done)", e.g. hive_20171110190337_9cd58b54-62e0-4940-b0b8-e8b62f877cbb-dag_1505982789269_1038_1.log. Every daemon would have one log file for each (or almost every) query. Over time, the log files for completed queries (with the .done extension) are picked up by YARN log aggregation and removed from the daemons' YARN directories.

The DAG name in the log file name corresponds to a Tez session app - e.g. in the above case, dag_1505982789269_1038_1 corresponds to query #1 in the session represented by application_1505982789269_1038. This latter app (which is separate from the LLAP app) is the query coordinator (Tez AM) for the query and contains the Tez AM logs for it.

Slider commands and views; LLAP app status overview

Ambari uses Slider to launch LLAP. When debugging, it's possible to use Slider commands to affect the app without involving Ambari, e.g. "slider stop <name>" to stop the LLAP app by name (slider stop llap0). See "slider help" for more commands.

The Slider AM also exposes a status view, accessed from the YARN application page. The status view shows allocated and running containers, Slider debugging info, and also the container logging directories on each machine, which can be used for manual debugging if there's a problem with the "yarn logs" command.

LLAP daemon status and metrics

On each machine where LLAP is running, it exposes a number of endpoints on port 15002 (by default), i.e. http://(machine name):15002/. The default view shows some metrics; other useful views are:
- /jmx - shows various useful metrics, discussed below
- /stacks - shows daemon stacks; threads that process query fragments are TezTR-* and IO-Elevator-Thread*
- /conf - shows the LLAP configuration (the values actually in use)
- /iomem (only in 3.0) - shows debug information about cache contents and usage
- /peers - shows all LLAP nodes in the cluster
- /conflog (only in 3.0) - allows one to see and add/change loggers and log levels for this daemon without a restart

LLAP daemon metrics (JMX)

At <host>:15002/jmx, one can see a lot of LLAP daemon metrics. Things to pay attention to:
- ExecutorsStatus contains the information about tasks queued and executing on LLAP. This is the most useful section when debugging stuck queries, to see which tasks are running and which are in the queue.
- ExecutorMemoryPerInstance, IoMemoryPerInstance, and MaxJvmMemory contain the memory configuration in use (see the sizing section).
- java.class.path is the LLAP daemon classpath; useful for debugging class-loading issues.
- llap.daemon.log.dir is the LLAP daemon logging directory.
- LastGcInfo contains information about the last GC (more is available in the GC log in the logging directory).
- LlapDaemonIOMetrics, LlapDaemonCacheMetrics, LlapDaemonExecutorMetrics, and LlapDaemonJvmMetrics contain LLAP performance metrics. The latter is especially useful when debugging limit issues.

Tez view and performance debugging

The Tez view can be used to view Hive queries with LLAP, just like regular Hive queries; it has tabs with the DAG representation, vertex swimlanes, etc. for each query. In particular, one can download debugging and performance data for the query.

Grafana views

LLAP exposes metrics to the Ambari Timeline Server, which can be viewed via Grafana dashboards. Grafana dashboards can be opened from Ambari via Hive -> Quick Links -> Hive Dashboard (Grafana). This should open the Grafana dashboard for all HDP components. Click on the "Home" dropdown to navigate to any of the 3 LLAP dashboards: the LLAP Daemon Dashboard shows host-level LLAP executor metrics; the LLAP Overview Dashboard shows an aggregated overview of memory from all nodes of the LLAP cluster; and the LLAP Heatmaps Dashboard shows host-level cache and executor heatmaps. There are several other metrics exposed via Grafana that can be useful for debugging. The full list of metrics and their descriptions is available here: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-operations/content/grafana_hive_llap_dashboards.html
Tags: Data Processing, Debugging, Hive, How-ToTutorial, llap
12-01-2017
10:01 PM
11 Kudos
Introduction: how does LLAP fit into Hive

LLAP is a set of persistent daemons that execute fragments of Hive queries. Query execution on LLAP is very similar to Hive without LLAP, except that worker tasks run inside LLAP daemons, and not in containers.

High-level lifetime of a JDBC query:

| Without LLAP | With LLAP |
|---|---|
| Query arrives at HS2; it is parsed and compiled into "tasks" | Query arrives at HS2; it is parsed and compiled into "tasks" |
| Tasks are handed over to the Tez AM (query coordinator) | Tasks are handed over to the Tez AM (query coordinator) |
| Coordinator (AM) asks YARN for containers | Coordinator (AM) locates LLAP instances via ZK |
| Coordinator (AM) pushes task attempts into containers | Coordinator (AM) pushes task attempts as fragments into LLAP |
| RecordReader is used to read data | LLAP IO/cache is used to read data, or RecordReader is used to read data |
| Hive operators are used to process data | Hive operators are used to process data* |
| Final tasks write out results into HDFS | Final tasks write out results into HDFS |
| HS2 forwards rows to JDBC | HS2 forwards rows to JDBC |

* Sometimes, minor LLAP-specific optimizations are possible - e.g. sharing a hash table for map join.
Theoretically, a hybrid (LLAP+containers) mode is possible, but it doesn't have advantages in most cases, so it's rarely used (e.g. Ambari doesn't expose any knobs to enable this mode). In both cases, a query uses a Tez session (a YARN app with a Tez AM serving as the query coordinator). In the container case, the AM will start more containers in the same YARN app; in the LLAP case, LLAP itself runs as an external, shared YARN app, so the Tez session will only have one container (the query coordinator).

Inside LLAP daemon: execution

The LLAP daemon runs work fragments using executors. Each daemon has a number of executors to run several fragments in parallel, and a local work queue. For the Hive case, fragments are similar to task attempts - mappers and reducers. Executors essentially "replace" containers - each is used by one task at a time; the sizing should be very similar for both.

Inside LLAP daemon: IO

Optionally, fragments may make use of the LLAP cache and IO elevator (background IO threads). In HDP 2.6, it's only supported for the ORC format and isn't supported for most ACID tables. In 3.0, support is added for text, Parquet, and ACID tables. In HDInsight, the text format is also added in 2.6. Note that queries can still run in LLAP even if they cannot use the IO layer. Each fragment would only use one IO thread at a time. The cache stores metadata (on heap in 2.6, off heap in 3.0) and encoded data (off-heap); an SSD cache option is also added in 3.0 (2.6 on HDInsight).
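As a hedged illustration, these are the main knobs this section refers to (normally set cluster-wide via Ambari rather than per session; hive.llap.execution.mode is an additional, commonly used setting not named above):

```sql
SET hive.llap.io.enabled=true;      -- enable the LLAP IO layer and cache
SET hive.llap.execution.mode=all;   -- run eligible fragments inside LLAP daemons
```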
Tags: architecture, Debugging, FAQ, Hive, llap, Sandbox & Learning
12-01-2017
09:44 PM
8 Kudos
This is the central page that links to other LLAP debugging articles. You can find articles dealing with specific problems below; to provide some background and to help resolve other issues, there are also some helpful general-purpose articles:
- A one-page overview of LLAP architecture: https://community.hortonworks.com/articles/149894/llap-a-one-page-architecture-overview.html
- General debugging - where to find logs, UIs, metrics, etc.: https://community.hortonworks.com/articles/149896/llap-debugging-overview-logs-uis-etc.html
- You might also find this non-debugging LLAP sizing and setup guide interesting: https://community.hortonworks.com/articles/149486/llap-sizing-and-setup.html

Before getting to specific issues, there are some limitations with LLAP that you should be aware of; the following significant features are not supported:
- HA (work in progress).
- doAs=true (no current plans to support it).
- Temporary functions (will not be supported; use permanent functions).
- CLI access (use beeline).

Then, there are some articles about specific issues:
- LLAP doesn't start - https://community.hortonworks.com/articles/149899/investigating-when-llap-doesnt-start.html
- Queries are slow or stuck on LLAP - https://community.hortonworks.com/articles/149900/investigating-when-the-queries-on-llap-are-slow-or.html
- Investigating cache usage - https://community.hortonworks.com/articles/149901/investigating-llap-cache-hit-rate.html

This list will be expanded in the future with issues that people are facing with LLAP.
Tags: Data Processing, Debugging, Hive, How-ToTutorial, llap
11-29-2017
09:05 PM
24 Kudos
I'd like to thank @Jean-Philippe Player, @bpreachuk, @ghagleitner, @gopal, @ndembla and @Prasanth Jayachandran for providing input and content for this article.

Introduction

This document describes LLAP setup for reasonable performance with a typical workload. It is intended as a starting point, not as the definitive answer to all tuning questions. However, the recommendations in this document have been applied both to our own instances and to instances we help other teams run.

The context for the examples

Each step below is accompanied by an example based on allocating ~50% of a cluster to LLAP, out of 12 nodes with 16 cores and 96Gb dedicated to the YARN NodeManager each.

Basic cluster setup

Pick the fraction of the cluster to be used by LLAP

This is determined by the workload (amount of data, concurrency, types of queries, desired latencies); typical values are in the 15-50% range on an existing cluster, or it is possible to give the entire cluster to LLAP, depending on the use case.

LLAP runs 3 types of YARN containers in the cluster: a) execution daemons, which process the data; b) query coordinators (Tez AMs), which orchestrate the execution of the queries; and c) a single Slider AM that starts and monitors the daemons.

The bulk of the capacity allotted to LLAP will be used by the execution daemons. For best results, whole YARN nodes should be allocated to these daemons; therefore, you should pick the number of whole nodes to use based on the fraction of the cluster to be dedicated to LLAP.

Example: we want to use 50% of a 12-node cluster, so we will have 6 LLAP daemons.

Pick the query parallelism and number of coordinators

This will depend heavily on the application; one thing to be aware of is that with BI tools, the number of running queries at any moment in time is typically much smaller than the number of sessions - users don't run queries all the time, because they take time to look at the results, and some tools avoid queries by using caching.

The number of parallel queries determines the number of query coordinators (Tez AMs): one per query, plus some constant slack in case sessions are temporarily unavailable due to maintenance or cluster issues.

Example: we have determined we need to run 9 queries in parallel; use 10 AMs.
Determine LLAP daemon size

As described above, we will give whole NM nodes to LLAP daemons; however, we also need to leave some space on the cluster for the query coordinators and the Slider AM. There are two options.

First, it is possible to run these on non-LLAP nodes. In this case, each LLAP daemon can take the entire memory of a NM (see the YARN config tab in Ambari, as "Memory allocated for all YARN containers on a node"). If that is acceptable, it's the optimal approach (it maximizes daemon memory and avoids noisy neighbors). However, this will use more combined resources than just (number of LLAP nodes)/(total number of nodes), because the AMs will take resources on other nodes. It is also impossible if all NMs are running LLAP daemons (a ~100% LLAP cluster), and risky even when there are only 1-2 non-LLAP NMs (because all the query coordinators would be concentrated on one node).

The alternative approach is to run the AMs on the nodes with LLAP daemons. In this case, the LLAP daemon container size needs to be adjusted down to leave space. There are ceiling((N_Tez_AMs + 1)/N_LLAP_nodes) extra processes on each node, and we recommend 2Gb per AM process. So, for this case, the LLAP daemon will use (NM_memory - 2Gb * ceiling((N_Tez_AMs + 1)/N_LLAP_nodes)).

Example: We will use the 2nd approach; we need ceiling((10+1)/6) = 2 AM processes per LLAP node, reserving 2 * 2Gb per node. Therefore, the LLAP daemon size will be 96Gb - 4Gb = 92Gb.
Set up YARN and the LLAP queue

YARN settings for LLAP, as well as the LLAP queue, are usually set up by Ambari; if setting up manually, or when encountering problems, there are a few things to ensure:
- YARN preemption should be enabled. Even if there is enough memory in the cluster for execution daemons, it might be too fragmented for YARN to allocate whole nodes to LLAP. Preemption ensures that YARN can free up memory and start LLAP; without it, startup might not succeed.
- The minimum YARN container size should be 1024Mb, and the maximum should be the total memory per NM. The LLAP daemon is a container, so it will not be able to use more memory than that.
- When setting up a custom LLAP queue:
  - Set the queue size >= (2Gb * N_Tez_AMs) + (1Gb [Slider AM]) + (daemon_size * N_LLAP_nodes).
  - Ensure the queue has higher priority than other YARN queues (for preemption to work; see the YARN setup above).
  - Make sure the queue min and max capacity is the same (full capacity), and there are no user limits on the queue.
Determine the LLAP daemon CPU & memory configuration

Overview

There are three numbers that determine the memory configuration for LLAP daemons: the total memory (the container size determined above), the Xmx (memory used by executors), and the off-heap cache memory. Total memory = off-heap cache memory + Xmx (used by executors and daemon-specific overhead) + JVM overhead. The Xmx overhead (shared Java objects and internal structures) is accounted for by LLAP internally, so it can be ignored in the calculations.

In most cases, you can just follow the standard steps below; however, sometimes the process needs to be tweaked. The following conflicting criteria apply:
- It's good to maximize task parallelism (up to one executor per core).
- It's good to have at least 4Gb of RAM per executor. More memory can provide some diminishing returns; less can sometimes be acceptable, but can lead to problems (OOMs, etc.).
- It's good to have the IO layer enabled, which requires some cache space; more cache may be useful depending on the workload, esp. on a cloud FS for BI queries.

So, the number of executors, the cache size, and the memory per executor are a balance of these 3.

Pick the number of executors per node

Usually the same as the number of cores.

Example: there are 16 cores, so we will use 16 executors.

Determine the Xmx and minimum cache memory needed by executors

The daemon uses almost all of its heap memory (Xmx) for processing query work; the shared overheads are accounted for by LLAP. Thus, we determine the daemon Xmx based on executor memory; you should aim to give a single executor around 4Gb. So, Xmx = n_executors * 4Gb. LLAP executors use memory for things like group-by tables, sort buffers, shuffle buffers, map join hashtables, PTF, etc.; generally, the more memory, the less likely they are to spill, or to go with a less efficient plan, e.g. switching from a map join to a shuffle join - for some background, see https://community.hortonworks.com/articles/149998/map-join-memory-sizing-for-llap.html

If there isn't enough memory on the machine, by default, adjust the number of executors down. It is possible to give executors less memory (e.g. 3Gb) to keep their number higher; however, that must always be backed by testing with the target workload. Insufficient memory can cause much bigger perf problems than reduced task parallelism.

Additionally, if the IO layer is enabled, each executor needs about ~200Mb of cache (cache buffers are also used as the IO buffer pool) for reasonable operation. The cache size will be covered in the next section.

Example: Xmx = 16 executors * 4Gb = 64Gb. Additionally, based on 16 executors, we know that it's good to have (0.2Gb * 16) = 3.2Gb of cache if we are going to enable it.

Determine headroom and cache size

We need to leave some memory in the daemon for Java VM overhead (GC structures, etc.), as well as to give the container a little safety margin to prevent it from being killed by YARN. This headroom should be about 6% of Xmx, but no more than 6Gb. It is not set directly; instead, it is used to calculate the container memory remaining for use by the off-heap cache.

Take out the Xmx and the headroom from the total container size to determine the cache size. If the cache size is below the viable size above, reduce Xmx to allow for more memory here (typically by reducing the number of executors), or disable LLAP IO (hive.llap.io.enabled = false).

Example: Xmx is 64Gb, so headroom = 0.06 * 64Gb = ~4Gb (which is lower than 6Gb). Cache size = 92Gb total - 64Gb Xmx - 4Gb headroom = 24Gb. This is above the minimum of 3.2Gb.

Configure interactive query

You can specify the above settings on the main interactive query panel, as well as in the advanced config (or validate the values predetermined by Ambari). Some other settings in the advanced config may be of interest; for example, the AM size can be modified in the custom Hive config by changing tez.am.resource.memory.mb.
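Purely as an illustration of the worked example above (6 daemon nodes, 16 executors, 92Gb containers), the corresponding LLAP properties would look roughly like the following. These are cluster-level settings that Ambari normally derives for you in the interactive Hive config; they are shown as SET statements only for readability, and the values should come from your own sizing.

```sql
SET hive.llap.daemon.num.executors=16;               -- one executor per core
SET hive.llap.daemon.yarn.container.mb=94208;        -- 92Gb daemon container
SET hive.llap.daemon.memory.per.instance.mb=65536;   -- 64Gb Xmx (16 executors * 4Gb)
SET hive.llap.io.enabled=true;
SET hive.llap.io.memory.size=25769803776;            -- 24Gb cache (92 - 64 - 4 headroom)
-- LLAP queue capacity for this example: 2Gb*10 AMs + 1Gb Slider AM + 92Gb*6 daemons = 573Gb
```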
Tags: configuration, Data Processing, Hive, How-ToTutorial, llap
05-30-2017
11:26 PM
2 Kudos
Looks like hive.llap.io.threadpool.size is set to an invalid value. Are you using Ambari? Sometimes, depending on the cluster machine config, this may happen. You may need to set it manually in the advanced LLAP config in the Hive configuration; the recommended value is the same as the number of executors. Is that also set to 0? If so, it would also need to be adjusted (typically to the number of CPUs, or based on memory, allowing roughly 4Gb (3-8Gb) per executor). What version of Ambari is this?
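As a sketch of the fix (these are daemon-level settings, normally edited in the advanced LLAP section of the Hive config in Ambari rather than per session; 16 is just an example that should match your executor count):

```sql
SET hive.llap.io.threadpool.size=16;     -- recommended: same as the number of executors
SET hive.llap.daemon.num.executors=16;   -- if this is 0, it needs to be fixed as well
```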
05-25-2017
09:19 PM
Looks like some code is trying to retrieve foreign keys from Hive (GetCrossReference) while supplying a null database name. I'm not sure why it happens when using LLAP in particular. Are you sure there isn't something special about the tool, or the pattern of use, that could cause this behavior?
03-09-2017
09:21 PM
2 Kudos
The directory for the database is mostly used to store table data, so there isn't a lot of difference if you are only going to use external tables. Is there any reason you want to use a different directory?
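For illustration only (names and paths are made up): an external table's LOCATION is independent of the database directory, so the database can keep its default location.

```sql
CREATE DATABASE analytics_db;
-- The external table's data stays wherever LOCATION points, not under the database directory.
CREATE EXTERNAL TABLE analytics_db.events (id INT, payload STRING)
STORED AS ORC
LOCATION '/data/external/events';
```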
03-09-2017
08:28 PM
5 Kudos
In HDP 2.5 and 2.6, HiveServer Interactive and HiveServer2 are configured as two separate services, where HS2 retains all the same functionality as before, but HSI has some limitations (e.g. it doesn't support HA). This may change in HDP 3.0, esp. limitation-wise. HSI requires one to allocate some fraction of the cluster to interactive queries, for LLAP, i.e. the permanently active daemons that will execute interactive workloads. This knob controls the balance between interactive and other queries. In HDP 2.5 and 2.6.0, we do not enforce the usage of HSI, so one can in theory use it for ETL queries too, but that is not recommended; we may also block non-LLAP Tez queries from it in the future. The general recommendation is thus to divide the cluster into an interactive and a non-interactive portion, run interactive queries via the single HSI, and run other queries via HS2 using the old recommendations. Many queries that are not interactive per se (even ETL queries), however, can benefit from LLAP; so it is also recommended to try running queries on LLAP first, before consigning them to the non-interactive cluster.
03-09-2017
07:51 PM
5 Kudos
Hi, at this point there are no plans to support storage-based authorization with LLAP; the suggested approach is to use Ranger for auth. Please file an Apache JIRA; if there's enough support, perhaps there will be a discussion in the community.