03-19-2018
02:09 PM
This article provides an overview of monitoring key Hive LLAP metrics: Hive LLAP configuration, YARN queue setup, YARN containers, LLAP cache hit ratio, executors, IO elevator metrics, and JVM heap and non-heap usage.

Execution Engine

LLAP is not an execution engine (like MapReduce or Tez). Overall execution is scheduled and monitored by an existing Hive execution engine (such as Tez), transparently over both LLAP nodes and regular containers. The level of LLAP support depends on each individual execution engine (starting with Tez). MapReduce support is not planned, but other engines may be added later. Other frameworks such as Pig and Spark can also choose to use LLAP daemons.

Enable LLAP and set the memory per daemon, the in-memory cache per daemon, the number of nodes running the Hive LLAP daemon (num_llap_nodes_for_llap_daemons), and the number of executors per LLAP daemon in Advanced hive-interactive-site.

Cache Basics

The daemon caches metadata for input files as well as the data itself. Metadata and index information can be cached even for data that is not currently cached. Metadata is stored in-process as Java objects; cached data is stored and kept off-heap. The eviction policy is tuned for analytical workloads with frequent (partial) table scans. Initially a simple policy such as LRFU is used; the policy is pluggable.

Caching granularity: column chunks are the unit of data in the cache. This strikes a compromise between low-overhead processing and storage efficiency. The granularity of the chunks depends on the particular file format and execution engine (vectorized row batch size, ORC stripe, etc.). A Bloom filter is automatically created to provide dynamic runtime filtering.

Resource Management

YARN remains responsible for the management and allocation of resources. The YARN container delegation model is used to allow the transfer of allocated resources to LLAP.
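Since the cache hit ratio is one of the key metrics mentioned above, here is a minimal sketch of how it could be derived from daemon counters. The metric names `CacheHitBytes` and `CacheRequestedBytes`, and reading them from the LLAP daemon's JMX endpoint, are assumptions for illustration, not verified field names.

```python
# Sketch: derive the LLAP cache hit ratio from daemon counters.
# ASSUMPTION: the metric names CacheHitBytes / CacheRequestedBytes are
# illustrative; check your LLAP daemon's JMX output for the real names.

def cache_hit_ratio(metrics: dict) -> float:
    """Hit ratio = bytes served from cache / total bytes requested."""
    requested = metrics.get("CacheRequestedBytes", 0)
    if requested == 0:
        return 0.0
    return metrics.get("CacheHitBytes", 0) / requested

# Example with made-up numbers: 6 GB of the 8 GB requested came from cache.
sample = {"CacheHitBytes": 6 * 2**30, "CacheRequestedBytes": 8 * 2**30}
print(f"cache hit ratio: {cache_hit_ratio(sample):.2%}")
```

A low ratio under a steady workload usually suggests the in-memory cache per daemon is too small for the working set.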
To avoid the limitations of JVM memory settings, cached data is kept off-heap, as are large buffers used for processing (e.g., group by, joins). This way the daemon itself can run with a small heap, and additional resources (i.e., CPU and memory) are assigned based on workload.

LLAP YARN Queue

It is important to understand how the different parameters in the YARN queue configuration affect LLAP performance:

yarn.scheduler.capacity.root.llap.capacity=60
yarn.scheduler.capacity.root.llap.maximum-capacity=60
yarn.scheduler.capacity.root.llap.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.llap.ordering-policy=fifo
yarn.scheduler.capacity.root.llap.priority=1
yarn.scheduler.capacity.root.llap.state=RUNNING
yarn.scheduler.capacity.root.llap.user-limit-factor=1

Resource Manager UI

Please refer to the original article for the different Grafana dashboards: http://www.kartikramalingam.com/hive-llap/
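The queue settings above can be sanity-checked programmatically. The checks below simply encode the example configuration from this article (capacity pinned to maximum-capacity, user-limit-factor of 1, FIFO ordering); treat them as a starting point rather than hard requirements for every cluster.

```python
# Sketch: sanity-check LLAP queue properties against the example
# configuration shown above. The rules encode that example, not
# universal requirements.

PREFIX = "yarn.scheduler.capacity.root.llap."

def check_llap_queue(props: dict) -> list:
    """Return warnings for settings that deviate from the example config."""
    warnings = []
    if props.get(PREFIX + "capacity") != props.get(PREFIX + "maximum-capacity"):
        warnings.append("capacity != maximum-capacity: queue size is elastic")
    if props.get(PREFIX + "user-limit-factor") != "1":
        warnings.append("user-limit-factor != 1")
    if props.get(PREFIX + "ordering-policy") != "fifo":
        warnings.append("ordering-policy is not fifo")
    if props.get(PREFIX + "state") != "RUNNING":
        warnings.append("queue is not RUNNING")
    return warnings

example = {
    PREFIX + "capacity": "60",
    PREFIX + "maximum-capacity": "60",
    PREFIX + "minimum-user-limit-percent": "100",
    PREFIX + "ordering-policy": "fifo",
    PREFIX + "priority": "1",
    PREFIX + "state": "RUNNING",
    PREFIX + "user-limit-factor": "1",
}
print(check_llap_queue(example))  # [] -> the example config passes
```

Pinning capacity to maximum-capacity keeps the LLAP queue a fixed size, which matches LLAP's model of long-running daemons that hold their resources.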
02-06-2019
02:24 PM
@Kartik Ramalingam Thank you for your wonderful and helpful post! The Ranger authorization part of the post is still incorrect, though: Ranger authorization is already enabled in the initial topology, so Step 7 should instead describe disabling it by changing the parameter value from XASecurePDPKnox to AclsAuthz. The example in Step 7 also needs to be corrected accordingly. Regards, Sakhuja
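For readers following along, the change described above would look roughly like the fragment below in the Knox topology's authorization provider; this is an illustrative sketch of the provider block, not the exact XML from the post being discussed.

```xml
<!-- Illustrative Knox topology fragment: switch the authorization
     provider from Ranger (XASecurePDPKnox) to the default ACLs
     provider (AclsAuthz). -->
<provider>
    <role>authorization</role>
    <name>AclsAuthz</name>
    <enabled>true</enabled>
</provider>
```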