
How does LLAP allocate executors, and what is "Available execution slots" in the LLAP dashboard?


When viewing this in Ambari:

Hive → Hive Dashboard (Grafana) → Hive - LLAP Daemon

I have trouble understanding the LLAP parameter "Available execution slots" under Executor metrics: how does the system calculate this value?

1. What I observed:

The slot count always seems to equal the configured LLAP executor count + 10.

For example, if the configured executor count (hive.llap.daemon.num.executors) is

  • 12, then Available execution slots = 22
  • 8, then Available execution slots = 18
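If I had to guess, the extra 10 looks like a fixed wait-queue size added on top of the executors (I believe hive.llap.daemon.task.scheduler.wait.queue.size defaults to 10, but that is my assumption, not something I have confirmed). A minimal sketch of that hypothesis:

```python
# Hypothesis (unconfirmed): available slots = executors + wait-queue size,
# where hive.llap.daemon.task.scheduler.wait.queue.size defaults to 10.
WAIT_QUEUE_SIZE = 10  # assumed default

def available_slots(num_executors: int, wait_queue: int = WAIT_QUEUE_SIZE) -> int:
    """Slots = running-task slots + queued-task slots (hypothesis)."""
    return num_executors + wait_queue

print(available_slots(12))  # 22 -- matches the observed value
print(available_slots(8))   # 18 -- matches the observed value
```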


2. I am wondering:

While LLAP is executing a query, suppose the executor count reaches its configured maximum, say hive.llap.daemon.num.executors = 12 corresponding to 12 CPU cores, and the allocated RAM is fully used (e.g. 2 GB × 12 = 24 GB), but a few CPU cores are still available/idle. In this case, does the LLAP daemon still allocate more executors?


3. In my case:

I have 1 LLAP daemon on 1 YARN node (CPU = 24 cores, RAM = 48 GB) with these configs:

hive.tez.container.size = 2GB

hive.llap.daemon.yarn.container.mb = 43 GB

hive.llap.daemon.num.executors = 18

llap_heap_size = 36 GB

llap_headroom_space = 2.9 GB

hive.llap.io.memory.size (cache) = 3.6 GB

data input for each map task is ~ 4MB to 7MB

(I run the query with the HYBRID and BI strategies and different input-splitting configs)
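For reference, here is a quick sanity check of the daemon's memory budget from the settings above (my own arithmetic; the assumption that heap + cache + headroom should fit inside hive.llap.daemon.yarn.container.mb is just my reading of these configs):

```python
# Memory budget for one LLAP daemon, using the values above (in GB)
container = 43.0      # hive.llap.daemon.yarn.container.mb
heap = 36.0           # llap_heap_size
cache = 3.6           # hive.llap.io.memory.size
headroom = 2.9        # llap_headroom_space
num_executors = 18    # hive.llap.daemon.num.executors

used = round(heap + cache + headroom, 1)  # 42.5 GB of the 43 GB container
per_executor_heap = heap / num_executors  # 2.0 GB heap per executor

print(f"{used} GB used of {container} GB; {per_executor_heap} GB heap/executor")
```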


It runs well when the total number of tasks is ~30 or fewer.

But my Hive query always gets killed when:

Mapper: map tasks > 40 (e.g. 42, 51, or higher)

Reducer: reduce tasks ~9 or more (e.g. 9, 13, 21, 107, ...)


I get this error:


Error reported by TaskScheduler
2:LLAP
LLAP Daemons are running

Vertex killed, vertexName=Map 1, vertexId=vertex_1563423751139_0014_10_00, diagnostics=
Vertex received Kill while in RUNNING state., Vertex did not succeed due to DAG_TERMINATED, failedTasks:0 killedTasks:5, Vertex vertex_1563423751139_0014_10_00
Map 1 killed/failed due to:DAG_TERMINATED

Vertex killed, vertexName=Reducer 2, vertexId=vertex_1563423751139_0014_10_01, diagnostics=
Vertex received Kill while in RUNNING state., Vertex did not succeed due to DAG_TERMINATED, failedTasks:0 killedTasks:7, Vertex vertex_1563423751139_0014_10_01
Reducer 2 killed/failed due to:DAG_TERMINATED

DAG did not succeed due to SERVICE_PLUGIN_ERROR. failedVertices:0 killedVertices:2

my Hive query:

-- customer_click is a partitioned table with 327 partitions, ~800 MB, 26 million rows

set hive.exec.orc.split.strategy=BI; -- I also tried other modes: HYBRID, ETL
set tez.grouping.min-size=16777216; -- I also tried other values: 33554432 (32 MB), 64 MB, 80 MB, ...

CREATE table model_click_summary as
SELECT model, count(1) as total_clicks , count(DISTINCT(date_time)) as day_count,  max(date_time) as end_day
FROM customer_click
GROUP BY model;
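One more data point on the task counts: with tez.grouping.min-size = 16 MB on an ~800 MB table, grouping into ~16 MB splits would give roughly 50 map tasks, which is right in the range where my query gets killed. This is only a rough estimate on my part; the real split count depends on the ORC split strategy and the physical file layout:

```python
# Rough map-task estimate (illustration only; actual split counts depend on
# hive.exec.orc.split.strategy and the physical file/partition layout)
table_bytes = 800 * 1024 * 1024   # customer_click is ~800 MB
min_split = 16777216              # tez.grouping.min-size = 16 MB

print(table_bytes // min_split)   # 50 map tasks, roughly
```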