Community Articles

Find and share helpful community-sourced technical articles.
Announcements
Celebrating as our community reaches 100,000 members! Thank you!
Labels (3)
avatar

In this article, we cover the following: 

  • Demonstrate how hive sessions work
  • Clarify some misunderstanding about hive sessions behavior (most of the people, including myself until recently, believe hive sessions and queue allocation work different than the way they actually work)
  • Propose a solution for small-medium environments (up to 200 nodes) that allows concurrent users + multi-tenancy + soft allocation + preemption

 

Assumptions:

Tested with HDP 2.4.0

I'm using configuration suggested in best-practices and docs related to security and hive performance guide, which include following configurations:

  • hive engine (hive.execution.engine) = tez
  • hive do-as (hive.server2.enable.doAs) = false
  • hive default queues (hive.server2.tez.default.queues) = (queue-name1,queue-name2,etc)
  • hive number of sessions (hive.server2.tez.sessions.per.default.queue) = 1 (or up to 4)
  • hive start sessions (hive.server2.tez.initialize.default.sessions) = true
  • hive pre-warm containers (hive.prewarm.enabled) = true
  • hive num-containers (hive.prewarm.numcontainers) = (some number >1)
  • tez container reuse (tez.am.container.reuse.enabled) = true
  • all examples here related to hiveserver2 + jdbc client (beeline/other jdbc client or odbc driver), it does not apply with hive cli or other tools that interact directly with hivemetastore

Understanding #1 - default queues + number of sessions:

When you define hive default queues (hive.server2.tez.default.queues), hive number of sessions (hive.server2.tez.sessions.per.default.queue) and hive start sessions (hive.server2.tez.initialize.default.sessions) = true, hiveserver2 will create one Tez AM (application master) for each default queue x number of sessions when hiveserver2 service starts.

 

For example, if you define default queues = "queue1, queue2" and number of sessions = 2, hiveserver2 will create 4 Tez AM (2 for queue1 and 2 for queue2).

 

If you have continuous usage of hive server2, those Tez AM will keep running, but if your hiveserver2 is idle, those Tez AM will be killed based on timeout defined by tez.session.am.dag.submit.timeout.secs.

 

Understanding #2 - pre-warm containers

The pre-warm container is different from session initialization.

 

The number of containers is related to the amount of YARN execution containers (not considering AM container) that will be attached to each Tez AM by default. This same number of containers will be held by each AM, even when Tez AM is idle (not executing queries).

 

Understanding #3 - when initialized Tez AM (AM pool) is used

A query WILL ONLY use a Tez AM from the pool (initialized as described above) if you DO NOT specify queue name (tez.queue.name), in this case, Hiveserver2 will pick one of Tez AM idle/available (queue name here may be randomly selected). If you execute multiple queries using the same JDBC/ODBC connection, each query may be executed by different Tez AM and it can also be executed in different queue names.

If you specify queue name in your JDBC/OBDBC connection, hiveserver2 will create a new Tez AM for that connection and won't use any of the initialized sessions.

 

PS: only JDBC/ODBC clients and hive CLI (command line) can use initialized Tez AM. 

 

Understanding #4 - what if you have more concurrent queries than the number of Tez AM initialized?

If you DO NOT specify queue name (as described above) hiveserver2 will hold your query until one of the default Tez AM (pool) is available to serve your query. There won't be any message in JDBC/OBDC client neither in the hiveserver2 log file. An uninformed user (or admin) may think JDBC/ODBC connection or hiveserver2 is broken, but it's just waiting for a Tez AM to execute the query.

 

If you DO specify queue name, it doesn't matter how many initialized Tez AM are in use or idle, hiveserver2 will create a new Tez AM for this connection and the query can be executed (if the queue has available resources).

 

Understanding #5 - how to make a custom queue name (defined by the user) to use AM pool

There is no way to allow users to specify queue names and at the same time use Tez AM pool (initialized sessions). If your use case requires different/dedicated Tez AM pool for each group of users, you need dedicated hiveserver2 services, each of them with respective default queue name and number of sessions, and ask each group of users to use respective hiveserver2.

 

Understanding #6 - number of containers for a single query

A number of containers each query will use are defined here ( https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works), which consider a number of resources available on the current queue, the number of resources available in a queue is defined by the minimum guaranteed capacity (yarn.scheduler.capacity.root._queuename_.capacity) and not maximum capacity (yarn.scheduler.capacity.root._queuename_.maximum-capacity).

 

In other words, if you are executing a heavy query, this query will probably use all the resources available in the queue, leaving no room for other queries to execute.

 

Understanding #7 - capacity scheduler ordering policy (FIFO or FAIR) and preemption

There is no preemption inside queues and FAIR ordering policy is only effective when containers are released by AM. If you are reusing containers (tez.am.container.reuse.enabled=true) as recommended, containers will only be released by AM when query finishes, if the first query being executed is heavy, all subsequent queries will wait for the first query to finish to receive the "fair" share.

 

Understanding #8 - a scenario where it all works together

My understanding is that all these pieces and settings were initially designed and very well optimized to work in the following scenario:

 

  1. Large Cluster with lots of resources dedicated to queries.
  2. High number of concurrent queries/users.
  3. Multiple hiveserver2 instances, for example, one for light queries, other for heavy queries.
  4. Fixed number of sessions (Tez AM) and fixed number of containers for each session, all of them pre-warmed (these numbers can be different for each type of query / hiveserver2, as described in #3).
  5. No need for FAIR ordering policy or preemption because all the queries always use only the pre-warmed containers

 

Small/Medium cluster most common needs

Although the scenario described in item #8 is very interesting (and hive+tez is the best solution to run at that scale), it is not the reality in most of the small-medium Hadoop clusters, usually found in Hadoop adoption phase and POCs.

In my experience, what most of the users of such environments need is much simpler, something like this:

  1. Multi-tenancy cluster, let's say Production and User layers sharing resources.
  2. Soft limits and preemption (at night, when users are off, give 100% of resources to production, during the day keep production running with minimum allocation to accomplish SLA, but give most of the power to users).
  3. Fair allocation (if just one user is running a query, give him all the resources, as soon as second, third and subsequent users start executing queries, give them equal amount of resources distribution, but also respect different queues minimum shares, that represents priorities).

Proposed solution / settings

 

Hive+Tez is the perfect solution for both scenarios described above, but now, I'm going to share details on how to adjust setttings for the "Small/Medium cluster most common needs":

 

  1.  YARN - Enable preemption as described here. My settings for YARN are:
    yarn.resourcemanager.scheduler.monitor.enable=trueyarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity=0.01
    yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill=1000
    yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval=1000
    yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor=1
    yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round=1
    yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy​
  2. Capacity Scheduler - Create one queue for each concurrent session, for example, if you want 3 concurrent users, create 3 queues (this is needed to achieve full preemption):
    yarn.scheduler.capacity.maximum-am-resource-percent=0.2
    yarn.scheduler.capacity.maximum-applications=10000
    yarn.scheduler.capacity.node-locality-delay=40
    yarn.scheduler.capacity.queue-mappings-override.enable=false
    yarn.scheduler.capacity.root.accessible-node-labels=*
    yarn.scheduler.capacity.root.acl_administer_queue=*
    yarn.scheduler.capacity.root.capacity=100
    yarn.scheduler.capacity.root.default.acl_submit_applications=*
    yarn.scheduler.capacity.root.default.capacity=30
    yarn.scheduler.capacity.root.default.maximum-capacity=100
    yarn.scheduler.capacity.root.default.state=RUNNING
    yarn.scheduler.capacity.root.default.user-limit-factor=10
    yarn.scheduler.capacity.root.hive-test.acl_administer_queue=*
    yarn.scheduler.capacity.root.hive-test.acl_submit_applications=*
    yarn.scheduler.capacity.root.hive-test.capacity=70
    yarn.scheduler.capacity.root.hive-test.maximum-capacity=70
    yarn.scheduler.capacity.root.hive-test.minimum-user-limit-percent=100
    yarn.scheduler.capacity.root.hive-test.ordering-policy=fifo
    yarn.scheduler.capacity.root.hive-test.production.acl_administer_queue=*
    yarn.scheduler.capacity.root.hive-test.production.acl_submit_applications=*
    yarn.scheduler.capacity.root.hive-test.production.capacity=40
    yarn.scheduler.capacity.root.hive-test.production.maximum-capacity=100
    yarn.scheduler.capacity.root.hive-test.production.minimum-user-limit-percent=100
    yarn.scheduler.capacity.root.hive-test.production.ordering-policy=fifo
    yarn.scheduler.capacity.root.hive-test.production.state=RUNNING
    yarn.scheduler.capacity.root.hive-test.production.user-limit-factor=100
    yarn.scheduler.capacity.root.hive-test.queues=production,user
    yarn.scheduler.capacity.root.hive-test.state=RUNNING
    yarn.scheduler.capacity.root.hive-test.user-limit-factor=10
    yarn.scheduler.capacity.root.hive-test.user.acl_administer_queue=*
    yarn.scheduler.capacity.root.hive-test.user.acl_submit_applications=*
    yarn.scheduler.capacity.root.hive-test.user.capacity=60
    yarn.scheduler.capacity.root.hive-test.user.maximum-capacity=100
    yarn.scheduler.capacity.root.hive-test.user.minimum-user-limit-percent=100
    yarn.scheduler.capacity.root.hive-test.user.ordering-policy=fifo
    yarn.scheduler.capacity.root.hive-test.user.queues=user1,user2,user3
    yarn.scheduler.capacity.root.hive-test.user.state=RUNNING
    yarn.scheduler.capacity.root.hive-test.user.user-limit-factor=100
    yarn.scheduler.capacity.root.hive-test.user.user1.acl_administer_queue=*
    yarn.scheduler.capacity.root.hive-test.user.user1.acl_submit_applications=*
    yarn.scheduler.capacity.root.hive-test.user.user1.capacity=33
    yarn.scheduler.capacity.root.hive-test.user.user1.maximum-capacity=100
    yarn.scheduler.capacity.root.hive-test.user.user1.minimum-user-limit-percent=100
    yarn.scheduler.capacity.root.hive-test.user.user1.ordering-policy=fifo
    yarn.scheduler.capacity.root.hive-test.user.user1.state=RUNNING
    yarn.scheduler.capacity.root.hive-test.user.user1.user-limit-factor=10
    yarn.scheduler.capacity.root.hive-test.user.user2.acl_administer_queue=*
    yarn.scheduler.capacity.root.hive-test.user.user2.acl_submit_applications=*
    yarn.scheduler.capacity.root.hive-test.user.user2.capacity=33
    yarn.scheduler.capacity.root.hive-test.user.user2.maximum-capacity=100
    yarn.scheduler.capacity.root.hive-test.user.user2.minimum-user-limit-percent=100
    yarn.scheduler.capacity.root.hive-test.user.user2.ordering-policy=fifo
    yarn.scheduler.capacity.root.hive-test.user.user2.state=RUNNING
    yarn.scheduler.capacity.root.hive-test.user.user2.user-limit-factor=10
    yarn.scheduler.capacity.root.hive-test.user.user3.acl_administer_queue=*
    yarn.scheduler.capacity.root.hive-test.user.user3.acl_submit_applications=*
    yarn.scheduler.capacity.root.hive-test.user.user3.capacity=34
    yarn.scheduler.capacity.root.hive-test.user.user3.maximum-capacity=100
    yarn.scheduler.capacity.root.hive-test.user.user3.minimum-user-limit-percent=100
    yarn.scheduler.capacity.root.hive-test.user.user3.ordering-policy=fifo
    yarn.scheduler.capacity.root.hive-test.user.user3.state=RUNNING
    yarn.scheduler.capacity.root.hive-test.user.user3.user-limit-factor=10
    yarn.scheduler.capacity.root.queues=default,hive-test
    ​
  3. Hive/Tez - Define default queues and initialize 1 session per queue:
    hive.server2.tez.default.queues=user1,user2,user3
    set hive.server2.tez.sessions.per.default.queue=1
    set hive.server2.tez.initialize.default.sessions=true
    set hive.prewarm.enabled=true
    set hive.prewarm.numcontainers=2
    set tez.am.container.reuse.enabled=true​

Finally, it can also be useful to increase release timeout for idle containers on Tez settings, those will make all the containers to keep running for a short period after query finishes (not only the prewarm containers but also extra containers launched by AM), making subsequent queries initialization time faster. A suggestion would be:

tez.am.container.idle.release-timeout-min.millis=30000

tez.am.container.idle.release-timeout-max.millis=90000

 

PS: If you have multiple business units, each of them with different resource sharing/priorities, you need dedicated hiveserver2 for each business unit. For example: one hiveserver2 for 'marketing', one hiveserver2 for 'fraud', one hiveserver2 for 'financial'. As described above, hiveserver2 will only use initialized sessions if you don't specify tez.queue.name.

 

 

41,399 Views
Comments

Great article

"PS: only jdbc/odbc clients can use initialized Tez AM, hive cli (command line) or other external hive components don't use initialized Tez AM."

Beeline is also able to use initialized containers. Hive cli can't as it is bypassing hs2.

Thanks for sharing this info. Nice Post. I could see above videos are not available. Would request to share a correct URL

@Guilherme Braccialli Would request to share a correct URL for above demos.

avatar
Cloudera Employee

Thank you for sharing info,

I could not see videos, it would be great if you can share URLs for videos.

I'm on a HDP version 3.01 and cannot find a lot of the parameters you described. i', using the ambari-interface.

 

still video are not visible

avatar
Community Manager

We have updated this article to remove links to videos that are no longer available.