Created on 07-15-2018 01:20 PM - edited 08-17-2019 06:58 AM
Key takeaways:
1. For a HiveServer2 client, the observed connection time is the total time spent interacting with AD (TGT + ZooKeeper service ticket + HiveServer2 service ticket) + ZooKeeper + HiveServer2 (MySQL + YARN allocation).
2. If your AD is slow, the Hive connection will take a long time.
3. time beeline -u "zookeeper string" -e "select 1" can be used to measure how long beeline takes to connect and run a trivial query.
4. In general, it takes 4 to 10 seconds for a connection to be established.
5. Ideally, neither AD, ZooKeeper, nor HiveServer2 denies a connection; the connection can take longer, but it should not be refused.
6. Clients can only time out (the timeout is a configurable parameter in most clients, such as HUE, SAS, Alation, and health check scripts), since ideally neither ZooKeeper nor HS2 denies a connection.
7. HiveServer2 tries to allocate resources in YARN before acknowledging that it has accepted the connection. If your queue is full, the connection time will be impacted.
8. Set hive.server2.tez.initialize.default.sessions=true on HS2 if you want a connection to be accepted without waiting for a YARN resource allocation (the YARN resources are already allocated by the pre-initialized default sessions).
9. If you specify a queue name in your JDBC string, the connection will be accepted only after resources have been allocated in YARN.
Reasons the connection time can increase, or a connection can fail:
1. AD is slow.
2. ZooKeeper has too many connections or is slow.
3. HiveServer2 interaction with MySQL is slow.
4. Long GC pauses are occurring within HiveServer2 or ZooKeeper.
5. HS2 can deny a connection if it has exhausted all its handler threads.
6. ZooKeeper can deny a connection if it has reached its maximum limit for connections from a host.
10. HiveServer2 performs many retries for every service it talks to (Atlas, Solr, Kafka, MySQL, DataNode, NameNode, RM); keep an eye on any retries that are happening.
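To make takeaways 3 and 6 concrete, here is a minimal Python sketch of a health-check style probe that times a beeline connection and gives up after a client-side timeout. The function names time_command and beeline_command, the default timeout, and the example JDBC URL are all assumptions made for illustration; they are not part of beeline or Hive.

```python
import subprocess
import time


def time_command(cmd, timeout_s=60.0):
    """Run cmd once and return (elapsed_seconds, status).

    status is "ok" (exit code 0), "failed" (non-zero exit code),
    or "timeout" (the client gave up after timeout_s seconds).
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # client-side timeout, as in takeaway 6
        )
        status = "ok" if result.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        status = "timeout"
    return time.monotonic() - start, status


def beeline_command(jdbc_url):
    """Build the beeline invocation from takeaway 3 for the given JDBC URL."""
    return ["beeline", "-u", jdbc_url, "-e", "select 1"]


# Example (hypothetical ZooKeeper quorum; substitute your own):
# url = ("jdbc:hive2://zk1.example.com:2181/;"
#        "serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2")
# elapsed, status = time_command(beeline_command(url), timeout_s=30)
# print(f"{status} after {elapsed:.1f}s")
```

Running the probe a few times and comparing the elapsed values against the usual 4 to 10 seconds gives a rough signal of whether AD, ZooKeeper, or HS2 is slowing the connection down. A "timeout" status means only that the client gave up waiting, not that the server refused the connection.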
The various ways to find the time taken by the individual steps are: