
[Image: 80513-hortonworks-hs2.png]

[Image: 80514-hs2-connection-time.png]

Key takeaways:

1. For a HiveServer2 client, the observed connection time is the total time spent interacting with AD (TGT + ZooKeeper service ticket + HiveServer2 service ticket), ZooKeeper, and HiveServer2 (MySQL + YARN allocation).

2. If your AD is slow, the Hive connection will take a long time.

3. time beeline -u "ZooKeeper JDBC string" -e "select 1" can be used to measure how long beeline takes to connect and run a trivial query.

4. In general, it takes 4 to 10 seconds for a connection to be established.

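5. Under normal conditions, AD, ZooKeeper and HiveServer2 do not deny a connection; the connection can take longer, but ideally it is never refused. The exceptions (exhausted HS2 handler threads and the ZooKeeper connection rate limit for a host) are listed under the reasons below.

6. Because of this, a failed connection is usually a client-side timeout (a configurable parameter in most clients, such as HUE, SAS, Alation and health check scripts) rather than a refusal by ZooKeeper or HS2.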

7. HiveServer2 tries to allocate resources in YARN before acknowledging that it has accepted the connection. If your queue is full, the connection time will be impacted.

8. Set hive.server2.tez.initialize.default.sessions=true on HS2 if you want a connection to be accepted without waiting for YARN resource allocation (the YARN resources for the default sessions are already allocated).

9. If you specify a queue name in your JDBC string, the connection will be accepted only after resources have been allocated in YARN.
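
As a rough illustration of takeaways 3 and 9, the sketch below builds a ZooKeeper-discovery JDBC string, optionally with a queue name, so that the two cases can be timed side by side. The helper name build_hs2_url, the host names, the queue name etl and the zooKeeperNamespace value hiveserver2 are assumptions for illustration, not values taken from this article; substitute your own cluster's settings.

```shell
#!/usr/bin/env bash
# Build a HiveServer2 JDBC URL that uses ZooKeeper service discovery.
# $1: ZooKeeper quorum, e.g. "zk1.example.com:2181,zk2.example.com:2181"
# $2: optional Tez queue name, passed as the Hive conf tez.queue.name
# Assumption: the HS2 ZooKeeper namespace is "hiveserver2".
build_hs2_url() {
  local zk_quorum=$1
  local queue=${2:-}
  local url="jdbc:hive2://${zk_quorum}/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
  if [ -n "$queue" ]; then
    # Hive conf overrides go after '?' in the JDBC URL.
    url="${url}?tez.queue.name=${queue}"
  fi
  printf '%s\n' "$url"
}

# Example usage (hypothetical hosts and queue):
#   time beeline -u "$(build_hs2_url zk1.example.com:2181,zk2.example.com:2181)" -e "select 1"
#   time beeline -u "$(build_hs2_url zk1.example.com:2181,zk2.example.com:2181 etl)" -e "select 1"
```

Comparing the two timings gives a rough idea of how much of the connection time is spent waiting for YARN resources in that queue.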



Reasons the connection time can be high:

1. AD is slow.

2. ZooKeeper has too many connections or is slow.

3. HiveServer2's interaction with MySQL is slow.

4. Heavy garbage collection (GC) is happening within HiveServer2 or ZooKeeper.

5. HS2 can deny a connection if it has exhausted all of its handler threads.

6. ZooKeeper can deny a connection if a host has reached its maximum connection rate limit. See:

https://community.hortonworks.com/articles/51191/understanding-apache-zookeeper-connection-rate-lim....
7. MySQL slowness can directly impact HiveServer2.

8. MySQL is reaching its max_connections limit.

9. The network is slow.

10. HiveServer2 performs many retries for every service it talks to (Atlas, Solr, Kafka, MySQL, DataNode, NameNode, RM); keep an eye on any retries that are happening.
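
To narrow down which of the reasons above applies, it can help to time each dependency on its own. The sketch below is one possible way to do that, not a definitive procedure. The helper name time_step and every host name, keytab path and principal in the usage comments are hypothetical placeholders, and the script assumes GNU date (for %N).

```shell
#!/usr/bin/env bash
# Run a command, discard its output, and print how long it took.
# $1: label for the step; remaining arguments: the command to run.
time_step() {
  local label=$1
  shift
  local start end rc
  start=$(date +%s%N)            # nanoseconds since the epoch (GNU date)
  "$@" >/dev/null 2>&1
  rc=$?
  end=$(date +%s%N)
  printf '%-12s %6d ms (exit %d)\n' "$label" $(( (end - start) / 1000000 )) "$rc"
}

# Example usage (all hosts, paths and principals are placeholders):
#   time_step "AD (TGT)"  kinit -kt /path/to/user.keytab user@EXAMPLE.COM
#   time_step "ZooKeeper" bash -c 'echo ruok | nc zk1.example.com 2181'
#   time_step "MySQL"     mysqladmin -h metastore-db.example.com ping
#   time_step "beeline"   beeline -u "ZooKeeper JDBC string" -e "select 1"
# Note: newer ZooKeeper releases only answer ruok if it is whitelisted.
```

If one step dominates the total, that is the component to look at first; if every step is quick but beeline is slow, the time is more likely spent inside HiveServer2 (MySQL, GC or YARN allocation).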

The various ways to find the time taken by individual steps are:

1. Run Beeline in debug mode.

https://community.hortonworks.com/content/supportkb/150574/how-to-enable-debug-logging-for-beeline.h...

2. Run beeline under strace with timestamps: strace -t beeline -u "Zookeeper JDBC string" -e "select 1"
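
Reading a long strace log by eye is tedious, so a small filter can point to where the time went. The sketch below assumes the trace was saved with the timestamps produced by strace -t (HH:MM:SS at the start of each line, optionally after a pid prefix). The function name strace_gaps and the file name beeline.trace are hypothetical, and traces that cross midnight are not handled.

```shell
#!/usr/bin/env bash
# Read strace -t output on stdin and print every line that was preceded
# by a pause of at least $1 seconds (default 1).
strace_gaps() {
  local threshold=${1:-1}
  awk -v min="$threshold" '
    {
      # The timestamp is the first field, or follows a pid prefix.
      ts = ""
      for (i = 1; i <= NF && i <= 3; i++)
        if ($i ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/) { ts = $i; break }
      if (ts == "") next
      split(ts, p, ":")
      now = p[1] * 3600 + p[2] * 60 + p[3]
      if (seen && now - prev >= min)
        printf "%ds gap before: %s\n", now - prev, $0
      prev = now
      seen = 1
    }'
}

# Example usage (-f follows child processes, -o writes the trace to a file):
#   strace -f -t -o beeline.trace beeline -u "Zookeeper JDBC string" -e "select 1"
#   strace_gaps 2 < beeline.trace
```

The system call printed after each gap (for example a connect or read on a socket) hints at which service beeline was waiting for.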

