Created on 07-30-2018 04:27 PM - edited 08-17-2019 06:43 AM
HiveServer2 (HS2) is a Thrift server: a thin service layer for interacting with the HDP cluster in a seamless fashion.
It supports both JDBC and ODBC drivers to provide a SQL layer for querying data.
An incoming SQL query is converted to a Tez or MR job, and the results are fetched and sent back to the client. No heavy lifting is done inside HS2 itself; it simply hosts the Tez/MR driver, scans metadata information, and applies Ranger policies for authorization.
When multiple sessions share a queue, a holding period of a few seconds (how long an idle session is retained for reuse) is preferable, so resources are freed promptly. The tradeoff is that a short holding period negatively impacts query latency, since the next query pays the session startup cost again.
Heap configurations :
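The heap values from the original article were not preserved in this copy; as a sketch, on HDP the HS2 heap is commonly set in hive-env.sh (the sizes below are illustrative starting points, not prescriptions — size the heap to your concurrency level):

```shell
# hive-env.sh -- illustrative heap sizing for the HiveServer2 process only
# (roughly 2-4 GB for light use, 8 GB+ for high-concurrency workloads)
if [ "$SERVICE" = "hiveserver2" ]; then
  export HADOOP_OPTS="$HADOOP_OPTS -Xms8g -Xmx8g"
fi
```

Setting -Xms equal to -Xmx avoids heap-resizing pauses under a steady connection load.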
GC tuning :
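The original GC flags were not preserved here; one plausible configuration (an assumption, for a Java 8 JVM) is to appendG1 and GC-logging flags to the same HADOOP_OPTS used for the heap:

```shell
# hive-env.sh -- illustrative GC flags for HiveServer2 (Java 8 syntax)
export HADOOP_OPTS="$HADOOP_OPTS -XX:+UseG1GC -XX:MaxGCPauseMillis=200 \
  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hive/hs2-gc.log"
```

The GC log is what lets you track the full-GC and minor-GC frequency called out in the watch-list below.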
Disabling database scans and session initialization parameters :
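The exact parameters this section originally listed were not preserved; as an assumption, two settings commonly paired under this heading are direct SQL against the metastore database (avoiding expensive ORM scans) and pre-initialized Tez sessions:

```properties
# hive-site.xml -- plausible candidates (assumptions; verify against your HDP version)
hive.metastore.try.direct.sql=true                 # direct SQL instead of ORM scans of the metastore DB
hive.server2.tez.initialize.default.sessions=true  # pre-warm Tez sessions at HS2 startup
hive.server2.tez.sessions.per.default.queue=2      # idle sessions held per YARN queue
```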
Tuning OS parameters (on the nodes where HS2, the metastore, and ZooKeeper are running) :
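The original values were not included in this copy; a minimal sketch of the kind of kernel and limit tuning meant here (example values, not prescriptions):

```shell
# Run as root on the HS2 / metastore / ZooKeeper nodes (illustrative values)
sysctl -w net.core.somaxconn=4096          # deeper TCP accept backlog for connection bursts
sysctl -w net.ipv4.tcp_keepalive_time=600  # detect dead client sockets sooner
ulimit -n 65536                            # raise the open-file-descriptor cap for this shell
```

For the ulimit change to persist for the hive service user, set nofile in /etc/security/limits.conf rather than relying on the shell setting.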
Disconnecting idle connections to reduce the memory footprint (values can be set in minutes and seconds):
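The two HS2 properties that govern this (example timeout values; tune to your workload):

```properties
# hive-site.xml -- reclaim heap held by abandoned sessions and operations
# (values accept time-unit suffixes such as "30m" or "900s")
hive.server2.idle.session.timeout=30m    # close sessions idle longer than this
hive.server2.idle.operation.timeout=15m  # clean up operations idle longer than this
```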
Proactively closing connections
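Idle-timeout checks are driven by a background scan; how often HS2 looks for and closes expired sessions is controlled by (example value):

```properties
# hive-site.xml -- interval between background session-expiry checks
hive.server2.session.check.interval=15m
```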
Connection pool (adjust it depending on how concurrent the incoming connections are):
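The pool here is the DataNucleus pool between the metastore and its backing RDBMS; one illustrative setting (example size):

```properties
# hive-site.xml -- DataNucleus pool to the backing RDBMS
datanucleus.connectionPool.maxPoolSize=30
```

Remember that the effective total is the pool size multiplied by the number of metastore/HS2 instances, and it must stay below the RDBMS connection limit flagged in the watch-list below.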
hive.server2.thrift.max.worker.threads = 1000 (if the worker threads are exhausted, new incoming requests will not be served)
Things to watch out for :
1. Making more than 60 connections to HS2 from a single machine will result in failures, as ZooKeeper will rate-limit them.
2. Watch out for the connection limit on your backing RDBMS.
3. Depending on your usage, fine-tune the heap and GC settings, and keep an eye on the full GC and minor GC frequency.
4. Usage of "add jar" leads to a classloader memory leak in some versions of HS2, so keep an eye on it.
5. Remember that in Hadoop the client always retries for any service, so look for retry logs in HS2 and tune the service to handle connections seamlessly.
6. HS2 has no upper threshold on the number of connections it can accept; it is limited only by the heap and by the ability of the other services to scale.
7. Keep an eye on CPU consumption on the machine hosting HS2 to make sure the process has enough CPU to handle high concurrency.