Created 07-05-2017 03:47 PM
Hi,
Using https://knox.apache.org/books/knox-0-9-0/user-guide.html, I have configured a Knox topology for HiveServer2 High Availability.
I also noticed the Dynamic Service Discovery Through ZooKeeper section in the documentation.
I see that all the queries/connections go through only one of the HiveServer2 instances. If that HS2 instance goes down, the connections/queries go through another HS2 instance.
My question is: in a busy cluster with multiple HS2 servers installed, is it possible to load balance (possibly round robin) so that one server does not get overloaded? If yes, how?
Regards,
SS
Created 07-05-2017 04:52 PM
Your network team can create a DNS name and add both HiveServer2 IPs under it in either round-robin or load-balanced mode. You can then use that DNS name with port 10000 in the Knox configuration to submit Hive jobs. That should give you what you are looking for. Please let me know.
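To make the round-robin idea concrete, here is a minimal Python sketch of how requests would rotate across two HS2 endpoints. The IP addresses are hypothetical placeholders, and the rotation is simulated in code; in the real setup the DNS server does the rotation, and client-side DNS caching can make the actual distribution less even than this.

```python
from itertools import cycle

# Hypothetical HiveServer2 addresses registered under one DNS name (assumption).
HS2_ADDRESSES = ["10.0.0.11", "10.0.0.12"]
HS2_PORT = 10000  # HiveServer2 port used in this thread


def round_robin(addresses, port):
    """Yield (address, port) endpoints in strict rotation, forever."""
    for address in cycle(addresses):
        yield (address, port)


picker = round_robin(HS2_ADDRESSES, HS2_PORT)

# Simulate four consecutive connections; each one goes to the next server in turn.
first_four = [next(picker) for _ in range(4)]
for endpoint in first_four:
    print(endpoint)
```

Each new connection lands on the next address in the list, so neither server receives two consecutive connections.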
Created 07-06-2017 09:07 AM
Thank you @Manish Gupta,
This is something we can try, by configuring a proxy to load balance HS2. I would also like to understand: are there any changes to be made for ZooKeeper?
Is there any other way to achieve load balancing of HS2 without using a non-HDP component or external network changes?
Regards,
Created 07-06-2017 02:11 PM
No changes would be required for ZooKeeper. Unfortunately, load balancing of HiveServer2 is not currently available in HDP. Hopefully it will arrive in a future release.
As always, if you find this post helpful, don't forget to "accept" the answer.
Created 07-07-2017 10:01 AM
Thanks @Manish Gupta for the clarification. This is helpful.
Cheers.