Member since: 04-27-2016 · 5 Posts · 1 Kudo Received · 1 Solution

My Accepted Solutions

Title | Views | Posted
---|---|---
| 4117 | 04-28-2016 06:22 PM
04-28-2016 08:12 PM
These are the current STS and HS2 load-balancing capabilities in HDP for a kerberized cluster that I am aware of. For a non-kerberized cluster, HAProxy, httpd + mod_jk, or any other software/hardware load balancer will probably do the job.
04-28-2016 06:22 PM
1 Kudo
Hello Kavita,
I have not found any documentation on putting a load balancer in front of STS when the cluster is kerberized (hence this post 🙂).
HiveServer2
Load balancing in front of HiveServer2 in a kerberized environment can be achieved through ZooKeeper-based dynamic service discovery -- see the doc here for how it works: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_hadoop-ha/content/ha-hs2-service-discovery.html.
This worked out of the box on HDP 2.3 (all the necessary configuration was already set in hive-site). The properties are:
hive.server2.support.dynamic.service.discovery=true
hive.server2.zookeeper.namespace=sparkhiveserver2
hive.zookeeper.quorum=zk_host1:port1,zk_host2:port2,zk_host3:port3...
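Once dynamic service discovery is enabled, clients stop pointing at an individual HS2 host: they pass the ZooKeeper quorum in the JDBC URL and the driver picks a live server registered under the namespace. A sketch of such a connection string, with the placeholder hostnames/ports from above and the namespace set earlier:

```
$ beeline -u "jdbc:hive2://zk_host1:port1,zk_host2:port2,zk_host3:port3/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkhiveserver2"
```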
Spark Thrift Server
I replicated a similar configuration in my /etc/spark/conf/hive-site.xml, but it did not work.
It appears this functionality is currently being added to Apache Spark (so we will have to wait a bit longer for it to be included in the HWX distro). See:
this JIRA reporting that STS does not register with ZooKeeper the way HS2 does: https://issues.apache.org/jira/browse/SPARK-11100
this GitHub pull request -- it seems a fix has been written and could be merged into the master branch in the coming weeks: https://github.com/apache/spark/pull/9113
So for now... no load balancing for STS if the cluster is kerberized. Otherwise, HAProxy, httpd + mod_jk, or any other load balancer will probably do the job.
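To illustrate the non-kerberized case, a minimal HAProxy TCP front-end for two STS instances might look roughly like this (hostnames, ports, and section names are placeholders, not from a tested setup -- the Thrift binary protocol is plain TCP from HAProxy's point of view, hence mode tcp):

```
frontend sts_frontend
    bind *:10001
    mode tcp
    default_backend sts_backends

backend sts_backends
    mode tcp
    balance leastconn
    server sts1 sts_host1:10001 check
    server sts2 sts_host2:10001 check
```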
Cheers!
04-27-2016 09:41 PM
Is it possible to implement load balancing in front of multiple Spark Thrift Servers (STS) if the cluster is kerberized?
I.e., how do we get around the fact that the host-specific principal has to be mentioned in the connection string? See the attempts below (a kinit was done beforehand; the Linux user has a valid TGT).

#1 Direct connection to STS (no load balancer)

Both of these connection strings work, since the keytab for hive/sts_host1_fqdn@REALM is present on sts_host1:

$ beeline -u "jdbc:hive2://sts_host1:10001/default;principal=hive/sts_host1_fqdn@REALM"

or

$ beeline -u "jdbc:hive2://sts_host1:10001/default;principal=hive/_HOST@REALM"

(_HOST resolves to sts_host1's FQDN.)
#2 Connection via load balancer to one of the STSs

This only works if the load balancer forwards the request to sts_host1 (since only sts_host1 has the keytab for hive/sts_host1_fqdn@REALM):

$ beeline -u "jdbc:hive2://sts_loadbalancer_host:10001/default;principal=hive/sts_host1_fqdn@REALM"
...
Error: Could not open client transport with JDBC Uri: jdbc:hive2://sts_loadbalancer_host:10001/default;principal=hive/sts_host1_fqdn@REALM: Peer indicated failure: GSS initiate failed (state=08S01,code=0)

This next one seemed like a good solution but does not work at all, regardless of which STS the request is forwarded to. (It seems _HOST is resolved to the load balancer's FQDN, for which there is no keytab. We also tried creating a principal lb/lb_fqdn@REALM, placing its keytab on the servers in /etc/security/keytabs, and using that principal in the connection string, but this did not solve the issue.)

$ beeline -u "jdbc:hive2://sts_loadbalancer_host:10001/default;principal=hive/_HOST@REALM"
...
16/04/26 15:37:33 [main]: ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)]
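The UNKNOWN_SERVER error above is consistent with how Kerberos-aware clients expand the _HOST placeholder: it is substituted with the host the client actually connects to, so going through the load balancer produces a service principal the KDC has never heard of. A rough Python sketch of that substitution logic (resolve_principal is a hypothetical name for illustration, not the driver's actual code):

```python
def resolve_principal(principal: str, connection_host: str) -> str:
    """Mimic _HOST placeholder expansion: the placeholder is replaced
    with the (lowercased) hostname the client is connecting to."""
    service, rest = principal.split("/", 1)
    instance, realm = rest.split("@", 1)
    if instance == "_HOST":
        instance = connection_host.lower()
    return f"{service}/{instance}@{realm}"

# Connecting through the load balancer expands _HOST to the LB's hostname,
# for which no hive/... keytab entry exists -> UNKNOWN_SERVER at the KDC.
print(resolve_principal("hive/_HOST@REALM", "sts_loadbalancer_host"))
# → hive/sts_loadbalancer_host@REALM
```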
Finally, we tried to specify Spark's principal in the connection string, since it is not host-dependent, but this principal is refused as it does not contain 3 parts separated by '/' and '@' (i.e. name/host_fqdn@REALM):

$ beeline -u "jdbc:hive2://sts_loadbalancer_host:10001/default;principal=spark-cluster_id@REALM"
...
Kerberos principal should have 3 parts: spark-cluster_id@REALM
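The rejection above comes from a purely structural check on the principal: it must split into exactly three parts, name/host@REALM, before any connection is even attempted. A small Python sketch of that validation (split_principal is a made-up name for illustration):

```python
import re

def split_principal(principal: str) -> list:
    """Reject principals that do not have the name/host@REALM shape,
    mirroring the client's 3-part check."""
    parts = re.split(r"[/@]", principal)
    if len(parts) != 3:
        raise ValueError(f"Kerberos principal should have 3 parts: {principal}")
    return parts

# A host-independent principal like spark-cluster_id@REALM has only 2 parts
# and is therefore rejected outright.
print(split_principal("hive/sts_host1_fqdn@REALM"))
# → ['hive', 'sts_host1_fqdn', 'REALM']
```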
Thanks for posting a reply if you have mastered the kerberized-loadbalanced-spark-thrift-server dragon in the past!
Then there will be the question of session stickiness for beeline / JDBC connections sending more than one request, but one problem at a time... 🙂