<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to do load balancing spark thrift servers on HWX? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-load-balancing-spark-thrift-servers-on-HWX/m-p/117823#M26255</link>
    <description>&lt;P style="margin-left: 40px;"&gt; &lt;A rel="user" href="https://community.cloudera.com/users/2668/rvgn77.html" nodeid="2668"&gt;@kavitha velaga&lt;/A&gt; You can use a virtual or physical load balancer with a balancing method such as round robin, ratio, dynamic ratio, or least connections. Does that help?&lt;/P&gt;</description>
    <pubDate>Wed, 27 Apr 2016 02:29:41 GMT</pubDate>
    <dc:creator>sunile_manjee</dc:creator>
    <dc:date>2016-04-27T02:29:41Z</dc:date>
    <item>
      <title>How to do load balancing spark thrift servers on HWX?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-load-balancing-spark-thrift-servers-on-HWX/m-p/117822#M26254</link>
      <description>&lt;P&gt;The installed version of HDP is 2.3.4. How do I load balance Spark Thrift Servers on HWX?&lt;/P&gt;</description>
      <pubDate>Wed, 27 Apr 2016 01:54:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-load-balancing-spark-thrift-servers-on-HWX/m-p/117822#M26254</guid>
      <dc:creator>Neyyu</dc:creator>
      <dc:date>2016-04-27T01:54:14Z</dc:date>
    </item>
    <item>
      <title>Re: How to do load balancing spark thrift servers on HWX?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-load-balancing-spark-thrift-servers-on-HWX/m-p/117823#M26255</link>
      <description>&lt;P style="margin-left: 40px;"&gt; &lt;A rel="user" href="https://community.cloudera.com/users/2668/rvgn77.html" nodeid="2668"&gt;@kavitha velaga&lt;/A&gt; You can use a virtual or physical load balancer with a balancing method such as round robin, ratio, dynamic ratio, or least connections. Does that help?&lt;/P&gt;</description>
      <pubDate>Wed, 27 Apr 2016 02:29:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-load-balancing-spark-thrift-servers-on-HWX/m-p/117823#M26255</guid>
      <dc:creator>sunile_manjee</dc:creator>
      <dc:date>2016-04-27T02:29:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to do load balancing spark thrift servers on HWX?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-load-balancing-spark-thrift-servers-on-HWX/m-p/117824#M26256</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Is it possible to put a load balancer in front of multiple Spark Thrift Servers (STS) if the cluster is kerberized?&lt;/STRONG&gt;
That is, how do we get around the fact that the host-specific principal has to be given in the connection string? See our attempts below (a kinit was done beforehand, so the Linux user has a valid TGT):&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;#1 Direct connection to STS (no load balancer):&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Both of these connection strings work since the keytab for hive/sts_host1_fqdn@REALM is present on sts_host1.&lt;/P&gt;&lt;PRE&gt;$ beeline -u "jdbc:hive2://sts_host1:10001/default;principal=hive/sts_host1_fqdn@REALM" &lt;/PRE&gt;or &lt;PRE&gt;$ beeline -u "jdbc:hive2://sts_host1:10001/default;principal=hive/_HOST@REALM"&lt;/PRE&gt;&lt;P&gt;(&lt;STRONG&gt;_HOST&lt;/STRONG&gt; resolves to the FQDN of sts_host1.)&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;#2 Connection via load balancer to one of the STSs:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;This will only work if the load balancer forwards the request to sts_host1 (since only sts_host1 has the keytab for hive/sts_host1_fqdn@REALM).&lt;/P&gt;&lt;PRE&gt;$ beeline -u "jdbc:hive2://sts_loadbalancer_host:10001/default;principal=hive/sts_host1_fqdn@REALM"
...
Error: Could not open client transport with JDBC Uri: jdbc:hive2://sts_loadbalancer_host:10001/default;principal=hive/sts_host1_fqdn@REALM: Peer indicated failure: GSS initiate failed (state=08S01,code=0)&lt;/PRE&gt;&lt;P&gt;Using &lt;STRONG&gt;_HOST&lt;/STRONG&gt; seemed like a good solution, but it does not work at all, regardless of which STS the request is forwarded to. (It seems &lt;STRONG&gt;_HOST&lt;/STRONG&gt; is resolved to the load balancer's FQDN, for which no keytab exists. We also tried creating a principal &lt;STRONG&gt;lb/lb_fqdn@REALM&lt;/STRONG&gt;, placing its keytab on the servers in /etc/security/keytabs, and using this principal in the connection string, but this did not solve the issue.)&lt;/P&gt;&lt;PRE&gt;$ beeline -u "jdbc:hive2://sts_loadbalancer_host:10001/default;principal=hive/_HOST@REALM"
...
16/04/26 15:37:33 [main]: ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)]
&lt;/PRE&gt;Finally, we tried specifying Spark's principal in the connection string since it is not host-dependent, but this principal is refused because it does not 'have 3 parts' separated by '/' and '@' (as in name/host_fqdn@REALM).&lt;PRE&gt;$ beeline -u "jdbc:hive2://sts_loadbalancer_host:10001/default;principal=spark-cluster_id@REALM"
...
Kerberos principal should have 3 parts: spark-cluster_id@REALM
&lt;/PRE&gt;&lt;P&gt;Thanks for posting a reply if you have mastered the kerberized-loadbalanced-spark-thrift-server dragon in the past!&lt;/P&gt;&lt;P&gt;
Then there will be the question of session stickiness for beeline / JDBC connections that send more than one request, but one problem at a time... &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Apr 2016 04:41:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-load-balancing-spark-thrift-servers-on-HWX/m-p/117824#M26256</guid>
      <dc:creator>raphael_vannson</dc:creator>
      <dc:date>2016-04-28T04:41:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to do load balancing spark thrift servers on HWX?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-load-balancing-spark-thrift-servers-on-HWX/m-p/117825#M26257</link>
      <description>&lt;P&gt;Thank you all. Raphael, I didn't find the documentation for this. Can you please send me the link?&lt;/P&gt;</description>
      <pubDate>Thu, 28 Apr 2016 22:53:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-load-balancing-spark-thrift-servers-on-HWX/m-p/117825#M26257</guid>
      <dc:creator>Neyyu</dc:creator>
      <dc:date>2016-04-28T22:53:17Z</dc:date>
    </item>
    <item>
      <title>Re: How to do load balancing spark thrift servers on HWX?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-load-balancing-spark-thrift-servers-on-HWX/m-p/117826#M26258</link>
      <description>&lt;P&gt;
	Hello Kavita,&lt;/P&gt;&lt;P&gt;
	I have not found any doc to put a load balancer in front of STS when the cluster is kerberized (hence the post here &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; ).&lt;/P&gt;&lt;P&gt;
	&lt;STRONG&gt;HiveServer2&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;
	Load balancing in front of HiveServer2 in a kerberized environment can be achieved through ZooKeeper service discovery. See the doc here for how it works: &lt;A href="https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_hadoop-ha/content/ha-hs2-service-discovery.html"&gt;https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_hadoop-ha/content/ha-hs2-service-discovery.html&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;
	This worked out of the box on HDP-2.3 (all the necessary configuration was already set in hive-site). The properties are:&lt;/P&gt;
&lt;PRE&gt;hive.server2.support.dynamic.service.discovery=true
hive.server2.zookeeper.namespace=sparkhiveserver2
hive.zookeeper.quorum=zk_host1:port1,zk_host2:port2,zk_host3:port3...
&lt;/PRE&gt;&lt;P&gt;
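	With these properties set, a client connects through the ZooKeeper quorum instead of a single HS2 host. A sketch of such a connection string, assuming the quorum and namespace values listed above (on a kerberized cluster, ;principal=hive/_HOST@REALM may also need to be appended, depending on the Hive version):&lt;/P&gt;
&lt;PRE&gt;$ beeline -u "jdbc:hive2://zk_host1:port1,zk_host2:port2,zk_host3:port3/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkhiveserver2"
&lt;/PRE&gt;&lt;P&gt;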
	&lt;STRONG&gt;Spark Thrift Server&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;
	I replicated a similar configuration in my /etc/spark/conf/hive-site.xml, but it did not work.
It appears this functionality is currently being added to Apache Spark (so we will have to wait a bit longer for it to be included in the HWX distro). See:&lt;/P&gt;&lt;OL&gt;
	
&lt;LI&gt;this JIRA, which reports that STS does not register with ZooKeeper the way HS2 does: &lt;A href="https://issues.apache.org/jira/browse/SPARK-11100"&gt;https://issues.apache.org/jira/browse/SPARK-11100&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;this GitHub pull request: it seems a fix has been written and could be merged into the master branch in the coming weeks: &lt;A href="https://github.com/apache/spark/pull/9113"&gt;https://github.com/apache/spark/pull/9113&lt;/A&gt; &lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;
	So for now, there is no load balancing for STS if the cluster is kerberized; otherwise, haproxy, httpd + mod_jk, or any other load balancer will probably do the job.&lt;/P&gt;&lt;P&gt;
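	For the non-kerberized case, an untested haproxy sketch could look like the following. It assumes two STS instances listening on port 10001 in binary mode; sts_host2 and the listener name spark_thrift are hypothetical placeholders, and "balance source" is only one way to keep a given client on the same backend:&lt;/P&gt;
&lt;PRE&gt;# haproxy.cfg fragment (sketch; hostnames are placeholders)
listen spark_thrift
    bind *:10001
    mode tcp
    balance source
    server sts1 sts_host1:10001 check
    server sts2 sts_host2:10001 check
&lt;/PRE&gt;&lt;P&gt;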
	Cheers!&lt;/P&gt;</description>
      <pubDate>Fri, 29 Apr 2016 01:22:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-load-balancing-spark-thrift-servers-on-HWX/m-p/117826#M26258</guid>
      <dc:creator>raphael_vannson</dc:creator>
      <dc:date>2016-04-29T01:22:22Z</dc:date>
    </item>
    <item>
      <title>Re: How to do load balancing spark thrift servers on HWX?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-load-balancing-spark-thrift-servers-on-HWX/m-p/117827#M26259</link>
      <description>&lt;P style="margin-left: 40px;"&gt; &lt;A rel="user" href="https://community.cloudera.com/users/9911/raphaelvannson.html" nodeid="9911"&gt;@Raphael Vannson&lt;/A&gt; Great analysis. Is this only true for a kerberized cluster?&lt;/P&gt;</description>
      <pubDate>Fri, 29 Apr 2016 02:41:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-load-balancing-spark-thrift-servers-on-HWX/m-p/117827#M26259</guid>
      <dc:creator>sunile_manjee</dc:creator>
      <dc:date>2016-04-29T02:41:19Z</dc:date>
    </item>
    <item>
      <title>Re: How to do load balancing spark thrift servers on HWX?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-load-balancing-spark-thrift-servers-on-HWX/m-p/117828#M26260</link>
      <description>&lt;P&gt;These are the current STS and HS2 load-balancing capabilities in HDP for a kerberized cluster, as far as I am aware.&lt;/P&gt;&lt;P&gt;For a non-kerberized cluster: haproxy, httpd + mod_jk, or any other software or hardware load balancer will probably do the job.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Apr 2016 03:12:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-load-balancing-spark-thrift-servers-on-HWX/m-p/117828#M26260</guid>
      <dc:creator>raphael_vannson</dc:creator>
      <dc:date>2016-04-29T03:12:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to do load balancing spark thrift servers on HWX?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-load-balancing-spark-thrift-servers-on-HWX/m-p/117829#M26261</link>
      <description>&lt;P style="margin-left: 40px;"&gt; &lt;A rel="user" href="https://community.cloudera.com/users/9911/raphaelvannson.html" nodeid="9911"&gt;@Raphael Vannson&lt;/A&gt; you mean aren't correct?&lt;/P&gt;</description>
      <pubDate>Fri, 29 Apr 2016 09:08:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-do-load-balancing-spark-thrift-servers-on-HWX/m-p/117829#M26261</guid>
      <dc:creator>sunile_manjee</dc:creator>
      <dc:date>2016-04-29T09:08:48Z</dc:date>
    </item>
  </channel>
</rss>

