
Hiveserver2 HA using haproxy load balancing


Re: Hiveserver2 HA using haproxy load balancing

Explorer

@bgooley

 

As suggested, I checked the SAN and recreated the certificate with a SAN entry containing the hostname of the haproxy server.

 

I checked the logs and verified that haproxy forwards requests to both hiveserver2 instances.

 

A few questions:

 

1. What are the different types of balance algorithm in haproxy, like the one below?

balance source

 

2. What is the difference between source, leastconn, roundrobin, etc.?

 

- Vijay M

Re: Hiveserver2 HA using haproxy load balancing

Explorer

@bgooley

 

After setting up HA for Hiveserver2 and Impala using haproxy:

1. Does any configuration need to be done in Hue?

2. When connecting to Hive and Impala through Hue, is any additional configuration required in haproxy?

 

- Vijay M

Re: Hiveserver2 HA using haproxy load balancing

Super Guru

@VijayM,

 

The way that Hue is designed, it needs to know that an Impala connection it has open (where it executed a query on a coordinator) will connect to the same coordinator.  This is because Hue needs to pull information regarding the query for display.  This means that the balancer in between Hue and Impala needs to use IP persistence.  Also, to avoid intermittent session errors with Impala, it is recommended that the timeouts on the HAProxy side be increased to a long value so that connections Hue is still using are not timed out.
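To make the difference between the balance algorithms concrete, here is a minimal sketch of three backends that differ only in the `balance` line. The backend names and hostnames are placeholders for illustration, not taken from a real cluster, and the `defaults` timeouts are arbitrary values chosen for the sketch.

```haproxy
# Sketch only: names, hosts and timeout values are illustrative assumptions.
defaults
mode tcp
timeout connect 5s
timeout client 1m
timeout server 1m

# roundrobin: each new connection goes to the next server in turn.
backend example_roundrobin
balance roundrobin
server impalad1 impalad-1.example.com:21050 check
server impalad2 impalad-2.example.com:21050 check

# leastconn: each new connection goes to the server that currently has
# the fewest connections, which suits long-lived sessions.
backend example_leastconn
balance leastconn
server impalad1 impalad-1.example.com:21050 check
server impalad2 impalad-2.example.com:21050 check

# source: the client IP address is hashed to choose the server, so a given
# client keeps reaching the same server while the set of available servers
# stays the same. This is the IP persistence that Hue needs.
backend example_source
balance source
timeout server 720m
server impalad1 impalad-1.example.com:21050 check
server impalad2 impalad-2.example.com:21050 check
```

Because `balance source` keys on the client address, all connections from a single Hue host land on one backend server. That gives failover rather than an even spread of load.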

 

No configuration in Hue is required.  Just make sure that Hue knows to connect to the right server/port (Impala Load Balancer (HAProxy)) in its config.  Here is an example of a configuration that has 3 ports:

  • one for impala-shell
  • one for JDBC-based applications
  • one for Hue

Since Hue has some specific needs that may not be required for other applications, this makes sense.

Here is an example config that does pass-through TLS. 

NOTE:  I don't think the ssl stuff is necessary for pass-through since the packets should be passed to the backend servers without TLS negotiation, so you can probably ignore that.
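For comparison, a true pass-through setup might look like the sketch below. With `ssl crt` on the `bind` line and `ssl ca-file` on the `server` lines, HAProxy terminates TLS itself and opens a new TLS connection to the backend. Leaving those keywords out makes HAProxy forward the encrypted TCP stream untouched. The frontend and backend names here are made up for the example.

```haproxy
# TLS pass-through sketch (names are illustrative assumptions).
# No "ssl" keyword on the bind or server lines: HAProxy relays the raw
# TCP stream and the client negotiates TLS directly with the impalad.
frontend impala_passthrough_front
bind *:21050
mode tcp
option tcplog
default_backend impala_passthrough

backend impala_passthrough
balance leastconn
mode tcp
# "check" without "ssl" performs a plain TCP connect health check.
server impalad1 impalad-1.example.com:21050 check
server impalad2 impalad-2.example.com:21050 check
```

In pass-through mode the client sees the backend server's certificate, so if the client verifies hostnames, that certificate needs a SAN entry matching the load balancer hostname the client connects to.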

 

NOTE2:  Currently it is not possible to have true load balancing for Hue connections to Impala, but we are working on it and have some code that could change that.  For now, you can achieve failover for the Hue connections, but not real balancing of connections.

 

# For impala-shell users on port 21000.
#---------------------------------------------------------------------
# main frontend which proxies to the backends
#---------------------------------------------------------------------
frontend impala_front
bind *:21000 ssl crt /opt/cloudera/security/x509/certkeynopw.pem
mode tcp
option tcplog
default_backend impala-shell

#---------------------------------------------------------------------
# leastconn balancing between the various backends
#---------------------------------------------------------------------
backend impala-shell
balance leastconn
mode tcp
server impalad1 impalad-1.example.com:21000 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
server impalad2 impalad-2.example.com:21000 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
server impalad3 impalad-3.example.com:21000 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem


# For JDBC or ODBC version 2.x driver, use port 21050 instead of 21000.
#---------------------------------------------------------------------
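# main frontend which proxies to the backends
# (named differently from the impala-shell frontend above, since two
# frontends should not share a name)
#---------------------------------------------------------------------
frontend impala_jdbc_front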
bind *:21050 ssl crt /opt/cloudera/security/x509/certkeynopw.pem
mode tcp
option tcplog
default_backend impala-jdbc

#---------------------------------------------------------------------
# leastconn balancing between the various backends
#---------------------------------------------------------------------
backend impala-jdbc
balance leastconn
mode tcp
server impalad1 impalad-1.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
server impalad2 impalad-2.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
server impalad3 impalad-3.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem


# Setup for Hue or other JDBC-enabled applications.
# In particular, Hue requires SOURCE IP PERSISTENCE
# The application connects to load_balancer_host:21051, and HAProxy balances
# connections to the associated hosts, where Impala listens for JDBC
# requests on port 21050.
# Notice the timeouts below that do not exist in the other configs
# these are to stop the connections from being killed even though
# hue is using them
#---------------------------------------------------------------------
# main frontend which proxies to the backends
#---------------------------------------------------------------------
frontend impalajdbc_front
bind *:21051 ssl crt /opt/cloudera/security/x509/certkeynopw.pem
mode tcp
option tcplog
timeout client 720m
default_backend impala-hue

#---------------------------------------------------------------------
# source balancing between the various backends
#---------------------------------------------------------------------
backend impala-hue
balance source
mode tcp
# "timeout server" is a backend-side setting; HAProxy ignores it in a
# frontend section, so it is set here.
timeout server 720m
server impalad1 impalad-1.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
server impalad2 impalad-2.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
server impalad3 impalad-3.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem

Re: Hiveserver2 HA using haproxy load balancing

Super Guru

@VijayM,

 

Oh, and the same rules apply to Hive as well.  Forgot to add that.

Re: Hiveserver2 HA using haproxy load balancing

Explorer

@bgooley 

 

I need your help once again.

 

This is for Hue connecting to hiveserver2 instances through the haproxy load balancer.

 

Below is the haproxy configuration for Hue to connect to hiveserver2. Please let me know if there is any mistake in it.

 

frontend hivejdbc_front
bind *:10003
mode tcp
option tcplog
timeout client 720m
timeout server 720m
default_backend hive-hue

#---------------------------------------------------------------------
# source balancing between the various backends
#---------------------------------------------------------------------
backend hive-hue
balance source
mode tcp
server hs2_1 a301-9941-0809.ldn.swissbank.com:10001 check
server hs2_2 a301-9941-1309.ldn.swissbank.com:10001 check

 

- I updated the Hue config property "Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini" from the CM UI and added the lines below to it. Please confirm whether this is correct.

 

[beeswax]
hive_server_host=a301-9941-0727.ldn.swissbank.com
hive_server_port=10003

 

 

Also, please help me with the haproxy config for ODBC connectivity to Hive from BI tools.

 

- Vijay M


Re: Hiveserver2 HA using haproxy load balancing

New Contributor

@VijayM wrote:

Hello Team,

 

We have a CDH 5.15 cluster running, with Kerberos and TLS enabled for all services in the cluster.

 

We would like to enable HA for Hiveserver2 using the haproxy load balancer.

 

We have enabled HA for the Hive metastore using the link below. Two instances of the Hive metastore are up and running.

https://www.cloudera.com/documentation/enterprise/5-15-x/topics/admin_ha_hivemetastore.html

 

We are referring to the link below for hiveserver2 HA.

 

https://www.cloudera.com/documentation/enterprise/5-15-x/topics/admin_ha_hiveserver2.html

 

haproxy, one instance of the Hive metastore, and one instance of hiveserver2 are installed on the same node.

 

beeline throws the error below.

 

beeline> !connect jdbc:hive2://abc:10001/default;ssl=true;sslTrustStore=/app/bds/security/pki/cloudera_truststore.jks;sslTrustPassword=xxxxx;principal=hive/aabc@REALM
Connecting to jdbc:hive2://abc:10001/default;ssl=true;sslTrustStore=/app/bds/security/pki/cloudera_truststore.jks;sslTrustPassword=xxxxx;principal=hive/aabc@REALM
Unknown HS2 problem when communicating with Thrift server.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://abc:10001/default;ssl=true;sslTrustStore=/app/bds/security/pki/cloudera_truststore.jks;sslTrustPassword=xxxxxx;principal=hive/aabc@REALM: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake (state=08S01,code=0)

 

 

Below is a snippet of the haproxy config.

 

# This is the setup for HS2. beeline client connect to load_balancer_host:10001.
# HAProxy will balance connections among the list of servers listed below.
listen hiveserver2 :10001
mode tcp
option tcplog
balance source
server hiveserver2_1 abc:10000
server hiveserver2_2 xyz:10000

 

 

Kindly suggest?

 

 

- Vijay M


This is getting really complicated for me, please help!