
Error Connecting to Impala via HA Proxy Node

Rising Star

I am trying to connect to Impala from the edge node of a cluster via HA Proxy. I've verified HAProxy is up and running by using it to connect to other services (Hue, for example), but when I enter the command below I receive the following error:

 

-sh-4.2$ impala-shell -i haproxy1:21000 -k --ssl

Starting Impala Shell using Kerberos authentication
Using service name 'impala'
SSL is enabled. Impala server certificates will NOT be verified (set --ca_cert to change)
Error connecting: TTransportException, Could not start SASL: Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Server not found in Kerberos database)

 

The Impala settings in HA Proxy are shown below. Based on what is outlined at https://www.cloudera.com/documentation/enterprise/5-2-x/topics/impala_proxy.html it seems I've covered all of the standard steps. Is there anything else that needs to be configured for HA Proxy to work as a load balancer for Impala?

 

# IMPALA
listen impala :21000
# bind *:21000
mode tcp
option tcplog
balance leastconn

server worker1 worker1.name:21000
server worker2 worker2.name:21000
server worker3 worker3.name:21000
server worker4 worker4.name:21000


listen impalajdbc :21050
# bind *:21050
mode tcp
option tcplog
balance source

server worker1 worker1.name:21000
server worker2 worker2.name:21000
server worker3 worker3.name:21000
server worker4 worker4.name:21000

12 REPLIES

Champion
I haven't set HA Proxy up for Impala, but I think you need a service principal for impala/<HAProxyHost>@REALM.COM in your KDC. The error is that the server is not found in the Kerberos database.
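One way to test that theory from the edge node is to ask the KDC for a service ticket directly. The sketch below is only an illustration: the host name and realm are placeholders, and it assumes the MIT Kerberos client tools (kinit, kvno) are installed. It builds the principal name the client would request and shows the kvno call, left commented out so the snippet runs without a KDC.

```shell
#!/bin/sh
# Build the service principal name (SPN) a Kerberos client would request
# for Impala behind a given host. The "impala" service name matches the
# "Using service name 'impala'" line in the impala-shell output.
spn_for_host() {
  # $1 = fully qualified host name, $2 = Kerberos realm
  printf 'impala/%s@%s\n' "$1" "$2"
}

# Placeholder values (assumptions): substitute the proxy's FQDN and your realm.
spn=$(spn_for_host haproxy1.company.local COMPANY.LOCAL)
echo "$spn"

# With a valid ticket from kinit, kvno requests a service ticket for the SPN.
# If the KDC has no such principal, the request fails, typically with the
# same "Server not found in Kerberos database" message.
# kvno "$spn"
```

If kvno succeeds for the fully qualified name but impala-shell still fails, the name the client is actually requesting (for example a short host name such as haproxy1) is worth comparing against the principal in the KDC.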

Rising Star

It does seem like the issue is due to there not being a Kerberos principal for the proxy server. However, I thought that setting "Impala Daemons Load Balancer" to "haproxy_node_name:21000" would take care of this. Per the doc:

 

Impala Daemons Load Balancer: Address of the load balancer used for Impala daemons. Should be specified in host:port format. If this is specified and Kerberos is enabled, Cloudera Manager adds a principal for 'impala/<load_balancer_host>@<realm>' to the keytab for all Impala daemons.

Champion
After making this change, did you run Generate Missing Credentials in the CM Security window, or manually create the account and SPN?

I haven't done this for Impala, but for HS2, after I added the LB info in the Hive configs, it threw a configuration warning that credentials were missing. I generated them, the warning disappeared, and the LB worked.

Rising Star
I generated the missing credentials in CM and restarted the cluster's services, which led me to the above error.

I believe the issue is tied to the fact that the haproxy node was added on to the cluster and isn't managed by CDH.

Champion
That shouldn't matter. I am using an ELB that is completely separate from the CDH cluster. Did you specify the FQDN in that setting, and does the principal contain the FQDN?

Rising Star

Yes - the FQDN is of the format:

 

haproxy.company.local

 

As well, the principal looks like:

 

impala/haproxy.company.local@COMPANY.LOCAL

 

Which mirrors the other principals (for example: impala/master-123.company.local@COMPANY.LOCAL)

 

Champion
impala-shell -i haproxy1:21000 -k --ssl

Are you using the FQDN in the impala-shell command?

i.e. impala-shell -i haproxy.company.local -k --ssl

Is there an SSL certificate for the HAProxy, and is it configured to use it? Is the CA cert for it in the PEM file that Impala is configured to use?

Rising Star

Using the FQDN in the impala-shell statement results in the same error. SSL was configured following: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Load_Balancer_Administ...

 

I've verified the changes outlined there were made to haproxy-https.xml and the SELinux settings are correct. As well, a self-signed cert was used to construct the PEM file in /etc/ssl/private.

 

As for the last part - how do I ensure that Impala is configured to use a particular PEM file? Is there a relevant config setting?

Rising Star

Following the Cloudera doc at https://www.cloudera.com/documentation/enterprise/5-11-x/topics/impala_proxy.html, one potential issue I see is:

 

  1. Choose the host you will use for the proxy server. Based on the Kerberos setup procedure, it should already have an entry impala/proxy_host@realm in its keytab. If not, go back over the initial Kerberos configuration steps for the keytab on each host running the impalad daemon.


After modifying the Impala Daemons Load Balancer field, the keytab files of all the workers running Impala have the haproxy principal present. Calling klist on a worker's keytab file shows:

 

1 08/01/2017 15:25:11 impala/worker1.company.local@COMPANY.LOCAL
1 08/01/2017 15:25:11 impala/worker2.company.local@COMPANY.LOCAL
1 08/01/2017 15:25:11 impala/worker3.company.local@COMPANY.LOCAL
1 08/01/2017 15:25:11 impala/haproxy1.company.local@COMPANY.LOCAL
1 08/01/2017 15:25:11 impala/haproxy1.company.local@COMPANY.LOCAL
1 08/01/2017 15:25:11 impala/haproxy1.company.local@COMPANY.LOCAL

 

It looks like the impala principal for haproxy is correctly present. However, I don't believe there is a keytab present on the haproxy node itself. Does there need to be?
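For anyone repeating the keytab check above on each worker, here is a minimal sketch that scripts it. The keytab path and principal in the usage comment are placeholders, and it assumes klist -kt is run without -e, so the principal is the last field on each line (as in the output above).

```shell
#!/bin/sh
# Succeeds (exit 0) if the principal named in $1 appears as the last field
# of any line read from stdin, which is where `klist -kt` prints it.
has_principal() {
  awk -v want="$1" '$NF == want { found = 1 } END { exit !found }'
}

# Hypothetical usage on an impalad host (the keytab path is a placeholder):
# klist -kt /path/to/impala.keytab \
#   | has_principal impala/haproxy1.company.local@COMPANY.LOCAL \
#   && echo "proxy principal present"
```

This only confirms what the keytab contains; it does not prove that the KDC holds a matching principal or that the client is requesting that exact name.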