02-17-2017 05:35 AM
I am configuring Impala to use a load balancer and Kerberos. I have this setup working, however I am unable to query each daemon directly. Is this normal behavior?
Showing a successful and unsuccessful query:
[centos@kf0 ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: alex@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK

Valid starting     Expires            Service principal
02/17/17 13:08:30  02/18/17 13:08:30  krbtgt/CDH-POC-CLUSTER.INTERNAL.CDHNETWORK@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
        renew until 02/24/17 13:08:30
02/17/17 13:08:51  02/18/17 13:08:30  impala/dn1.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
        renew until 02/24/17 13:08:30
02/17/17 13:14:00  02/18/17 13:08:30  impala/dn2.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
        renew until 02/24/17 13:08:30
02/17/17 13:27:16  02/18/17 13:08:30  impala/lb.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
        renew until 02/24/17 13:08:30

[centos@kf0 ~]$ impala-shell --ssl --impalad=lb.cdh-poc-cluster.internal.cdhnetwork:21000 -q "show tables" --ca_cert "/etc/ipa/ca.crt" -k -V
Starting Impala Shell using Kerberos authentication
Using service name 'impala'
SSL is enabled
Connected to lb.cdh-poc-cluster.internal.cdhnetwork:21000
Server version: impalad version 2.7.0-cdh5.10.0 RELEASE (build 785a073cd07e2540d521ecebb8b38161ccbd2aa2)
Query: show tables
Fetched 0 row(s) in 0.43s

[centos@kf0 ~]$ impala-shell --ssl --impalad=dn1.cdh-poc-cluster.internal.cdhnetwork:21000 -q "show tables" --ca_cert "/etc/ipa/ca.crt" -k -V
Starting Impala Shell using Kerberos authentication
Using service name 'impala'
SSL is enabled
Error connecting: TTransportException, TSocket read 0 bytes
Not connected to Impala, could not execute queries.
In the logs I see:
E0217 13:27:36.607559 6262 authentication.cc:160] SASL message (Kerberos (external)): GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Request ticket server impala/dn1.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK found in keytab but does not match server principal impala/lb.cdh-poc-cluster.internal.cdhnetwork@)
I0217 13:27:36.625763 6262 thrift-util.cc:111] SSL_shutdown: error code: 0
I0217 13:27:36.625901 6262 thrift-util.cc:111] TThreadPoolServer: TServerTransport died on accept: SASL(-13): authentication failure: GSSAPI Failure: gss_accept_sec_context
However in the keytab file I see the dn1 princ is there:
[root@dn1 impalad]# klist -kt /run/cloudera-scm-agent/process/64-impala-IMPALAD/impala.keytab
Keytab name: FILE:/run/cloudera-scm-agent/process/64-impala-IMPALAD/impala.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 02/17/2017 12:03:52 impala/lb.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
   1 02/17/2017 12:03:52 impala/lb.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
   1 02/17/2017 12:03:52 impala/dn1.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
   1 02/17/2017 12:03:52 impala/dn1.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
[root@dn1 impalad]#
And the daemon princs are set correctly:
[root@dn1 impalad]# cat /run/cloudera-scm-agent/process/64-impala-IMPALAD/impala-conf/impalad_flags | grep princ
-principal=impala/lb.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
-be_principal=impala/dn1.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
[root@dn1 impalad]#
So is this normal behaviour that the daemons can no longer be queried directly once Kerberos has been enabled when using a load balancer, or am I doing something wrong?
02-17-2017 08:06 AM
It looks normal to me. Impala daemons generally run on all nodes, but the server runs on one node (possibly more if you have HA), so there is no need to connect to each individual node in the distributed system.
02-17-2017 07:05 PM
02-17-2017 08:11 PM - edited 02-17-2017 08:21 PM
I believe you have configured a separate host that acts as a proxy, handling the requests along with Kerberos. Hence I think you won't be able to bypass the proxy, because it works like a session facade.
02-19-2017 03:13 AM
This is normal. Once you set up a load balancer in front of impalad, the impalad exposes itself to external clients through the service principal name (SPN) of the load balancer.
If you check the varz page of an individual impalad, you will notice the following parameters:
principal ==> LB SPN
be_principal ==> IMPALAD SPN
This shows that impalad expects the LB's SPN for client communication, whereas for internal communication (between impalads) it uses its own SPN. be_principal --> backend principal for internal communication.
Hence clients must contact the impalad using the LB's SPN.
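The mismatch can be illustrated with a short Python sketch. This is not Impala's code: the helper names parse_principal_flags, client_spn and can_authenticate are hypothetical, and the sketch assumes the client requests a service ticket for impala/<host it connected to>@<realm>, which is consistent with the klist output and the log message earlier in the thread.

```python
# Illustration only: why a direct connection to dn1 is rejected while the
# load balancer hostname works. Helper names are hypothetical.

REALM = "CDH-POC-CLUSTER.INTERNAL.CDHNETWORK"

# The two flags shown by the grep on impalad_flags earlier in the thread.
FLAGS_TEXT = """\
-principal=impala/lb.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
-be_principal=impala/dn1.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
"""


def parse_principal_flags(flags_text):
    """Return the -principal and -be_principal values found in the text."""
    flags = {}
    for line in flags_text.splitlines():
        line = line.strip()
        if not line.startswith("-") or "=" not in line:
            continue
        name, value = line[1:].split("=", 1)
        if name in ("principal", "be_principal"):
            flags[name] = value
    return flags


def client_spn(host, realm=REALM, service="impala"):
    """Assumed SPN a client requests when it connects to `host`."""
    return f"{service}/{host}@{realm}"


def can_authenticate(host, flags):
    """Client connections are checked against -principal, not -be_principal."""
    return client_spn(host) == flags["principal"]


flags = parse_principal_flags(FLAGS_TEXT)
for host in ("lb.cdh-poc-cluster.internal.cdhnetwork",
             "dn1.cdh-poc-cluster.internal.cdhnetwork"):
    print(host, "->", "accepted" if can_authenticate(host, flags) else "rejected")
```

Under these assumptions only the load balancer hostname yields an SPN equal to -principal, which matches the behaviour reported in the original question.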
02-20-2017 07:32 AM
@venkatsambath Thanks for the confirmation. I'm just thinking in terms of high-availability, have we now introduced a single point of failure for Impala? How would you make the load balancer highly available?
You could have multiple load balancers and use DNS CNAMEs to balance between them. However, since we are using Kerberos, the domain name we point requests at must be an A record, so this won't work.
Is any of my thinking around the above incorrect? How would we make the service highly available when using load balancers and Kerberos for Impala?
03-05-2017 03:15 AM - edited 03-05-2017 03:18 AM
First of all, I am sorry for getting back late on this question. One of the key factors of Kerberos authentication is its reliance on DNS reverse resolution. Quote from the MIT KDC documentation - https://web.mit.edu/kerberos/krb5-1.4/krb5-1.4.4/doc/krb5-admin/Getting-DNS-Information-Correct.html

----
Getting DNS Information Correct
Several aspects of Kerberos rely on name service. In order for Kerberos to provide its high level of security, it is less forgiving of name service problems than some other parts of your network. It is important that your Domain Name System (DNS) entries and your hosts have the correct information.
----

Let's say the virtual IP is haproxy.com, and the load balancers are running on the nodes below:
haproxy1.com - 10.0.0.1
haproxy2.com - 10.0.0.2
haproxy3.com - 10.0.0.3

impalad is running on these nodes:
impalad1 - 10.0.0.4
impalad2 - 10.0.0.5
impalad3 - 10.0.0.6

====
# Forward resolution configs for DNS
haproxy1.com IN A 10.0.0.1
haproxy.com IN CNAME haproxy1.com
====

Now haproxy.com resolves to IP 10.0.0.1, but reverse resolution of IP 10.0.0.1 returns haproxy1.com. This breaks the expectation of Kerberos authentication, so the service ticket request will fail when you run impala-shell -i haproxy.com

So our aim is to achieve DNS resolution like this:
haproxy.com -> 10.0.0.1
10.0.0.1 -> haproxy.com

We can now alter the reverse resolution of DNS to achieve this. Reverse zone configuration:
====
inverse-query-haproxy1.com IN PTR haproxy1.com
10.0.0.1 IN CNAME inverse-query-haproxy1.com
====

With the above set of configs we can achieve forward and reverse resolution as expected.

Caution note: If you run CM agents on one of the proxy machines, i.e. it is part of the cluster, its identity will have to change permanently to the VIP name, because reverse DNS will now never show the original hostname, which could cause other services to have issues unless listening_hostname is configured to use the VIP name.
Ideally the haproxy machine should not be added as part of the cluster in CM hosts control, to avoid this from happening - it should be a standalone box.
03-05-2017 03:20 AM
Command to test reverse resolution is described in this link
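For anyone who wants to script the check, here is a minimal Python sketch of a forward-then-reverse lookup. It is not the command from the link. The function name check_forward_reverse and the fake resolvers are hypothetical. The fake data mirrors the haproxy.com example from the earlier post so the demo runs without a DNS server. By default the function uses the standard library calls socket.gethostbyname and socket.gethostbyaddr.

```python
import socket


def check_forward_reverse(name,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve name -> IP -> name and report whether the round trip matches.

    `forward` and `reverse` default to the system resolver; they are
    parameters only so the check can be demonstrated with fake data.
    """
    ip = forward(name)
    reverse_name = reverse(ip)[0]  # gethostbyaddr returns (name, aliases, ips)
    matches = reverse_name.rstrip(".").lower() == name.rstrip(".").lower()
    return ip, reverse_name, matches


# Fake resolvers modelled on the example: haproxy.com is a CNAME to
# haproxy1.com, which has the A record 10.0.0.1.
def fake_forward(name):
    return {"haproxy.com": "10.0.0.1", "haproxy1.com": "10.0.0.1"}[name]


def broken_reverse(ip):
    # Default reverse zone: the PTR answer is the real host name.
    return ("haproxy1.com", [], [ip])


def fixed_reverse(ip):
    # Desired outcome: reverse lookup answers with the VIP name.
    return ("haproxy.com", [], [ip])


print(check_forward_reverse("haproxy.com", fake_forward, broken_reverse))
print(check_forward_reverse("haproxy.com", fake_forward, fixed_reverse))
```

Against a real environment you would call check_forward_reverse with only the load balancer hostname, so that the system resolver is used. A False in the last position suggests the forward and reverse records disagree.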
09-28-2017 11:37 AM
As an alternative, you could enable LDAP for Impala and then connect to the slaves directly, thus bypassing Kerberos and the load balancer.