Explorer
Posts: 9
Registered: ‎01-16-2017
Accepted Solution

Impala: Querying daemon directly when using Kerberos and Load balancer

Hi,

I am configuring Impala to use a load balancer and Kerberos. Queries through the load balancer work; however, I am unable to query each daemon directly. Is this normal behavior?

Showing a successful and unsuccessful query:

[centos@kf0 ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: alex@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK

Valid starting     Expires            Service principal
02/17/17 13:08:30  02/18/17 13:08:30  krbtgt/CDH-POC-CLUSTER.INTERNAL.CDHNETWORK@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
	renew until 02/24/17 13:08:30
02/17/17 13:08:51  02/18/17 13:08:30  impala/dn1.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
	renew until 02/24/17 13:08:30
02/17/17 13:14:00  02/18/17 13:08:30  impala/dn2.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
	renew until 02/24/17 13:08:30
02/17/17 13:27:16  02/18/17 13:08:30  impala/lb.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
	renew until 02/24/17 13:08:30
[centos@kf0 ~]$ impala-shell --ssl --impalad=lb.cdh-poc-cluster.internal.cdhnetwork:21000 -q "show tables" --ca_cert "/etc/ipa/ca.crt" -k -V
Starting Impala Shell using Kerberos authentication
Using service name 'impala'
SSL is enabled
Connected to lb.cdh-poc-cluster.internal.cdhnetwork:21000
Server version: impalad version 2.7.0-cdh5.10.0 RELEASE (build 785a073cd07e2540d521ecebb8b38161ccbd2aa2)
Query: show tables

Fetched 0 row(s) in 0.43s
[centos@kf0 ~]$ impala-shell --ssl --impalad=dn1.cdh-poc-cluster.internal.cdhnetwork:21000 -q "show tables" --ca_cert "/etc/ipa/ca.crt" -k -V
Starting Impala Shell using Kerberos authentication
Using service name 'impala'
SSL is enabled
Error connecting: TTransportException, TSocket read 0 bytes
Not connected to Impala, could not execute queries.

In the logs I see:

E0217 13:27:36.607559  6262 authentication.cc:160] SASL message (Kerberos (external)): GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (Request ticket server impala/dn1.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK found in keytab but does not match server principal impala/lb.cdh-poc-cluster.internal.cdhnetwork@)
I0217 13:27:36.625763  6262 thrift-util.cc:111] SSL_shutdown: error code: 0
I0217 13:27:36.625901  6262 thrift-util.cc:111] TThreadPoolServer: TServerTransport died on accept: SASL(-13): authentication failure: GSSAPI Failure: gss_accept_sec_context

However, the keytab file does contain the dn1 principal:

[root@dn1 impalad]# klist -kt /run/cloudera-scm-agent/process/64-impala-IMPALAD/impala.keytab
Keytab name: FILE:/run/cloudera-scm-agent/process/64-impala-IMPALAD/impala.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 02/17/2017 12:03:52 impala/lb.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
   1 02/17/2017 12:03:52 impala/lb.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
   1 02/17/2017 12:03:52 impala/dn1.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
   1 02/17/2017 12:03:52 impala/dn1.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
[root@dn1 impalad]# 

And the daemon principals are set correctly:

[root@dn1 impalad]# cat /run/cloudera-scm-agent/process/64-impala-IMPALAD/impala-conf/impalad_flags | grep princ
-principal=impala/lb.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
-be_principal=impala/dn1.cdh-poc-cluster.internal.cdhnetwork@CDH-POC-CLUSTER.INTERNAL.CDHNETWORK
[root@dn1 impalad]# 

 

So is it normal that the daemons can no longer be queried directly once Kerberos is enabled behind a load balancer, or am I doing something wrong?

Thanks

Posts: 146
Topics: 8
Kudos: 23
Solutions: 11
Registered: ‎09-02-2016

Re: Impala: Querying daemon directly when using Kerberos and Load balancer

@bushnoh

This looks normal to me. Impala daemons generally run on every node, while the server runs on one node (or a few more, if you have HA), so there is no need to connect to each individual node in the distributed system.

Posts: 217
Topics: 0
Kudos: 24
Solutions: 17
Registered: ‎08-16-2016

Re: Impala: Querying daemon directly when using Kerberos and Load balancer

Give it a try with the -k switch, which tells the shell to authenticate with Kerberos (this is off by default). The logs also tell me that the server side was expecting Kerberos authentication, but it failed. I don't know why it works through the LB. Is the LB doing pass-through authentication?

I haven't worked with an LB in front of Impala, so I don't know if this is normal. I would hope that I could still query each daemon if needed. I am curious, as I am looking to add an LB for Impala soon.
Expert Contributor
Posts: 128
Registered: ‎05-16-2016

Re: Impala: Querying daemon directly when using Kerberos and Load balancer

I believe you have configured a separate host that acts as a proxy and handles the requests along with Kerberos. Hence I think you won't be able to bypass the proxy, because it works like a session facade.

https://www.cloudera.com/documentation/enterprise/5-2-x/topics/impala_proxy.html#proxy_kerberos

Cloudera Employee
Posts: 20
Registered: ‎12-11-2015

Re: Impala: Querying daemon directly when using Kerberos and Load balancer

@bushnoh

This is normal. Once you set up a load balancer in front of the impalads, each impalad exposes itself to external clients through the service principal name (SPN) of the load balancer.

If you check the varz page of an individual impalad, you will see the following parameters:

https://<impalad-hostname>:25000/varz

principal ==> LB SPN
be_principal ==> IMPALAD SPN

This shows that the impalad expects the LB's SPN for client communication, whereas for internal communication (between impalads) it uses its own SPN. be_principal is the backend principal used for that internal communication.

Hence clients must contact the impalad using the LB's SPN.
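
One possible workaround for the direct-connection case, offered as a sketch rather than a confirmed fix for this cluster: newer impala-shell releases document a -b / --kerberos_host_fqdn option that overrides the hostname used in the Kerberos service principal, so the shell can connect to one impalad while still requesting a ticket for the load balancer's SPN. The 2.7.0 build shown earlier in this thread may not include it, so check impala-shell --help first. The two function names below are hypothetical helpers; the port and CA certificate path are the ones quoted by the original poster. The second function only prints the command so it can be reviewed before it is run.

```shell
#!/usr/bin/env bash
# Sketch: read the client-facing principal from an impalad_flags file and
# build an impala-shell command that targets one daemon directly while
# still asking Kerberos for the load balancer's SPN.

# Print the host part of the -principal flag (format: service/HOST@REALM).
# The pattern is anchored, so the -be_principal line is ignored.
lb_fqdn_from_flags() {
  sed -n 's|^-principal=[^/]*/\([^@]*\)@.*$|\1|p' "$1"
}

# Print (do not run) an impala-shell command for a direct connection.
# Assumption: the installed impala-shell supports --kerberos_host_fqdn.
direct_shell_cmd() {
  local flags_file=$1 daemon=$2
  local lb_fqdn
  lb_fqdn=$(lb_fqdn_from_flags "$flags_file")
  echo "impala-shell -k --ssl --ca_cert /etc/ipa/ca.crt --impalad=${daemon}:21000 --kerberos_host_fqdn=${lb_fqdn}"
}
```

Calling direct_shell_cmd with the impalad_flags path from the first post and dn1.cdh-poc-cluster.internal.cdhnetwork as the daemon would print a command that connects to dn1 but authenticates against impala/lb.cdh-poc-cluster.internal.cdhnetwork, which is the principal the daemon is configured to accept from clients.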

Explorer
Posts: 9
Registered: ‎01-16-2017

Re: Impala: Querying daemon directly when using Kerberos and Load balancer

@venkatsambath Thanks for the confirmation. Thinking in terms of high availability: have we now introduced a single point of failure for Impala? How would you make the load balancer highly available?

You could run multiple load balancers and use DNS CNAMEs to balance between them; however, since we are using Kerberos, the domain name we point requests at must be an A record, so this won't work.

Is any of my thinking above incorrect? How would we make the service highly available when using load balancers and Kerberos for Impala?
