We are starting down the path to set up an F5 LTM (Local Traffic Manager) load balancer for Hive and Impala. I have already looked at the Impala HAProxy load balancer example in the Cloudera documentation. That is helpful, but I didn't know whether anybody has been down this path with F5 before. Any insight or lessons learned would be greatly appreciated.
Did you ever get this working? We are looking to do the same thing and were curious what your results were.
No, we went a different route. We decided to use a round-robin DNS A record for all of the Impala data nodes.
An example of a round-robin A record is here: https://en.wikipedia.org/wiki/Round-robin_DNS
So we will use a DNS record like "impala-datanode" that we put into the server/host setting of the Impala ODBC driver. When a user connects via ODBC, the name resolves to a different Impala data node each time, and that node acts as the coordinator, which spreads the Impala coordination workload across the nodes. If an Impala data node goes down, we open a ticket with our network team to remove that server from the round-robin A record until we can get it back online.
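For anyone who wants to see what the round-robin record is doing, here is a minimal Python sketch. It is only an illustration under some assumptions: the name "impala-datanode" and port 21050 are taken from this thread, the IP addresses in the comments are made up, and real ODBC drivers and resolvers may cache or reorder answers differently.

```python
import socket
from itertools import cycle


def resolve_a_records(hostname, port):
    """Return the distinct IPv4 addresses the resolver hands back for hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def round_robin(addresses):
    """Yield addresses in rotation, the way successive DNS answers rotate."""
    return cycle(addresses)


def remove_node(addresses, failed):
    """Mimic the manual step of pulling a failed node out of the A record."""
    return [a for a in addresses if a != failed]


# Example use (needs the DNS record to exist, so it is left commented out):
# nodes = resolve_a_records("impala-datanode", 21050)
# picker = round_robin(nodes)
# coordinator = next(picker)  # a different node on each call
```

Note that nothing here health-checks the nodes. That matches the setup described above, where a dead node stays in rotation until the DNS record is edited by hand.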
Hi Alex, we have set up Impala, Hue, and Hive using the technote that you shared. We are having problems setting up Oozie, HttpFS, Spark History Server, YARN History Server, and the HBase Web UI through the F5 VIPs.
We use encryption at rest and in transit, and Kerberos tickets for authentication.
What we are seeing is that the Kerberos tickets are not present in the browser session when we go to the VIP for each of these URLs, but when we use the direct URLs we see the Kerberos SPNEGO tickets in the cookies section of the browser session.
Do we absolutely need an APM license, as mentioned in the below document, to achieve successful logins to the above-mentioned URLs through the F5?
The F5 URLs all fail in the browser with the same error:
HTTP Status 403 - GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)
type Status report
message GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)
description Access to the specified resource has been forbidden.
We have configured the LTM for the Impala daemon services running on ports 21000 and 21050. However, whenever we try to invoke impala-shell -i <FQDN pointing to LTM VIP>, it throws the following error:
Starting Impala Shell without Kerberos authentication
Error connecting: TTransportException, TSocket read 0 bytes
Kerberos ticket found in the credentials cache, retrying the connection with a secure transport.
Error connecting: TTransportException, Could not start SASL: Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Server not found in Kerberos database).
As per the documentation, I need to configure the Impala settings only if I am accessing Impala via Hue; impala-shell should work without the Impala settings suggested in the PDF document.
Note that I am able to connect to the individual Impala daemons with the impala-shell command without any issue.
Please suggest what else can be explored.
Appreciate your help.
Hi Ashish, until you configure the LTM VIP in the "Impala Daemons Load Balancer" section of your Impala configuration, you will not have a valid Kerberos credential for the VIP (generated and managed by CM automatically), which looks like this > impala/LTMfirstname.lastname@example.org. This is needed for a successful match of the VIP name to an existing Kerberos credential in the CM credential cache (Administration > Security > Kerberos Credentials).
If the Impala load balancer is not configured in the "Impala Daemons Load Balancer" section, then when you invoke impala-shell it will try to match the VIP name against the Kerberos credentials of the existing servers, and it fails.
You are able to connect to the individual daemons because CM autogenerated Kerberos credentials for each of your servers, like > impala/servername.domain.com, so the credential cache already has an entry for each individual server.
hope this helps,
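To make the name matching concrete, here is a rough Python sketch of the idea. It is a simplified model and not how CM or the Kerberos libraries are actually implemented: the function names, the hostnames, and the EXAMPLE.COM realm are all hypothetical, and real clients may also canonicalize hostnames before building the principal.

```python
def service_principal(service, host, realm):
    """Build a service principal name of the form service/host@REALM."""
    return f"{service}/{host.lower()}@{realm}"


def can_authenticate(connect_host, kdc_principals,
                     service="impala", realm="EXAMPLE.COM"):
    """
    The client asks for a ticket for the host it was told to connect to.
    If no principal exists for that name, the request fails, which
    corresponds to "Server not found in Kerberos database".
    """
    return service_principal(service, connect_host, realm) in kdc_principals


# Principals that exist before the load balancer is configured
# (hypothetical hosts): one per Impala daemon, none for the VIP.
kdc = {
    "impala/node1.example.com@EXAMPLE.COM",
    "impala/node2.example.com@EXAMPLE.COM",
}

direct_ok = can_authenticate("node1.example.com", kdc)          # True
vip_ok_before = can_authenticate("impala-vip.example.com", kdc)  # False

# Setting "Impala Daemons Load Balancer" is what leads to a principal
# for the VIP name being generated; modeled here as adding it to the set.
kdc.add(service_principal("impala", "impala-vip.example.com", "EXAMPLE.COM"))
vip_ok_after = can_authenticate("impala-vip.example.com", kdc)   # True
```

This is why connecting to an individual daemon works while the same command against the VIP name fails until the load balancer setting is in place.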
It sounds like you're using Kerberos but haven't configured the Impala Daemons Load Balancer parameter. This step is required for using a load balancer with Impala. Once it is set, no additional configuration is required for Hue to take advantage of this parameter, just a restart of Hue.
The Configure Impala section of the Impala HA with F5 BIG-IP guide (PDF) outlines the required steps.