Created 08-03-2017 07:11 AM
Hi,
I'm trying to set up web services that interact with my Kerberized Hadoop/HBase cluster.
My application is deployed in a Tomcat server and I would like to avoid creating a new HBase connection every time I have to access HBase.
Similarly, I want my application to be self-sufficient, i.e. I don't want to run 'kinit' before starting my Tomcat server. I would therefore like to implement a utility class that manages the login to the cluster and the connection to HBase, but I'm struggling with what look like ticket expiration issues.
The first time my GetHbaseConnection() method is invoked, it logs in to the cluster with the provided keytab and principal (using the UserGroupInformation.loginUserFromKeytab(user, keyTabPath) method) and creates a brand new HBase connection (ConnectionFactory.createConnection(conf)) => perfect.
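For readers who want to picture the "create once, reuse afterwards" part of such a utility class, here is a minimal sketch using only the JDK. SharedResource and its factory argument are hypothetical names, not part of HBase; in a real application the factory would presumably wrap ConnectionFactory.createConnection(conf) and translate its IOException.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

/** Hypothetical holder that creates a resource lazily and hands out the same instance afterwards. */
final class SharedResource<T extends Closeable> implements AutoCloseable {
    private final Supplier<T> factory;
    private T instance;

    SharedResource(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared instance, creating it on first use (or after close()). */
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    /** Closes the current instance, if any; the next get() creates a fresh one. */
    @Override
    public synchronized void close() {
        if (instance == null) {
            return;
        }
        try {
            instance.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            instance = null;
        }
    }

    public static void main(String[] args) {
        // Stand-in for an HBase connection: any Closeable works for the demonstration.
        SharedResource<Closeable> shared = new SharedResource<Closeable>(() -> {
            System.out.println("creating connection");
            return () -> System.out.println("closing connection");
        });
        Closeable first = shared.get();
        Closeable second = shared.get();
        System.out.println(first == second); // same instance on every call
        shared.close();
    }
}
```

The synchronized methods keep concurrent web requests from creating two instances at once; whether a single long-lived connection survives ticket expiry is a separate question, which is what the rest of this thread is about.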
By default, the obtained ticket has a 10h lifetime (default value from the /etc/krb5.conf file), so everything works fine during the first 10 hours.
Unfortunately, after this ticket has expired, my code fails with the following exception:
17/08/01 07:40:52 http-nio-8443-exec-4 WARN AbstractRpcClient:699 - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
17/08/01 07:40:52 http-nio-8443-exec-4 ERROR AbstractRpcClient:709 - SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
=> I had to set up a dedicated thread that invokes the UserGroupInformation.checkTGTAndReloginFromKeytab() method on a regular basis to refresh the ticket. Even so, after a long period of inactivity (typically a whole night), when I invoke my web service I see the following warnings in my Tomcat logs:
17/08/03 08:25:28 hconnection-0x51b0ea6-shared--pool1-t51 WARN UserGroupInformation:1113 - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
17/08/03 08:25:29 hconnection-0x51b0ea6-shared--pool1-t51 WARN UserGroupInformation:1113 - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
17/08/03 08:25:30 hconnection-0x51b0ea6-shared--pool1-t51 WARN UserGroupInformation:1113 - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
17/08/03 08:25:31 hconnection-0x51b0ea6-shared--pool1-t51 WARN UserGroupInformation:1113 - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
17/08/03 08:25:35 hconnection-0x51b0ea6-shared--pool1-t51 WARN AbstractRpcClient:695 - Couldn't setup connection for myuser@mydomain.com to hbase/myserver.mydomain.com@mydomain.com
...and my call to the web service finally fails with a SocketTimeoutException.
To reproduce the issue quickly, I wrote a simple standalone Java application (outside of Tomcat) and removed the code that logs the user in to the cluster, delegating that part to an external, manual kinit:
I first run 'kinit' outside of my Java application. This gets me a short-lived (1 minute) ticket using a custom krb5.conf file:
env KRB5_CONFIG=/local/home/myuser/mykrb5.conf kinit -kt /local/home/myuser/myuser.keytab myuser@mydomain.com
Then I run my standalone Java application, which displays the name of one HBase table at regular intervals (every 10 seconds). Note that I create a new HBase connection on every iteration; I don't try to reuse the connection at the moment:
public static void main(String[] args) throws IOException, InterruptedException {
    // Print Kerberos debug output from the JDK
    System.setProperty("sun.security.krb5.debug", "true");
    Configuration configuration = HBaseConfiguration.create();
    while (true) {
        // A new connection is created on every iteration (no reuse yet)
        Connection conn = ConnectionFactory.createConnection(configuration);
        Admin admin = conn.getAdmin();
        TableName[] tableNames = admin.listTableNames();
        System.out.println(tableNames[0].getNameWithNamespaceInclAsString());
        Thread.sleep(10000);
    }
}
For 1 minute it works perfectly, but then I get endless warnings and my code no longer executes properly:
17/08/01 16:01:55 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
17/08/01 16:01:57 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
17/08/01 16:01:59 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
17/08/01 16:02:00 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
17/08/01 16:02:01 WARN ipc.AbstractRpcClient: Couldn't setup connection for myuser@mydomain.com to hbase/myserver.mydomain.com@mydomain.com
...
I don't understand how Kerberos ticket expiration and HBase connections work together. Could anyone help on this topic?
In other words, I would like my application to log in to the cluster when it starts up and create an HBase connection that I can keep "forever".
Is it possible? What did I miss?
Thanks for your help
Created 08-03-2017 02:54 PM
First off, yes. This can/should work and you have the correct general idea. You deploy a keytab for your application to use. The application logs in when it first starts, and launches a thread to periodically invoke a renewal.
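To make the "launch a thread to periodically invoke a renewal" part concrete, here is a minimal sketch using only the JDK. PeriodicRelogin and the reloginAction parameter are hypothetical names; the assumption is that in a real application the action would call UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab() and handle its IOException. The 50 ms period is only there so the demo finishes quickly.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper that runs a re-login action at a fixed rate on a daemon thread. */
final class PeriodicRelogin implements AutoCloseable {
    private final ScheduledExecutorService scheduler;

    PeriodicRelogin(Runnable reloginAction, long period, TimeUnit unit) {
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "kerberos-relogin");
            t.setDaemon(true); // do not keep the JVM (or Tomcat) alive just for renewals
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                reloginAction.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and carry on.
                System.err.println("Re-login attempt failed: " + e);
            }
        }, period, period, unit);
    }

    /** Stops the background thread. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger attempts = new AtomicInteger();
        // Stand-in action: a real one would invoke the UGI re-login check (assumption).
        try (PeriodicRelogin relogin =
                 new PeriodicRelogin(attempts::incrementAndGet, 50, TimeUnit.MILLISECONDS)) {
            Thread.sleep(300);
        }
        System.out.println("attempts made: " + attempts.get());
    }
}
```

Catching the exception inside the scheduled task matters because scheduleAtFixedRate suppresses all later executions once one run throws.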
Remember that a ticket cache (what you get when you invoke kinit) and a programmatic login from a keytab through UGI are mutually exclusive.
The warning from UGI can be ignored in your application. UGI won't perform a re-login for calls you make until you reach 80% of your ticket lifetime. I would be more interested in what kind of logging you get out of the UGI class after the 80% of ticket lifetime is exceeded. You should see a message saying that UGI attempted the re-login and (successfully, hopefully) renewed your ticket.
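As a rough illustration of the timing described here, the sketch below models the decision with plain JDK types. It is a simplified model, not the UGI source: ReloginWindow, RENEW_PERCENT and MIN_GAP are hypothetical names, the 80% figure comes from this reply, and the 600 seconds comes from the warning text in the logs.

```java
import java.time.Duration;
import java.time.Instant;

/** Simplified, hypothetical model of when a keytab re-login would be attempted. */
final class ReloginWindow {
    // Percentage of the ticket lifetime after which a re-login is attempted (assumed from the reply).
    static final long RENEW_PERCENT = 80;
    // Minimum gap between two attempts (assumed from the "600 seconds" in the warning).
    static final Duration MIN_GAP = Duration.ofSeconds(600);

    /** Point in time at which the ticket has used RENEW_PERCENT of its lifetime. */
    static Instant refreshTime(Instant ticketStart, Instant ticketEnd) {
        long lifetimeMs = Duration.between(ticketStart, ticketEnd).toMillis();
        return ticketStart.plusMillis(lifetimeMs * RENEW_PERCENT / 100);
    }

    /** True when the refresh point has passed and the last attempt is old enough. */
    static boolean shouldRelogin(Instant now, Instant lastAttempt,
                                 Instant ticketStart, Instant ticketEnd) {
        boolean pastWindow = !now.isBefore(refreshTime(ticketStart, ticketEnd));
        boolean gapElapsed = Duration.between(lastAttempt, now).compareTo(MIN_GAP) >= 0;
        return pastWindow && gapElapsed;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2017-08-01T00:00:00Z");
        Instant end = start.plus(Duration.ofHours(10)); // 10h ticket, as in the question
        System.out.println(refreshTime(start, end)); // prints 2017-08-01T08:00:00Z

        // Nine hours in, last attempt at login time: past the window and past the gap.
        System.out.println(shouldRelogin(start.plus(Duration.ofHours(9)), start, start, end)); // prints true

        // One-minute ticket, 55 seconds in: past the window, but the 600 s gap blocks the attempt.
        Instant shortEnd = start.plusSeconds(60);
        System.out.println(shouldRelogin(start.plusSeconds(55), start, start, shortEnd)); // prints false
    }
}
```

Under this model, a ticket shorter than the minimum gap can expire before any re-login is allowed, which may be worth keeping in mind when testing with very short lifetimes.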
Please remember that a ticket's lifetime is distinct from its renewable lifetime. As a sanity check, I would run a quick experiment to make sure that you get a renewable ticket in the first place.
$ kinit -kt /my/file.keytab principal
$ kinit -R
The above should not throw an error (as long as you do it within the renewable lifetime of the ticket). You can also use the `getprinc` command in kadmin to inspect the ticket lifetime and renewable lifetime (e.g. `getprinc principal`).
(shameless self-plug) You may also find this presentation useful; I gave it recently to try to de-mystify some of this: http://www.slideshare.net/je2451/practical-kerberos-with-apache-hbase/
Created 08-07-2017 06:19 AM
Hi Josh,
Thanks for your help. Unfortunately, I'm still stuck with this issue, which seems to be related to HBase only rather than being a pure Kerberos/Hadoop problem, if I understand properly:
I tried a "non-HBase" web service that simply displays the content of an HDFS folder, with the exact same idea (log in to the cluster at application startup + background thread that periodically renews the ticket), and it works like a charm: I invoke the web service, which properly displays the files in the HDFS folder, then I can wait for several days without any other activity on the web application and call it again successfully. Perfect.
Then, back to my HBase example: my web service logs in at startup, creates an HBase connection and displays the name of one table. But if I wait longer than the ticket lifetime and then invoke the web service again, I get the previously mentioned warnings.
According to your answer, I guess I can ignore the first ones, but the last one is probably the reason why my web service ends with a socket timeout error:
17/08/01 16:02:01 WARN ipc.AbstractRpcClient: Couldn't setup connection for myuser@mydomain.com to hbase/myserver.mydomain.com@mydomain.com ...
As you were wondering what would happen next, I waited for more than 10 minutes and got the same warning sequence again and again during this period, leading to a socket timeout error on the client side (which is not acceptable...).
Finally, I took a look at your last suggestion, but when I try to run 'kinit -R', I get the following:
kinit: KDC can't fulfill requested option while renewing credentials
And my ticket expiration time is not updated by this command. Could it be the root cause of my problem?
Thanks again
Created 08-09-2017 07:35 AM
End of the story: in fact, the problem was related to https://issues.apache.org/jira/browse/HADOOP-10786
I moved to hadoop-common 2.6.1 and used the AuthUtil class: http://hbase.apache.org/1.2/devapidocs/org/apache/hadoop/hbase/AuthUtil.html
And everything started to work fine 🙂
Thanks for your help