Kerberos ticket renewal after expiry period

Contributor

Hi All,

I am seeing the warning and exception below in my solr.log.

***********************************

java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for solr/solr1.mycluster.com@MYCLUSTER.COM to solr2.mycluster.com/172.31.16.23:8020; Host Details : local host is: "java.net.UnknownHostException: solr1.mycluster.com: solr1.mycluster.com: System error"; destination host is: "solr2.mycluster.com":8020;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
    at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy10.renewLease(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:571)
    at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy11.renewLease(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:879)
    at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
    at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
    at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
    at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Couldn't setup connection for solr/solr1.mycluster.com@MYCLUSTER.COM to solr2.mycluster.com/172.31.16.23:8020
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:672)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:730)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    ... 16 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Attempt to obtain new INITIATE credentials failed! (null))]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:553)
    at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:368)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:722)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:718)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
    ... 19 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Attempt to obtain new INITIATE credentials failed! (null))
    at sun.security.jgss.krb5.Krb5InitCredential.getTgt(Krb5InitCredential.java:343)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:145)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
    ... 28 more
Caused by: javax.security.auth.login.LoginException: ad01.mycluster.com
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:808)
    at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
    at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
    at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
    at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
    at sun.security.jgss.GSSUtil.login(GSSUtil.java:258)
    at sun.security.jgss.krb5.Krb5Util.getTicket(Krb5Util.java:158)
    at sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:335)
    at sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:331)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.security.jgss.krb5.Krb5InitCredential.getTgt(Krb5InitCredential.java:330)
    ... 35 more
Caused by: java.net.UnknownHostException: ad01.mycluster.com
    at java.net.InetAddress.getAllByName0(InetAddress.java:1280)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at java.net.InetAddress.getByName(InetAddress.java:1076)
    at sun.security.krb5.internal.UDPClient.<init>(NetClient.java:187)
    at sun.security.krb5.internal.NetClient.getInstance(NetClient.java:45)
    at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:393)
    at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:364)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.security.krb5.KdcComm.send(KdcComm.java:348)
    at sun.security.krb5.KdcComm.sendIfPossible(KdcComm.java:253)
    at sun.security.krb5.KdcComm.send(KdcComm.java:229)
    at sun.security.krb5.KdcComm.send(KdcComm.java:200)
    at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:316)
    at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:361)
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:776)
    ... 52 more

2018-04-17 13:46:35,636 [LeaseRenewer:solr@solr2.mycluster.com:8020] WARN [c:metricsCollection s:shard1 r:core_node6 x:metricsCollection_shard1_replica2] org.apache.hadoop.security.UserGroupInformation (UserGroupInformation.java:1127) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.

***********************************************************

Here are other details from the server:

[root@solr1 ~]# uptime
09:50:20 up 7 days, 6:22, 9 users, load average: 0.07, 0.10, 0.13

[root@solr1 ~]# cat /etc/krb5.conf

[libdefaults]

renew_lifetime = 7d

forwardable = true

default_realm = MYCLUSTER.COM

ticket_lifetime = 24h

dns_lookup_realm = false

dns_lookup_kdc = false

default_ccache_name = /tmp/krb5cc_%{uid}

#default_tgs_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5

#default_tkt_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5

[domain_realm]

mycluster.com = MYCLUSTER.COM

[logging]

default = FILE:/var/log/krb5kdc.log

admin_server = FILE:/var/log/kadmind.log

kdc = FILE:/var/log/krb5kdc.log

[realms]

MYCLUSTER.COM = {

admin_server = ad01.mycluster.com

kdc = ad01.mycluster.com

}

Questions:

a) Failed on local exception: java.io.IOException: Couldn't setup connection for solr/solr1.mycluster.com@MYCLUSTER.COM to solr2.mycluster.com/172.31.16.23:8020;

The NameNode (NN) is running on the solr2 server. Is the Solr service on solr1 trying to connect to the NN on solr2?

b) The cluster is Kerberized and renew_lifetime is 7 days, so I believe the ticket expired after 7 days.

Is this the reason for the exceptions in solr.log?

Also, how can I resolve the issue without restarting the service?

Any help would be greatly appreciated.

5 REPLIES

Super Collaborator

solr/solr1.mycluster.com@MYCLUSTER.COM is your Kerberos principal, and it is trying to connect to solr2.mycluster.com/172.31.16.23:8020, which has the form servername/IP:port. I think this part of the message points to your issue:

local host is: "java.net.UnknownHostException: solr1.mycluster.com: solr1.mycluster.com: System error"

This field should show a server name and an IP address, not an UnknownHostException. Can you check the hostname setup (DNS or /etc/hosts) on the solr1 machine? For example, open a shell and run "ping solr1.mycluster.com".
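If ping is not handy, the same lookup can be sketched in a few lines of Python. This is only an illustration: the helper names resolve and report are made up for this post, and the host names are the ones from the trace above (note that the innermost "Caused by" also reports UnknownHostException for ad01.mycluster.com, the KDC listed in krb5.conf, so it is worth checking that name too).

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses the local resolver gives for hostname.

    An empty list means the lookup failed, which is the same condition
    Java reports as java.net.UnknownHostException.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def report(hostnames):
    """Print one line per host showing whether it resolves (hypothetical helper)."""
    for name in hostnames:
        addrs = resolve(name)
        print(name, "->", ", ".join(addrs) if addrs else "UNRESOLVED")
```

Running report(["solr1.mycluster.com", "solr2.mycluster.com", "ad01.mycluster.com"]) on solr1 would show which of the three names the machine cannot resolve.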

The ticket lifetime is 24h in your setup, so each renewal needs to take place within that period. The renew_lifetime (7d in your krb5.conf) is the maximum total time over which a ticket can be renewed; once it has passed, a new ticket has to be obtained.

Do you get that error after around 24 hours, or does it happen when you start the service?
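To make the arithmetic concrete, here is a rough sketch of how the two settings interact. It is an idealized model, not how any Kerberos client is implemented: it assumes every renewal happens exactly at the moment the current ticket expires, and the issue time used in the usage note is hypothetical. The function name renewal_schedule is made up for this example.

```python
from datetime import datetime, timedelta


def renewal_schedule(issued, ticket_lifetime, renew_lifetime):
    """List the expiry time of the initial ticket and of each renewed ticket.

    Idealized assumption: each renewal happens at the instant of expiry.
    No ticket may outlive issued + renew_lifetime.
    """
    if ticket_lifetime <= timedelta(0):
        raise ValueError("ticket_lifetime must be positive")
    renew_until = issued + renew_lifetime
    expires = min(issued + ticket_lifetime, renew_until)
    expiries = [expires]
    while expires < renew_until:
        # A renewed ticket lasts another ticket_lifetime, capped at renew_until.
        expires = min(expires + ticket_lifetime, renew_until)
        expiries.append(expires)
    return expiries
```

With ticket_lifetime = 24h and renew_lifetime = 7d, renewal_schedule(datetime(2018, 4, 10, 13, 46), timedelta(hours=24), timedelta(days=7)) returns seven expiry times, the last one exactly 7 days after the ticket was issued. After that point renewal is no longer possible and a fresh login is required.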

Contributor

Hi,

Thanks for your time.

I got this issue after 7 days, and the maximum lifetime per the Kerberos configuration is 7 days.

Super Collaborator

OK, if it works for seven days, I agree that all other possible issues can be ignored for now. Are you able to fix the issue by running kinit and then restarting Solr?

If that enables Solr to run for another 7 days, you might want to change Solr from using the ticket to using a keytab. Keytabs do not expire.

Contributor

@Harald, thanks for your kind attention to this case.

Yes, I am forced to restart the Solr service.

How can I change Solr from using a ticket to using a keytab?

I do not want my long-running Solr process to be interrupted by this expiry period.

Any help on this?

Master Mentor

@Sriram Hadoop

You will need a jaas.conf file for solr. Here is the documentation
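As a rough idea of what such a file usually contains, the sketch below renders a keytab-based JAAS stanza. Several things here are assumptions and not taken from this thread: the section name "Client", the keytab path, and the helper name render_jaas are all placeholders, and the documentation for your Solr version is the authority on the exact section name and on how to point Solr at the file. The Krb5LoginModule options shown (useKeyTab, keyTab, storeKey, useTicketCache, principal) are standard options of that login module.

```python
def render_jaas(principal, keytab_path, section="Client"):
    """Render a JAAS stanza that logs in from a keytab instead of a ticket cache.

    The section name and keytab path are placeholders; adjust them to
    whatever your Solr setup expects.
    """
    return (
        f"{section} {{\n"
        "  com.sun.security.auth.module.Krb5LoginModule required\n"
        "  useKeyTab=true\n"
        f'  keyTab="{keytab_path}"\n'
        "  storeKey=true\n"
        "  useTicketCache=false\n"
        f'  principal="{principal}";\n'
        "};\n"
    )
```

For example, print(render_jaas("solr/solr1.mycluster.com@MYCLUSTER.COM", "/path/to/solr.keytab")) prints a stanza that can be saved as jaas.conf, with the path replaced by the real location of the Solr keytab.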