Created 04-19-2018 08:49 PM
Hi All,
I am seeing the warning and exception below in my solr.log.
***********************************
java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for solr/solr1.mycluster.com@MYCLUSTER.COM to solr2.mycluster.com/172.31.16.23:8020; Host Details : local host is: "java.net.UnknownHostException: solr1.mycluster.com: solr1.mycluster.com: System error"; destination host is: "solr2.mycluster.com":8020;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
    at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy10.renewLease(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:571)
    at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy11.renewLease(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:879)
    at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
    at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
    at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
    at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Couldn't setup connection for solr/solr1.mycluster.com@MYCLUSTER.COM to solr2.mycluster.com/172.31.16.23:8020
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:672)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:730)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    ... 16 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Attempt to obtain new INITIATE credentials failed! (null))]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:553)
    at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:368)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:722)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:718)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
    ... 19 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Attempt to obtain new INITIATE credentials failed! (null))
    at sun.security.jgss.krb5.Krb5InitCredential.getTgt(Krb5InitCredential.java:343)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:145)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
    ... 28 more
Caused by: javax.security.auth.login.LoginException: ad01.mycluster.com
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:808)
    at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
    at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
    at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
    at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
    at sun.security.jgss.GSSUtil.login(GSSUtil.java:258)
    at sun.security.jgss.krb5.Krb5Util.getTicket(Krb5Util.java:158)
    at sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:335)
    at sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:331)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.security.jgss.krb5.Krb5InitCredential.getTgt(Krb5InitCredential.java:330)
    ... 35 more
Caused by: java.net.UnknownHostException: ad01.mycluster.com
    at java.net.InetAddress.getAllByName0(InetAddress.java:1280)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at java.net.InetAddress.getByName(InetAddress.java:1076)
    at sun.security.krb5.internal.UDPClient.<init>(NetClient.java:187)
    at sun.security.krb5.internal.NetClient.getInstance(NetClient.java:45)
    at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:393)
    at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:364)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.security.krb5.KdcComm.send(KdcComm.java:348)
    at sun.security.krb5.KdcComm.sendIfPossible(KdcComm.java:253)
    at sun.security.krb5.KdcComm.send(KdcComm.java:229)
    at sun.security.krb5.KdcComm.send(KdcComm.java:200)
    at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:316)
    at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:361)
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:776)
    ... 52 more
2018-04-17 13:46:35,636 [LeaseRenewer:solr@solr2.mycluster.com:8020] WARN [c:metricsCollection s:shard1 r:core_node6 x:metricsCollection_shard1_replica2] org.apache.hadoop.security.UserGroupInformation (UserGroupInformation.java:1127) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
***********************************************************
Here are other details from the server:
[root@solr1 ~]# uptime
09:50:20 up 7 days, 6:22, 9 users, load average: 0.07, 0.10, 0.13
[root@solr1 ~]# cat /etc/krb5.conf
[libdefaults]
renew_lifetime = 7d
forwardable = true
default_realm = MYCLUSTER.COM
ticket_lifetime = 24h
dns_lookup_realm = false
dns_lookup_kdc = false
default_ccache_name = /tmp/krb5cc_%{uid}
#default_tgs_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5
#default_tkt_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5
[domain_realm]
mycluster.com = MYCLUSTER.COM
[logging]
default = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
kdc = FILE:/var/log/krb5kdc.log
[realms]
MYCLUSTER.COM = {
  admin_server = ad01.mycluster.com
  kdc = ad01.mycluster.com
}
Questions:
a) Failed on local exception: java.io.IOException: Couldn't setup connection for solr/solr1.mycluster.com@MYCLUSTER.COM to solr2.mycluster.com/172.31.16.23:8020;
The NameNode (NN) is running on the solr2 server. Is the Solr service on solr1 trying to connect to the NN on solr2?
b) The cluster is kerberized and renew_lifetime is 7 days, so I believe the ticket expired after 7 days.
Is this the reason for the exceptions in solr.log?
Also, how can I resolve the issue without restarting the service?
Any help would be greatly appreciated.
Created 04-19-2018 10:21 PM
solr/solr1.mycluster.com@MYCLUSTER.COM is your Kerberos principal, and it is trying to connect to solr2.mycluster.com/172.31.16.23:8020, which is servername/IP:port. I think this part of the message points to your issue:
local host is: "java.net.UnknownHostException: solr1.mycluster.com: solr1.mycluster.com: System error". This should show a server name and an IP address, not an UnknownHostException. Can you check the hostname setup (DNS or /etc/hosts) on the machine solr1? For example, open a shell and try "ping solr1.mycluster.com".
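If ping is not convenient, the same name lookup can be scripted. The following is only a minimal sketch, assuming Python is available on solr1; the helper names resolve and report are made up for illustration. The host list comes from this thread: solr1 and solr2 from the error message, and ad01 because the last "Caused by" in the stack trace is an UnknownHostException for the KDC host.

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses for hostname, or None if the lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Same class of failure that Java reports as UnknownHostException.
        return None
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def report(hostnames):
    """Map each hostname to its resolved addresses (None means unresolvable)."""
    return {name: resolve(name) for name in hostnames}


if __name__ == "__main__":
    # Host names taken from the stack trace and krb5.conf posted above.
    hosts = ["solr1.mycluster.com", "solr2.mycluster.com", "ad01.mycluster.com"]
    for name, addrs in report(hosts).items():
        print(name, "->", ", ".join(addrs) if addrs else "UNRESOLVED")
```

Any host printed as UNRESOLVED would be worth checking in DNS or /etc/hosts before looking further at Kerberos itself.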
The ticket lifetime is 24h in your setup, so the ticket has to be renewed within that period. The renew_lifetime (7d) is the maximum total time over which the ticket can keep being renewed; once it is reached, a new ticket has to be obtained.
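To make the arithmetic concrete, here is a small sketch of how the two settings interact. It is a simplification that assumes the ticket is renewed exactly at each expiry (real clients renew earlier), and the issue time is a hypothetical value, not taken from your logs.

```python
from datetime import datetime, timedelta


def ticket_expiries(issued, ticket_lifetime, renew_lifetime):
    """List the expiry time of the initial ticket and of each renewed ticket.

    No renewed ticket can outlive issued + renew_lifetime ("renew until").
    """
    if ticket_lifetime <= timedelta(0):
        raise ValueError("ticket_lifetime must be positive")
    renew_until = issued + renew_lifetime
    expiry = min(issued + ticket_lifetime, renew_until)
    expiries = [expiry]
    while expiry < renew_until:
        # Each renewal extends the ticket by one lifetime, capped at renew_until.
        expiry = min(expiry + ticket_lifetime, renew_until)
        expiries.append(expiry)
    return expiries


if __name__ == "__main__":
    issued = datetime(2018, 4, 10, 12, 0)  # hypothetical kinit time
    steps = ticket_expiries(issued, timedelta(hours=24), timedelta(days=7))
    for i, expiry in enumerate(steps, start=1):
        print(f"ticket {i} expires {expiry}")
    print("no renewal possible after", steps[-1])
```

With ticket_lifetime = 24h and renew_lifetime = 7d the last entry is exactly seven days after the initial login, which is the point where only a fresh login (password or keytab) helps.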
Do you get that error after around 24 hours, or does it happen when you start the service?
Created 04-20-2018 01:28 AM
Hi,
Thanks for your time.
I got this issue after 7 days, and the maximum renewable lifetime in the Kerberos configuration is 7 days.
Created 04-20-2018 08:46 AM
OK, if it works for seven days, I agree that all other possible issues can be ignored for now. Are you able to fix the issue by running kinit and then restarting Solr?
If that lets Solr run for another 7 days, you might want to switch Solr from using the ticket cache to using a keytab. Keytabs do not expire.
Created 04-20-2018 08:48 AM
@Harald, thanks for your kind attention on this case.
Yes, I am forced to restart the Solr service.
How can I change Solr from using a ticket to using a keytab?
I don't want my long-running Solr process to be interrupted by this expiry period.
Any help on this?
Created 04-20-2018 09:07 AM
You will need a jaas.conf file for solr. Here is the documentation
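As a rough idea of what such a file contains, the sketch below renders a keytab-based JAAS stanza. The option names (useKeyTab, keyTab, storeKey, useTicketCache, principal) are standard options of the JDK's Krb5LoginModule. The section name "Client" and the keytab path are assumptions for illustration only, and render_jaas is a made-up helper; please take the section name and file locations your Solr version expects from the documentation.

```python
def render_jaas(section, principal, keytab_path):
    """Render a JAAS stanza that logs in from a keytab via Krb5LoginModule."""
    options = [
        ("useKeyTab", "true"),            # log in from the keytab, not a password
        ("keyTab", f'"{keytab_path}"'),
        ("storeKey", "true"),
        ("useTicketCache", "false"),      # do not depend on a kinit ticket cache
        ("principal", f'"{principal}"'),
    ]
    body = "\n".join(f"  {key}={value}" for key, value in options)
    return (
        f"{section} {{\n"
        f"  com.sun.security.auth.module.Krb5LoginModule required\n"
        f"{body};\n"
        f"}};\n"
    )


if __name__ == "__main__":
    text = render_jaas(
        section="Client",  # assumed section name
        principal="solr/solr1.mycluster.com@MYCLUSTER.COM",  # principal from the log
        keytab_path="/etc/security/keytabs/solr.service.keytab",  # hypothetical path
    )
    print(text)
```

The JVM is normally pointed at the resulting file with the standard system property -Djava.security.auth.login.config=/path/to/jaas.conf. How that property gets passed to Solr depends on how Solr is started in your cluster.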