Created 01-11-2016 05:56 PM
Hello,
I am running HDP Search's SolrCloud on HDP 2.3, configured to store its index files on a Kerberized HDFS. When Solr is started it writes index files to HDFS correctly; however, after 24 hours have elapsed, Solr becomes unable to connect to HDFS, reporting that it no longer has a valid Kerberos TGT (my default Kerberos ticket lifetime is 24 hours).
A restart of SolrCloud resolves the issue for another 24 hours, so it appears that Solr is not renewing its Kerberos ticket when it expires. Could this be a bug in Solr? Or is there configuration I can add to make Solr renew its ticket automatically?
This is the stack trace I'm getting in the Solr log:
java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for solr/sandbox.hortonworks.com@HORTONWORKS.COM to sandbox.hortonworks.com/10.0.2.15:8020; Host Details : local host is: "sandbox.hortonworks.com/10.0.2.15"; destination host is: "sandbox.hortonworks.com":8020;
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy10.renewLease(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:571)
	at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy11.renewLease(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:879)
	at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
	at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
	at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
	at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Couldn't setup connection for solr/sandbox.hortonworks.com@HORTONWORKS.COM to sandbox.hortonworks.com/10.0.2.15:8020
	at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:672)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:730)
	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
	at org.apache.hadoop.ipc.Client.call(Client.java:1438)
	... 16 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
	at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
	at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:553)
	at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:368)
	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:722)
	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:718)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
	... 19 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
	at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
	at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:121)
	at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
	at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:223)
	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:193)
	... 28 more
This is the configuration I'm using for my collection:
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://sandbox.hortonworks.com/user/solr</str>
  <str name="solr.hdfs.confdir">/usr/hdp/current/hadoop-client/conf</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">1</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
  <bool name="solr.hdfs.blockcache.write.enabled">false</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
  <bool name="solr.hdfs.security.kerberos.enabled">true</bool>
  <str name="solr.hdfs.security.kerberos.keytabfile">/etc/solr/conf/solr.keytab</str>
  <str name="solr.hdfs.security.kerberos.principal">solr/sandbox.hortonworks.com@HORTONWORKS.COM</str>
</directoryFactory>
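For reference, I confirmed the 24-hour lifetime by obtaining a ticket from the same keytab and principal as in the config above and inspecting it (run on the Solr host):

```shell
# Obtain a ticket from Solr's keytab (keytab path and principal
# taken from the directoryFactory config above).
kinit -kt /etc/solr/conf/solr.keytab solr/sandbox.hortonworks.com@HORTONWORKS.COM

# 'Expires' minus 'Valid starting' shows the ticket lifetime (24 hours here);
# 'renew until' shows the maximum renewable lifetime.
klist
```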
Created 01-11-2016 07:08 PM
The Hadoop RPC client is coded to re-login from the keytab automatically if it detects an RPC call has failed due to a SASL authentication failure. There is no requirement for special configuration or for the applications (Solr in this case) to write special code to trigger this re-login. I recently wrote a detailed description of this behavior on Stack Overflow.
If this is not working in your environment, and you start seeing authentication failures after a process runs 24 hours, then I recommend reviewing Apache JIRA HADOOP-10786. This was a bug that impacted the automatic re-login from keytab on certain JDK versions. On the JDK 7 line, I know the problem was introduced in JDK 1.7.0_80. On the JDK 8 line, I'm not certain which exact JDK release introduced the problem.
If after reviewing HADOOP-10786 you suspect it is the root cause, you can fix it either by downgrading the JDK to 1.7.0_79 or by upgrading Hadoop. The HADOOP-10786 patch changes the Hadoop code so that it works correctly with all known JDK versions. For Apache Hadoop, the fix shipped in versions 2.6.1 and 2.7.0; for HDP, in versions 2.2.8.0 and 2.3.0.0. All subsequent versions have the fix as well.
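As a quick sanity check of whether you are on an affected combination, you can inspect the JVM Solr runs on and the hadoop-common jar it bundles (the install path below is the HDP Search default and is illustrative; your layout may differ):

```shell
# Affected: JDK 1.7.0_80+ combined with Hadoop client libraries
# that predate the HADOOP-10786 fix (i.e. older than 2.6.1).
java -version

# Which hadoop-common does Solr ship? 2.6.1+ / 2.7.0+ contain the fix.
find /opt/lucidworks-hdpsearch/solr -name 'hadoop-common-*.jar'
```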
Created 01-19-2016 03:36 PM
Thanks Chris, changing Solr's JVM to 1.7.0_79 seems to have done the trick.
Created 12-13-2016 08:31 PM
@Andrew Bumstead opened SOLR-8538. It was resolved as not a problem, since running on JDK 1.7.0_79 resolved the issue.
If you use JDK 8 and Solr < 6.2.0, you will most likely have to update hadoop-common to 2.6.1+ manually. The Solr 5.5.0 and 5.5.1 packages for HDP 2.3, 2.4, and 2.5 appear to still bundle the original Hadoop 2.6.0 dependencies. If you don't upgrade the Hadoop dependencies under Solr to 2.6.1+, you will most likely hit Kerberos ticket renewal issues, or you will have to follow the steps outlined below by @Jonas Straub to enable full Kerberos authentication.
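A manual jar swap along these lines should work; note that all paths and the version number are illustrative (check your actual Solr webapp lib directory and pick any Hadoop release 2.6.1 or newer), and back up the originals first:

```shell
# Illustrative paths -- verify against your own install before running.
SOLR_LIB=/opt/lucidworks-hdpsearch/solr/server/solr-webapp/webapp/WEB-INF/lib
NEW_JARS=/tmp/hadoop-2.7.1-jars   # hadoop-common etc. from a 2.6.1+ release

# Back up the bundled Hadoop 2.6.0 jars, then remove them.
mkdir -p /tmp/solr-hadoop-backup
mv "${SOLR_LIB}"/hadoop-*-2.6.0.jar /tmp/solr-hadoop-backup/

# Drop in the replacement jars containing the HADOOP-10786 fix.
cp "${NEW_JARS}"/hadoop-*.jar "${SOLR_LIB}/"

# Restart Solr (cloud mode) so the new jars are picked up.
/opt/lucidworks-hdpsearch/solr/bin/solr restart -c
```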
Created 01-12-2016 09:06 AM
Renewing Kerberos tickets does not currently work in the Solr standalone (single-node) deployment; there is a bug in Solr that prevents ticket renewal (I have an open ticket regarding this issue).
However, the SolrCloud deployment should renew its tickets. Below is a sample configuration that enables Kerberos for SolrCloud and ensures that Solr's Kerberos tickets are renewed. (Note: this also enables Kerberos authentication for the Solr Admin UI.)
/opt/lucidworks-hdpsearch/solr/bin/jaas.conf
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/security/keytabs/solr.headless.keytab"
  storeKey=true
  debug=true
  principal="solr-mycluster@EXAMPLE.COM";
};
/opt/lucidworks-hdpsearch/solr/bin/solr.in.sh
SOLR_HOST=`hostname -f`
ZK_HOST="zookeeperA.example.com:2181,zookeeperB.example.com:2181,zookeeperC.example.com:2181/solr"
SOLR_KERB_PRINCIPAL=HTTP/${SOLR_HOST}@EXAMPLE.COM
SOLR_KERB_KEYTAB=/etc/security/keytabs/solr-spnego.service.keytab
SOLR_JAAS_FILE=/opt/lucidworks-hdpsearch/solr/bin/jaas.conf
SOLR_AUTHENTICATION_CLIENT_CONFIGURER=org.apache.solr.client.solrj.impl.Krb5HttpClientConfigurer
SOLR_AUTHENTICATION_OPTS=" -DauthenticationPlugin=org.apache.solr.security.KerberosPlugin \
  -Djava.security.auth.login.config=${SOLR_JAAS_FILE} \
  -Dsolr.kerberos.principal=${SOLR_KERB_PRINCIPAL} \
  -Dsolr.kerberos.keytab=${SOLR_KERB_KEYTAB} \
  -Dsolr.kerberos.cookie.domain=${SOLR_HOST} \
  -Dhost=${SOLR_HOST} \
  -Dsolr.kerberos.name.rules=RULE:[1:$1@$0](solr-mycluster@EXAMPLE.COM)s/.*/solr/DEFAULT"
/etc/security/keytabs/solr.headless.keytab
Keytab file for the solr-mycluster@EXAMPLE.COM principal
Add the following file and znode to Zookeeper:
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost zookeeperA.example.com:2181,zookeeperB.example.com:2181,zookeeperC.example.com:2181 -cmd makepath /solr
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost zookeeperA.example.com:2181,zookeeperB.example.com:2181,zookeeperC.example.com:2181 -cmd put /solr/security.json '{"authentication":{"class": "org.apache.solr.security.KerberosPlugin"}}'
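Once the znode is in place, you can read it back with the same zkcli.sh (hosts here are the example hosts from above) to confirm the KerberosPlugin is registered:

```shell
# Read back security.json; the output should contain
# "org.apache.solr.security.KerberosPlugin".
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh \
  -zkhost zookeeperA.example.com:2181,zookeeperB.example.com:2181,zookeeperC.example.com:2181 \
  -cmd get /solr/security.json
```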
Solrconfig.xml
Your solrconfig.xml looks fine.
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://mycluster/solr</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
  <bool name="solr.hdfs.security.kerberos.enabled">true</bool>
  <str name="solr.hdfs.security.kerberos.keytabfile">/etc/security/keytabs/solr.headless.keytab</str>
  <str name="solr.hdfs.security.kerberos.principal">solr-mycluster@EXAMPLE.COM</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">1</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
  <bool name="solr.hdfs.blockcache.write.enabled">true</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">64</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">512</int>
</directoryFactory>
You can find some more information on this page => https://cwiki.apache.org/confluence/display/RANGER/How+to+configure+Solr+Cloud+with+Kerberos+for+Ran...