Created 02-23-2017 05:52 AM
Following an upgrade to SOLR 6.4.1, it appears that access to HDFS via a NameService name (High Availability) is no longer working.
We have a solrconfig.xml that defines an HdfsDirectoryFactory as follows:
<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.HdfsDirectoryFactory}">
  <str name="solr.hdfs.home">hdfs://XXXXHDPDEV1/data/DEV/solr</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">32</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf/</str>
</directoryFactory>
In this definition the value of solr.hdfs.home is hdfs://XXXXHDPDEV1/data/DEV/solr, where XXXXHDPDEV1 is the nameService name for a Hadoop cluster.
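To make the difference between a nameservice-style solr.hdfs.home and a host:port-style one concrete, here is a small Python sketch (our own illustration, not part of Solr) that reads the relevant settings out of a trimmed copy of the directoryFactory element above and splits the URI authority from the path. A nameservice ID appears as a bare authority with no port, whereas a hard-coded NameNode carries host:port. The helper names hdfs_settings and split_hdfs_uri are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Trimmed copy of the directoryFactory element from the question, keeping
# only the two settings that matter for nameservice resolution.
SOLRCONFIG_FRAGMENT = """
<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.HdfsDirectoryFactory}">
  <str name="solr.hdfs.home">hdfs://XXXXHDPDEV1/data/DEV/solr</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf/</str>
</directoryFactory>
"""


def hdfs_settings(fragment):
    """Map each <str name="..."> child of the directoryFactory to its text."""
    root = ET.fromstring(fragment.strip())
    return {el.get("name"): (el.text or "").strip() for el in root.findall("str")}


def split_hdfs_uri(uri):
    """Split an hdfs:// URI into (authority host, port or None, path).

    netloc is used rather than urlparse(...).hostname because .hostname is
    lower-cased, and we want to keep the nameservice ID exactly as written.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"not an hdfs:// URI: {uri}")
    host, _, port = parsed.netloc.partition(":")
    return host, (int(port) if port else None), parsed.path
```

With the configuration above, split_hdfs_uri(hdfs_settings(SOLRCONFIG_FRAGMENT)["solr.hdfs.home"]) gives ("XXXXHDPDEV1", None, "/data/DEV/solr"): a logical name with no port, which only means something to an HDFS client that has loaded the HA settings.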
To enable this form of reference to the Hadoop cluster, we also set solr.hdfs.confdir, which identifies a local directory containing the Hadoop configuration files such as hdfs-site.xml. These files map the nameservice name to multiple name nodes and should allow the HDFS client to discover the active name node. Using this nameservice name works fine with command-line hdfs commands run from the same SOLR server.
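For readers unfamiliar with the HA client settings, the sketch below shows the kind of hdfs-site.xml entries that map a nameservice to its name nodes, and resolves XXXXHDPDEV1 to the configured RPC addresses. The property names (dfs.nameservices, dfs.ha.namenodes.*, dfs.namenode.rpc-address.*, dfs.client.failover.proxy.provider.*) are the standard HDFS HA client settings. The name node IDs nn1/nn2 and the second host XXXXn2 are hypothetical, and the Python helpers are illustrative only; the real HDFS client does not pick the active node up front but, roughly speaking, tries the listed addresses and fails over between them.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml excerpt for an HA nameservice. The property names
# are standard HDFS HA client settings; nn1/nn2 and XXXXn2 are assumptions.
HDFS_SITE = """<configuration>
  <property><name>dfs.nameservices</name><value>XXXXHDPDEV1</value></property>
  <property><name>dfs.ha.namenodes.XXXXHDPDEV1</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.XXXXHDPDEV1.nn1</name><value>XXXXn1:8020</value></property>
  <property><name>dfs.namenode.rpc-address.XXXXHDPDEV1.nn2</name><value>XXXXn2:8020</value></property>
  <property>
    <name>dfs.client.failover.proxy.provider.XXXXHDPDEV1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>"""


def load_hadoop_conf(xml_text):
    """Flatten <property><name/><value/></property> entries into a dict."""
    conf = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name, value = prop.findtext("name"), prop.findtext("value")
        if name is not None and value is not None:
            conf[name.strip()] = value.strip()
    return conf


def split_list(value):
    """Split a comma-separated Hadoop list value, ignoring blank items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def namenode_addresses(conf, nameservice):
    """Return the RPC address of every name node behind a nameservice."""
    if nameservice not in split_list(conf.get("dfs.nameservices", "")):
        raise KeyError(f"{nameservice} is not listed in dfs.nameservices")
    nn_ids = split_list(conf.get(f"dfs.ha.namenodes.{nameservice}", ""))
    return [conf[f"dfs.namenode.rpc-address.{nameservice}.{nn}"] for nn in nn_ids]
```

For this sample, namenode_addresses(load_hadoop_conf(HDFS_SITE), "XXXXHDPDEV1") returns ["XXXXn1:8020", "XXXXn2:8020"].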
Under V6.4.1, when we try to create a collection from the config set containing this solrconfig.xml, the HDFS objects are created successfully, but CREATE COLLECTION fails because Solr cannot instantiate the update handler, solr.DirectUpdateHandler2. We get the following traceback:
2017-02-23 11:42:16.419 ERROR (qtp225493257-77) [c:aircargo s:shard1 x:aircargo_shard1_replica1] o.a.s.c.CoreContainer Error creating core [aircargo_shard1_replica1]: SolrCore 'aircargo_shard1_replica1' is not available due to init failure: Error Instantiating Update Handler, solr.DirectUpdateHandler2 failed to instantiate org.apache.solr.update.UpdateHandler
org.apache.solr.common.SolrException: SolrCore 'aircargo_shard1_replica1' is not available due to init failure: Error Instantiating Update Handler, solr.DirectUpdateHandler2 failed to instantiate org.apache.solr.update.UpdateHandler
    at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1151)
    at org.apache.solr.cloud.ZkController.publish(ZkController.java:1198)
    at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1372)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:885)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:827)
    at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:88)
    at org.apache.solr.handler.admin.CoreAdminOperation$$Lambda$28/50699452.execute(Unknown Source)
    at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:377)
    at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:379)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:165)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
    at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:664)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:445)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:534)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Error Instantiating Update Handler, solr.DirectUpdateHandler2 failed to instantiate org.apache.solr.update.UpdateHandler
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:959)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:823)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:890)
    ... 36 more
Caused by: org.apache.solr.common.SolrException: Error Instantiating Update Handler, solr.DirectUpdateHandler2 failed to instantiate org.apache.solr.update.UpdateHandler
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:767)
    at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:815)
    at org.apache.solr.core.SolrCore.initUpdateHandler(SolrCore.java:1065)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:930)
    ... 38 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:753)
    ... 41 more
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: DIBPHDPDEV1
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.solr.update.HdfsUpdateLog.init(HdfsUpdateLog.java:145)
    at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:137)
    at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:94)
    at org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:102)
    ... 46 more
Caused by: java.net.UnknownHostException: XXXXHDPDEV1
    ... 58 more
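Right at the end of the above log you will see UnknownHostException: XXXXHDPDEV1. It appears that, while the update handler is being instantiated, the Hadoop nameservice name is treated as a hostname: the trace shows HdfsUpdateLog.init opening the file system through NameNodeProxies.createNonHAProxy, i.e. the non-HA code path. Since the HdfsDirectoryFactory has already created the HDFS objects by this point, the nameservice itself seems to resolve correctly; only the update log's HDFS client appears to miss the HA settings. We can avoid this error by hard-coding the hostname and port of the active name node (e.g. XXXXn1:8020), for example: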
<str name="solr.hdfs.home">hdfs://XXXXn1:8020/data/DEV/solr</str>
However, in the event of a name node switch, the collection becomes inaccessible.
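The trace suggests the two forms of solr.hdfs.home take different client code paths. The Python function below is a simplified model of that branch as we understand it from Hadoop 2.x's NameNodeProxies.createProxy; it is not the actual Hadoop code. If a dfs.client.failover.proxy.provider.<authority> setting is visible to the client, the authority is treated as a logical nameservice. Otherwise it is treated as a plain host[:port] that must resolve via DNS, which is where UnknownHostException comes from. The function name proxy_path is hypothetical.

```python
def proxy_path(conf, hdfs_uri):
    """Return ("HA", provider_class) or ("non-HA", hostname) for an hdfs:// URI.

    Simplified model of the HA/non-HA decision an HDFS client makes; conf is a
    plain dict standing in for the Hadoop Configuration the client was given.
    """
    authority = hdfs_uri.split("://", 1)[1].split("/", 1)[0]
    host = authority.partition(":")[0]
    provider = conf.get(f"dfs.client.failover.proxy.provider.{host}")
    if provider:
        # Logical nameservice: the provider fails over between name nodes.
        return ("HA", provider)
    # No provider visible: the authority must be a real, resolvable host.
    # For a nameservice ID this lookup fails with UnknownHostException.
    return ("non-HA", host)
```

In this model, a client that did not load the HA settings from solr.hdfs.confdir gets ("non-HA", "XXXXHDPDEV1") for the nameservice URI, which matches the failure above. The hard-coded hdfs://XXXXn1:8020/... URI always takes the non-HA path to one fixed host, which is why it stops working once the active name node moves elsewhere.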
Is this a bug in V6.4.1? (Note: this approach worked fine in V5.3.0.)
A very similar problem was reported in "HDPSearch - failed to create collection - UnknownHostException". However, that was on an earlier version and was solved by fixing a problem with uploading the config to ZooKeeper. (The fact that we can get our config to work by hard-coding the server name suggests that our ZooKeeper upload process is under control.)
Created 02-24-2017 03:29 AM
It seems as if solr.hdfs.confdir is not being used. It's unlikely to be the problem, but have you checked the permissions?
Created 02-28-2017 10:02 PM
Thanks @james.jones. There are a series of symlinks involved, but all the permissions look OK. All the directories and files are at least readable by all.
Created 02-28-2017 03:39 PM
I have duplicated this problem, and filed an issue in the Solr community: https://issues.apache.org/jira/browse/SOLR-10215. I don't know what is causing it, but it seems limited to Solr 6.4. I tried the same setup with Solr 6.3.0 and it worked fine.
If you don't mind, I'd like to post a comment to that issue with a link to this forum thread to show that others have had the same problem.
Created 02-28-2017 09:43 PM
Thanks @Cassandra Targett
Very happy for you to include the link. I'm also happy to supply extra info and/or test a fix.
Created 03-01-2017 01:51 PM
Great, thanks @Tony Bolt. I was able to trace the cause of the problem to a seemingly unrelated commit that occurred for the 6.4.0 release, and the good news is the fix has already been committed for an upcoming 6.4.2 release. The release process for that has already started, and we'd expect it to be out within 1-2 weeks.
There is no patch to apply, but if you have the ability to build Solr from source, you could try to build locally with "branch_6_4", which is where 6.4.2 will come from, or "branch_6x", which also contains the same fix. If you can't do a local build for any reason, we do already have a 2nd confirmation that the problem is fixed with this upcoming release, so it's certainly not required or expected of you to test it at this point.
Created 03-02-2017 01:58 AM
Thanks @Cassandra Targett. I am happy to wait for the 6.4.2 release. The team here are impressed with how fast this issue was resolved. Thanks for following this up for us.