HDPSearch - failed to create collection - UnknownHostException

Rising Star

Hello,

I am trying to set up and configure HDPSearch. I have 4 solr boxes running 6 instances of solr. I have set up HDFS with NN HA. All 4 boxes can successfully reach HDFS using the NN HA name.

However, I am receiving the below error when trying to create a collection in solr. What is solr missing that it can't connect to HDFS?

126330 ERROR (qtp59559151-22) [c:collection s:shard23 r:core_node86 x:collection_shard23_replica3] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error CREATEing SolrCore 'collection_shard23_replica3': Unable to create core [collection_shard23_replica3] Caused by: NN_HA_Name.
... 31 more
Caused by: java.net.UnknownHostException: NN_HA_Name
... 45 more

Here is the command to start solr cloud:

solr -c -p 8983 -z $zk_quorum:2181/solr -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.hdfs.home=hdfs://NN_HA_Name/apps/solr

Here is the command to create the collection:

solr create -c collection -d collection -n collection -s 48 -rf 3

Here are my solrconfig.xml DirectoryFactory Settings:

   <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
      <str name="solr.hdfs.home">hdfs://NN_HA_Name/apps/solr</str>
      <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
      <bool name="solr.hdfs.blockcache.enabled">true</bool>
      <int name="solr.hdfs.blockcache.slab.count">1</int>
      <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
      <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
      <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
      <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
      <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
      <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
   </directoryFactory>

I have installed the hdfs clients on the solr nodes and can successfully run:

hdfs dfs -ls hdfs://NN_HA_Name/apps/solr

I also see core-site.xml and hdfs-site.xml (with the correct NN configurations) in the /etc/hadoop/conf directory.
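
For anyone retracing this check, here is a minimal sketch that confirms the HA nameservice used in solr.hdfs.home is declared under dfs.nameservices in the hdfs-site.xml that solr.hdfs.confdir points at. The function name and the grep/sed-style parsing are illustrative assumptions, not part of Solr or Hadoop; on a node with the Hadoop clients installed, `hdfs getconf -confKey dfs.nameservices` should report the same value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if nameservice $1 is listed under
# dfs.nameservices in the hdfs-site.xml given as $2.
nameservice_declared() {
  local ns="$1" conf="${2:-/etc/hadoop/conf/hdfs-site.xml}" declared
  # Flatten the XML to one line so <name> and <value> can be matched together.
  declared=$(tr -d '\n' < "$conf" \
    | sed -n 's|.*<name>dfs\.nameservices</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p')
  # The property may hold a comma-separated list; ignore spaces around entries.
  declared="${declared// /}"
  case ",${declared}," in
    *",${ns},"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Example use: `nameservice_declared NN_HA_Name /etc/hadoop/conf/hdfs-site.xml && echo "nameservice is declared"`.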

Thanks, Jon

1 ACCEPTED SOLUTION

Rising Star

After more digging, I discovered the solrconfig.xml in ZK was not the correct version. I did a series of downconfig and upconfig to load the correct configs and verify everything is OK. After loading the correct solrconfig.xml and restarting each solr node, the create collection command succeeded.

/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -cmd downconfig -d collection -z $zk_quorum:2181/solr -n collection
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -d $path_to_configs -z $zk_quorum:2181/solr -n collection
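
A minimal sketch of the "verify" part of that procedure: pull the config set from ZooKeeper into a scratch directory and diff it against the local copy before deciding whether an upconfig is needed. The `ZKCLI` variable, the `run` and `verify_configset` helpers, and the `DRY_RUN` switch are illustrative assumptions; the zkcli.sh flags are the same ones used in the commands above, and the sketch assumes downconfig writes the files directly into the `-d` directory.

```shell
#!/usr/bin/env bash
# Path taken from the commands above; override ZKCLI if your install differs.
ZKCLI="${ZKCLI:-/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh}"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Download config set $3 from ZooKeeper ($1 = zkhost including chroot) and
# compare it with local directory $2. The exit status is diff's:
# 0 means the copy in ZooKeeper matches the local files.
verify_configset() {
  local zkhost="$1" local_dir="$2" name="$3" scratch
  scratch=$(mktemp -d)
  run "$ZKCLI" -cmd downconfig -d "$scratch" -z "$zkhost" -n "$name" &&
    run diff -r "$local_dir" "$scratch"
}
```

Example use: `verify_configset "$zk_quorum:2181/solr" "$path_to_configs" collection`. If diff reports differences, the copy in ZooKeeper is not the one you expect.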

7 REPLIES

Master Guru

@Jon Maestas Are you simply testing solr? As a general practice for production I would not use hdfs with solr. I just use SAS/SSD DAS storage and point solr at local directories. Create replicas (3x) across your solr nodes. For your issue, do you mind attaching the log file?

Rising Star

Hi @Sunile Manjee,

Thank you for your response. This is the documentation I followed to set up this environment: https://doc.lucidworks.com/lucidworks-hdpsearch/2.3/Guide-Install.html

I will be testing performance against HDFS indexing with an NRT setup. I have local SSD disks set up as a fallback if this isn't fast enough or is too unreliable.

Thanks,

Jon

Master Guru

@Jon Maestas Thanks for sharing. Good stuff.

Contributor

@Jon Maestas I have hit this problem too, but simply re-upconfig'ing did not fix the issue: I get an UnknownHostException on the nameservice name specified in solr.hdfs.home. Did you get any further insight into what was going wrong and why re-executing the upconfig did the trick? (I have to say that the version of solrconfig.xml in ZooKeeper looks identical to my source version.)

Regards, Tony

Rising Star

@Tony Bolt

After you do the downconfig, do your configs look correct? If you're not upconfig'ing them to the correct location in ZK, solr won't see the correct version of your configs.

Also, check in the ZK CLI to make sure you're using the right znode. If your znode isn't /solr, then you'll need to adjust the above commands appropriately. And make sure solr is looking in the right znode.

I believe my znode was /solr and my configs were in /solr/configs.
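
To make that layout concrete, here is a small sketch that derives the absolute znode where a config set should live from the same connect string passed to `-z`. The function name is hypothetical, and the `/configs/<name>` layout under the chroot is assumed from the paths mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a ZooKeeper connect string (optionally ending in
# a chroot such as /solr) and a config set name, print the absolute znode
# that should hold that config set.
configset_znode() {
  local zkhost="$1" name="$2" chroot=""
  # Everything from the first "/" onward is the chroot; hosts contain no "/".
  case "$zkhost" in
    */*) chroot="/${zkhost#*/}" ;;
  esac
  chroot="${chroot%/}"   # drop a trailing slash, so a bare "/" becomes empty
  echo "${chroot}/configs/${name}"
}
```

Example use: `configset_znode "$zk_quorum:2181/solr" collection` prints the path to look at with `ls` in the ZooKeeper CLI.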

Contributor

@Jon Maestas

I'm still having the same problem. I've tried clearing the config and upconfig'ing multiple times. In every instance the solrconfig.xml looks fine from the Solr UI.

The HDFS stuff seems to be working OK, i.e. when I create the collection, the expected directories and files are created in HDFS. It is only after that, when Solr tries to instantiate the updateHandler, that we get the UnknownHostException referring to our HDFS nameservice name.

Unfortunately, we changed multiple things at once here. Everything was working fine on Solr version 5.3.1 with the embedded ZooKeeper. This problem arose when we went to Solr 6.4.1, but we simultaneously switched to using the Hadoop cluster's existing ZooKeeper quorum.

We have the /solr chroot set up in ZooKeeper and it is referenced consistently across all the Solr config files and commands. Our next step is to start backing out our changes (which is a pain because we want some of the security enhancements in 6.4.1).

In your examples above you use $zk_quorum. Is that set to the name of a single ZooKeeper node, or is it a list of all the nodes? I've tried both approaches, but it doesn't make any difference.
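
For reference while comparing the two approaches: a ZooKeeper connect string is a comma-separated list of host:port pairs with the chroot appended once at the end. The sketch below builds one from a list of hosts; the function name and the hostnames zk1, zk2, zk3 are hypothetical, and nothing here says which form $zk_quorum took in the original commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a ZooKeeper connect string.
#   $1 = client port, $2 = chroot (e.g. /solr, or "" for none),
#   remaining arguments = ZooKeeper hostnames.
zk_connect_string() {
  local port="$1" chroot="$2" out="" host
  shift 2
  for host in "$@"; do
    # Prefix a comma only when out already holds an earlier host.
    out="${out:+${out},}${host}:${port}"
  done
  echo "${out}${chroot}"
}
```

Example use: `zk_connect_string 2181 /solr zk1 zk2 zk3` prints `zk1:2181,zk2:2181,zk3:2181/solr`, which is the shape of value accepted by the `-z` option used in the commands above.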

Thanks

Tony