Here is a high-level overview of how to deploy Solr on HDP with its indexes stored in HDFS. The steps, in order:

  1. Requirements
  2. Install
  3. Edit configuration files to work with HDFS
  4. Change to the ‘solr’ user
  5. Start SolrCloud
  6. Add nodes
  7. Create a collection
  8. Verify & Enjoy

#Requirements for this guide

Installation on a Linux server.

Solr 5.5.0 requires Java 1.7 or higher.

HDP 2.3 or 2.4

Apache ZooKeeper — HDP 2.3 or 2.4 and Solr both use Apache ZooKeeper to manage services for the cluster. The ZooKeeper ensemble that you are using for HDP 2.3 or 2.4 can also be used by Solr.

This guide doesn’t include Kerberos requirements.

#Install Lucidworks-HDPsearch package

yum install lucidworks-hdpsearch

After installation, the HDPSearch files will be found in this directory.

/opt/lucidworks-hdpsearch 
Install only. DO NOT START THE SOLR SERVICE YET.
The Lucidworks HDP Search package must be installed manually on each node of the cluster. Follow this guide through on one machine first, then install and start the service on each remaining node (see the Adding nodes section below).

#Notes on Config files for HDFS

When using Solr with HDPSearch, you should run Solr in SolrCloud mode. This mode is set when starting Solr.

SolrCloud provides central configuration for a cluster of Solr servers, automatic load balancing and fail-over for queries, and distributed index replication. When setting up SolrCloud, you need to modify a few Solr-specific configuration files before starting Solr.

The configuration files are uploaded to ZooKeeper and managed centrally for all Solr nodes. The config files are located in:

/opt/lucidworks-hdpsearch/solr/server/solr/configsets
Personal tip: I choose the ‘data_driven_schema_configs’ because it allows Solr to create a schema on the fly without you first having to define it. If you have a specific schema in mind, choose from the other two configs.

The following changes only need to be completed for the first Solr node that is started. After the first node is running, all additional nodes will get their configuration information from ZooKeeper.

#Modifying Config files for HDFS

Solr’s <directoryFactory>, found in solrconfig.xml, defines how indexes will be stored on disk.

Find the solrconfig.xml file in the configset you will customize for your first collection. Within that file, find the section for <directoryFactory>. It will most likely look like this:

<directoryFactory name="DirectoryFactory"
 class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}">
 </directoryFactory>

We will want to replace this with a different class, and define several additional properties.

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
 <str name="solr.hdfs.home">hdfs://host:port/user/solr</str>
 <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
 <bool name="solr.hdfs.blockcache.enabled">true</bool>
 <int name="solr.hdfs.blockcache.slab.count">1</int>
 <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
 <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
 <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
 <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
 <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
 <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
 </directoryFactory>

You can copy and paste the whole block over the existing one. The only line you need to change is solr.hdfs.home, the HDFS address of the directory where your indexes will be stored. Its parts are:

hdfs://: keep exactly the same.
Hostname: the full name of the NameNode on your cluster, e.g. Namenode01.company.com.
Port: the port of the NameNode's RPC address. It can be found in Ambari under HDFS > Settings > Advanced. The default is usually 8020.
/user/solr: the HDFS directory under which your Solr indexes will be saved. You may choose a different path if you wish.
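
As a quick sanity check on the three parts, here is a minimal sketch that assembles the value. The function name hdfs_home is my own, and the host is only an example; 8020 and /user/solr are the defaults mentioned above.

```shell
#!/usr/bin/env bash
# Compose the value of solr.hdfs.home from its three parts.
# hdfs_home is a hypothetical helper; the host below is an example value.
hdfs_home() {
  local host="$1" port="${2:-8020}" path="${3:-/user/solr}"
  printf 'hdfs://%s:%s%s\n' "$host" "$port" "$path"
}

# Example: all three parts given explicitly.
hdfs_home namenode01.company.com 8020 /user/solr
# prints hdfs://namenode01.company.com:8020/user/solr
```

On a node with the HDFS client installed, `hdfs getconf -confKey fs.defaultFS` should typically print the NameNode URI your cluster is configured with, which is a handy cross-check of the host and port.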

*optional*

If you run into Direct Memory buffer errors after you complete the rest of the installation, you can return to this config file and change solr.hdfs.blockcache.enabled from true to false. This stops block caching on your SolrCloud and may reduce performance. If you change this setting, you will have to push the change to ZooKeeper. See ConfigAPI.
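
If you would rather script that change than edit the file by hand, something like the following should work. This is a sketch, not the only way to do it: disable_blockcache is a name I made up, and it assumes the property sits on one line exactly as in the block above.

```shell
#!/usr/bin/env bash
# Flip solr.hdfs.blockcache.enabled from true to false in a solrconfig.xml.
# disable_blockcache is a hypothetical helper. It keeps a .bak copy of the file.
disable_blockcache() {
  local file="$1"
  # Only the blockcache.enabled line is rewritten; blockcache.read.enabled is untouched.
  sed -i.bak \
    's|\(<bool name="solr\.hdfs\.blockcache\.enabled">\)true\(</bool>\)|\1false\2|' \
    "$file"
}

# Usage, from /opt/lucidworks-hdpsearch/solr (paths assume the configset used in this guide):
#   disable_blockcache server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml
#   server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -z <zookeeper-host>:2181 \
#     -n mySolrConfigs -d server/solr/configsets/data_driven_schema_configs/conf
```

The upconfig call is the same one used in the comments below to push a corrected config set to ZooKeeper; mySolrConfigs is the config name chosen later in this guide.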

If your cluster uses Kerberos, please see the additional information in "Starting Solr with Kerberos" before starting Solr.

#Changing to Solr user

The Solr installation creates a user named ‘solr’ (all lower case) and gives it the necessary permissions. It is critical that you switch from root (or any other user) to solr before starting Solr. On the command line:

sudo su - solr
(input password)
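
Since starting Solr as the wrong user is an easy mistake to make, a small guard like this can sit in front of any later command. It is only a sketch; require_user is a hypothetical helper name, not part of the HDP Search package.

```shell
#!/usr/bin/env bash
# Refuse to continue unless the current user is the expected one.
# require_user is a hypothetical helper. The second argument exists only
# so the check can be exercised without actually switching users.
require_user() {
  local expected="$1" actual="${2:-$(id -un)}"
  if [ "$actual" != "$expected" ]; then
    echo "Run this as '$expected' (current user: $actual)" >&2
    return 1
  fi
}

# Usage: require_user solr && echo "OK to start Solr"
```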

#Starting Solr

Navigate to

/opt/lucidworks-hdpsearch/solr/

The command to start Solr is short:

bin/solr start -c (1)
 -z 10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181 (2)
 -Dsolr.directoryFactory=HdfsDirectoryFactory (3)
 -Dsolr.lock.type=hdfs (4)
 -Dsolr.hdfs.home=hdfs://host:port/path (5)

(1) The start command for the bin/solr script. The -c parameter tells Solr to start in SolrCloud mode.
(2) The connect string for the ZooKeeper ensemble. The addresses can be found in Ambari under ZooKeeper > Settings. The default port is 2181. We give the address of each node of the ZooKeeper ensemble so that if one is down, we can still connect as long as there is a quorum.
(3) The Solr index implementation you will use; this parameter defines how the indexes are stored on disk. In this case, we are telling Solr all indexes should be stored in HDFS.
(4) The index lock type to use. Again, we have defined hdfs to indicate the indexes will be stored in HDFS.
(5) The path to the location of the Solr indexes in HDFS. This is the same path defined in the config file above. If you changed the directory ‘/user/solr’, make sure this reflects your custom path.
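
The numbered callouts make the command above awkward to copy, so here is the same command assembled on one line. Treat it as a sketch: solr_start_cmd is my own helper name, and the ZooKeeper and HDFS values are examples to replace with your own.

```shell
#!/usr/bin/env bash
# Print the bin/solr start command from this section without the (1)-(5) callouts.
# ZK_HOSTS and HDFS_HOME are example values; solr_start_cmd is a hypothetical helper.
ZK_HOSTS="10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181"
HDFS_HOME="hdfs://namenode01.company.com:8020/user/solr"

solr_start_cmd() {
  local zk="$1" hdfs_home="$2"
  printf 'bin/solr start -c -z %s -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.hdfs.home=%s\n' \
    "$zk" "$hdfs_home"
}

# Dry run: this only prints the command. Paste the output into a shell opened
# in /opt/lucidworks-hdpsearch/solr as the solr user.
solr_start_cmd "$ZK_HOSTS" "$HDFS_HOME"
```

Printing first and running second makes it easy to compare the -Dsolr.hdfs.home value against the one in solrconfig.xml before anything is started.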

IMPORTANT NOTE: If you do not specify a ZooKeeper connect string with the -z property, Solr will launch its embedded ZooKeeper. The embedded ZooKeeper is a single instance, so it provides no failover and is not meant for production use.

Note we have not defined a collection name, a configuration set, how many shards or nodes we want, etc. Those properties are defined at the collection level, and we’ll define those when we create a collection.

#Adding nodes to Solr

Once you have configured the settings and they have been uploaded to ZooKeeper, you can add the other nodes to Solr. They will automatically be configured from the settings stored in ZooKeeper.

Log in to the node and install the package:

ssh <nodeName>
yum install lucidworks-hdpsearch

Change to the solr user (see above).

Repeat the Starting Solr section above. The command and settings are exactly the same as before, so you can copy and paste them.

Repeat for each node.

#Creating your first collection

We can use the same bin/solr script to create and delete collections. This time we use it with the create command and define several new properties.

bin/solr create -c SolrCollection (1)
 -d data_driven_schema_configs (2)
 -n mySolrConfigs (3)
 -s 2 (4)
 -rf 2 (5)
(1) The create command for the bin/solr script. In this case, the -c parameter provides the name of the collection to create.
(2) The configset to use. In this case, we’ve used the data_driven_schema_configs configset. If you modified a configset to support storing Solr indexes in HDFS, as above, you should instead use the name of the configset you modified.
(3) This will be the name of the configset uploaded to ZooKeeper. This allows the same configset to be reused and very similar configsets to be differentiated easily.
(4) The number of shards to split the collection into. The shards are physical sections of the collection’s index on nodes of the cluster.
(5) The number of replicas of each shard for the collection. Replicas are copies of the index which are used for failover and backup in case of failure of one of the main shards.
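
To make the shard and replica numbers concrete: with -s 2 and -rf 2, Solr has to place 2 x 2 = 4 cores across the cluster. The sketch below prints the create command without the callouts and lists the cores to expect. Both helper names are mine, and the core naming pattern is an assumption taken from the error output quoted in the comments (SolrCollection_shard2_replica1); it may differ in other Solr versions.

```shell
#!/usr/bin/env bash
# Print the bin/solr create command and the cores it should produce.
# solr_create_cmd and list_cores are hypothetical helpers.
solr_create_cmd() {
  local name="$1" configset="$2" confname="$3" shards="$4" rf="$5"
  printf 'bin/solr create -c %s -d %s -n %s -s %s -rf %s\n' \
    "$name" "$configset" "$confname" "$shards" "$rf"
}

# One core per (shard, replica) pair; naming pattern assumed from the
# error message quoted in the comments.
list_cores() {
  local name="$1" shards="$2" rf="$3" s r
  s=1
  while [ "$s" -le "$shards" ]; do
    r=1
    while [ "$r" -le "$rf" ]; do
      echo "${name}_shard${s}_replica${r}"
      r=$((r + 1))
    done
    s=$((s + 1))
  done
}

solr_create_cmd SolrCollection data_driven_schema_configs mySolrConfigs 2 2
list_cores SolrCollection 2 2
```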

#Verify & Enjoy!

In a browser, navigate to:

http://<hostname>:8983/solr
8983 is the default port for Solr. If the URL above doesn’t work, verify your Solr port and firewall settings.

Click on Cloud in the left navigation pane.

You should see a visual of your collection with 2 shards & 2 replicas in green.
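
If you have no browser access to the node, the same information is available from the Collections API CLUSTERSTATUS action. A minimal sketch that builds the URL; cluster_status_url is a hypothetical helper and the hostname in the usage line is an example.

```shell
#!/usr/bin/env bash
# Build the Collections API URL that reports cluster state as JSON.
# cluster_status_url is a hypothetical helper; 8983 is Solr's default port.
cluster_status_url() {
  local host="$1" port="${2:-8983}"
  printf 'http://%s:%s/solr/admin/collections?action=CLUSTERSTATUS&wt=json\n' \
    "$host" "$port"
}

# Usage (needs network access to a running Solr node):
#   curl -s "$(cluster_status_url solr01.company.com)"
```

In the JSON that comes back, each replica of each shard should report a state of active, which corresponds to the green nodes in the Cloud view.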

And that’s it!

Once again, this is meant to be a more detailed guide based on the Lucidworks install guide. I wrote this to help you avoid some pitfalls and share notes from my experience. This is by no means comprehensive and especially doesn’t include security integration.

If you have any questions or are getting funky errors feel free to leave a comment, or reach out to me on twitter @imharshj.

Comments
@Harsh Jain

I am getting the following error. Can you please help me resolve it?

[solr@m1 solr]$ bin/solr create -c SolrCollection -d /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/ -n mySolrConfigs -s 2 -rf 2

Connecting to ZooKeeper at m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181
Re-using existing configuration directory mySolrConfigs
Creating new collection 'SolrCollection' using command:
http://192.168.56.41:8983/solr/admin/collections?action=CREATE&name=SolrCollection&numShards=2&repli...

{
  "responseHeader":{
    "status":0,
    "QTime":1299},
  "failure":{"":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://192.168.56.41:8983/solr: Error CREATEing SolrCore 'SolrCollection_shard2_replica1': Unable to create core [SolrCollection_shard2_replica1] Caused by: Open quote is expected for attribute \"{1}\" associated with an element type \"name\"."}}


@Saurabh Kumar - I ran into this issue too when following this guide. Most likely you copied and pasted XML config settings that contain invalid curly quotes (”). You'll want to replace those with straight quotes typed from your keyboard.

Unfortunately, I found that you cannot simply go through the steps again from the beginning; you have to use the ZooKeeper CLI to update the files for the config set.

Here is what I did:

1) Download the files already in ZooKeeper to a temp directory to verify the quotes are still a problem:

server/scripts/cloud-scripts/zkcli.sh -cmd downconfig -z <ZOOKEEPERHOST>:2181 -n mySolrConfigs -d server/solr/temp

2) Fix the quotes in the original file using vi:

vi server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml

3) Upload the whole conf folder with the fixed config file:

server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -z <ZOOKEEPERHOST>:2181 -n mySolrConfigs -d server/solr/configsets/data_driven_schema_configs/conf

After that is done, you should be able to create your SolrCollection without error.
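
The quote check and fix from steps 1 and 2 can also be scripted instead of done by eye in vi. A minimal sketch, assuming the only offenders are curly double quotes; has_curly_quotes and fix_quotes are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Detect and replace curly double quotes in a config file.
# has_curly_quotes and fix_quotes are hypothetical helpers.
has_curly_quotes() {
  grep -q -e '“' -e '”' "$1"
}

fix_quotes() {
  local file="$1"
  # Keep a .bak copy, then turn both curly variants into a straight quote.
  sed -i.bak -e 's/“/"/g' -e 's/”/"/g' "$file"
}

# Usage, from the Solr directory:
#   f=server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml
#   has_curly_quotes "$f" && fix_quotes "$f"
```

Running has_curly_quotes against the copy downloaded in step 1 is a quick way to confirm whether the version in ZooKeeper is affected before re-uploading.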

@Saurabh Kumar @Nathan Mott's conclusion and solution above is correct. Thanks!

Fixed the invalid quotes for future use. (8.5.16)

Dear Harsh,

Great article!

I have a kerberized HDFS. Any ideas on how to setup Solr directories to HDFS with kerberos?

Many thanks!

I'm not sure how this works with Kerberos in the picture. However, I suspect that since the 'solr' user is writing to HDFS, with Kerberos you might have to ensure 'solr' has the right access. This might help!

http://www.cloudera.com/documentation/archive/search/1-0-0/Cloudera-Search-Installation-Guide/csig_c...


Hi,

I was getting the following error for examples:

Error CREATEing SolrCore 'twitter_demo_shard1_replica1': Unable to create core [twitter_demo_shard1_replica1] Caused by: You must set the HdfsDirectoryFactory param solr.hdfs.home for relative dataDir paths to work

At first I couldn't find where solr.hdfs.home was set. Then I found this article and updated the solrconfig.xml file.

Then I tried again to view existing examples but I still get this error.

I checked the HDFS directory and confirmed that the path is there.

Does anybody have a suggestion or an idea?