Configuring SOLR as the Indexing Backend for the Graph Repository

I am trying to install a basic instance of Atlas with embedded HBase and Solr.

I am following the installation guide: http://atlas.incubator.apache.org/InstallationSteps.html

In the section titled: Configuring SOLR as the Indexing Backend for the Graph Repository

I can confirm that I have two Solr nodes running in SolrCloud mode.

For the following step:

"first copy the required configuration files from ATLAS_HOME/conf/solr on the ATLAS instance host to the Solr instance host. SOLR_CONF in the below mentioned commands refer to the directory where the solr configuration files have been copied to on Solr host"

I have the following configuration files in the ATLAS_HOME/conf:

currency.xml, protwords.txt, solrconfig.xml, synonyms.txt, lang, schema.xml and stopwords.txt

Do I need to copy all of the files above? It does not mention any specific files to copy. Any help will be highly appreciated.

1 ACCEPTED SOLUTION

@Bilal Arshad

You probably need all of the files in ATLAS_HOME/conf/solr in a directory on the Solr host. When you run the solr create command, it uploads those files into ZooKeeper so that Solr can access them for that collection, no matter which host Solr is running on.

The files in this directory are as follows (from the link you provided):

   |- solr
      |- currency.xml
      |- lang
         |- stopwords_en.txt
      |- protwords.txt
      |- schema.xml
      |- solrconfig.xml
      |- stopwords.txt
      |- synonyms.txt

solrconfig.xml holds configuration parameters, such as which REST endpoints are available for the collection.

schema.xml describes the fields and how they are handled (indexed, stored, and so forth).

The other files are used by schema.xml, assuming they are used at all; any file that is in use should be referenced there. So, for this collection, I assume the following are referenced from the schema:

synonyms.txt -- lists words that can be searched and are treated as equivalent (e.g. car and automobile)

stopwords.txt -- lists very common words that will not be indexed, such as "the" and "a"

protwords.txt -- lists words that should not be stemmed (reduced to their root forms)

stopwords_en.txt -- same as stopwords.txt above, but specific to English

currency.xml -- money exchange rates

Even if not all of these files are used, including them shouldn't hurt anything.
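If you want to check which of these files the schema really refers to, something like the following may help. It is only a rough sketch: the function names are made up for illustration, and it assumes the references show up in schema.xml as attributes named words, synonyms, protected, or currencyConfig, which is how the usual Solr filters and the currency field type point at their files. Your schema may use other attributes as well.

```shell
# Print the auxiliary files that schema.xml appears to reference.
# Assumption: references look like words="...", synonyms="...",
# protected="..." or currencyConfig="..." (other attributes are ignored).
referenced_files() {
    grep -oE '(words|synonyms|protected|currencyConfig)="[^"]+"' "$1/schema.xml" \
        | sed -E 's/^[^=]+="([^"]+)"$/\1/' \
        | sort -u
}

# Print every referenced file that is not present in the config directory.
missing_files() {
    conf_dir="$1"
    referenced_files "$conf_dir" | while IFS= read -r f; do
        [ -e "$conf_dir/$f" ] || echo "$f"
    done
}
```

Running `missing_files "$SOLR_CONF"` (with SOLR_CONF pointing at the copied directory) should print nothing when every referenced file was copied over.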

3 REPLIES


@Bilal The XML and text files you mentioned are in the ATLAS_HOME/conf/solr folder. You can copy the complete directory to the Solr instance and use that directory when creating the indices. For example, once you have copied the ATLAS_HOME/conf/solr directory to SOLR_HOME/solr, you can run:

   $SOLR_BIN/solr create -c vertex_index -d SOLR_HOME/solr -shards #numShards -replicationFactor #replicationFactor
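To make the placeholders concrete, here is a small sketch that assembles the create command from variables and prints it for review instead of running it. The paths, the shard and replica counts, and the create_cmd helper are all assumptions for illustration; substitute the values for your own cluster.

```shell
# Hypothetical defaults; override them in your environment as needed.
SOLR_BIN="${SOLR_BIN:-/opt/solr/bin}"          # directory containing the solr script
SOLR_CONF="${SOLR_CONF:-/opt/solr/atlas_conf}" # where ATLAS_HOME/conf/solr was copied to
NUM_SHARDS="${NUM_SHARDS:-1}"
REPLICATION_FACTOR="${REPLICATION_FACTOR:-1}"

# Print the create command for one collection so it can be checked first.
create_cmd() {
    echo "$SOLR_BIN/solr create -c $1 -d $SOLR_CONF -shards $NUM_SHARDS -replicationFactor $REPLICATION_FACTOR"
}

# The copy itself would be done beforehand, for example (hypothetical host):
#   scp -r "$ATLAS_HOME/conf/solr" solr-host:/opt/solr/atlas_conf
create_cmd vertex_index
```

Once the printed line looks right, it can be run on the Solr host, and the same helper can be reused for any other collections the installation guide asks for.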

@ssainath & @james.jones thank you for your prompt replies!