Community Articles

Contributor

Since Ambari 2.4, Ambari includes a built-in service called Ambari Infra, which provides the shared Solr infrastructure used by Ranger, Atlas and Log Search.

If real-time indexing or sub-second responses are not a requirement, Ambari Infra Solr collections can be configured to store their indices on HDFS. The steps are as follows:

1. Create a folder on HDFS with the proper ownership (changing the owner requires HDFS superuser privileges, e.g. running as the hdfs user):

hdfs dfs -mkdir /user/infra-solr
hdfs dfs -chown infra-solr:hdfs /user/infra-solr

2. In the solrconfig.xml of each collection whose indices you want to store on HDFS, change the following section:

<directoryFactory name="DirectoryFactory"
   class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}">
</directoryFactory>

to:

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://<namenode_host>:8020/user/infra-solr</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">1</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
  <bool name="solr.hdfs.blockcache.write.enabled">false</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
  <bool name="solr.hdfs.security.kerberos.enabled">true</bool>
  <str name="solr.hdfs.security.kerberos.keytabfile">/etc/security/keytabs/ambari-infra-solr.service.keytab</str>
  <str name="solr.hdfs.security.kerberos.principal">infra-solr/<hostname>@EXAMPLE.COM</str>
</directoryFactory>

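If the NameNode runs in HA mode, use the nameservice ID instead of the host and port in solr.hdfs.home (for example hdfs://<nameservice>/user/infra-solr); solr.hdfs.confdir must then point to a Hadoop client configuration that defines that nameservice. If the cluster is not secured with Kerberos, set solr.hdfs.security.kerberos.enabled to false.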
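To make the two variants concrete, here is a small shell helper that prints the <directoryFactory> block for a given solr.hdfs.home URI (NameNode host:port or HA nameservice) and Kerberos setting. It is only a sketch: the function name render_directory_factory and its arguments are hypothetical, the tuning values are copied from the block above, and it assumes the keytab and principal entries can simply be left out when Kerberos is disabled.

```shell
# Hypothetical helper (not part of Ambari or Solr): print a <directoryFactory>
# block for solrconfig.xml.
#   $1  solr.hdfs.home, e.g. hdfs://<namenode_host>:8020/user/infra-solr
#       or, with NameNode HA, hdfs://<nameservice>/user/infra-solr
#   $2  "true" on a kerberized cluster, "false" otherwise (default: false)
#   $3  Kerberos principal, only emitted when $2 is "true"
render_directory_factory() {
  local hdfs_home="$1" kerberos="${2:-false}" principal="${3:-}"
  # Tuning values below are the ones used in the article.
  cat <<EOF
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">${hdfs_home}</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">1</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
  <bool name="solr.hdfs.blockcache.write.enabled">false</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
  <bool name="solr.hdfs.security.kerberos.enabled">${kerberos}</bool>
EOF
  # Keytab and principal are only written for a kerberized cluster.
  if [ "$kerberos" = "true" ]; then
    cat <<EOF
  <str name="solr.hdfs.security.kerberos.keytabfile">/etc/security/keytabs/ambari-infra-solr.service.keytab</str>
  <str name="solr.hdfs.security.kerberos.principal">${principal}</str>
EOF
  fi
  echo '</directoryFactory>'
}
```

For example, `render_directory_factory hdfs://<nameservice>/user/infra-solr false` prints the block for an unsecured HA cluster; paste the output over the existing <directoryFactory> section.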

(Hint for Ranger: in HDP 2.5, solrconfig.xml is located at /usr/hdp/current/ranger-admin/contrib/solr_for_audit_setup/conf/solrconfig.xml; in HDP 2.6 you can find it in Ambari as the content property of the ranger-solr-configuration config type.)

3. Change the lock type (the <lockType> element inside the <indexConfig> section of solrconfig.xml) to hdfs:

<lockType>hdfs</lockType>
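If you prefer not to edit the file by hand, the change can be sketched with sed. This assumes GNU or BSD sed (both accept -i.bak) and that solrconfig.xml contains a single <lockType> element; the helper name set_hdfs_lock_type is hypothetical, and the original file is kept next to it as solrconfig.xml.bak.

```shell
# Hypothetical helper: set <lockType> to hdfs in the given solrconfig.xml.
set_hdfs_lock_type() {
  local conf="$1"
  [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
  # Replace whatever the current lock type is (e.g. ${solr.lock.type:native})
  # with hdfs; -i.bak edits in place and keeps the original as "$conf.bak".
  sed -i.bak 's|<lockType>[^<]*</lockType>|<lockType>hdfs</lockType>|' "$conf"
}
```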

4. (Optional) Upload the configuration set to the Infra Solr znode in ZooKeeper:

For Atlas, Ranger and Log Search, Ambari uploads the Solr configuration sets and creates the collections automatically during service startup. You can also use Infra Solr for custom collections (not recommended); in that case you need to upload the configuration first with Solr's zkcli.sh (located at /usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh). You will also need your ZooKeeper server addresses and the name of the Infra Solr znode, which is /infra-solr by default and can be overridden with the infra-solr-env/infra_solr_znode config entry in Ambari.

/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh -zkhost <zookeeper_server_address>:2181/infra-solr -cmd upconfig -confdir /path/collection1/myconf -confname myconf
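With several ZooKeeper servers, list them comma-separated and append the znode only once, at the end of the connection string. The wrapper below is a sketch: upload_config and the DRY_RUN switch are hypothetical names (not part of Solr or Ambari); it builds that string and either prints the zkcli.sh command or runs it.

```shell
ZKCLI=/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh

# Hypothetical helper:
#   $1  comma-separated ZooKeeper servers, e.g. zk1:2181,zk2:2181,zk3:2181
#   $2  local config set directory, $3 config set name
#   $4  Infra Solr znode (default /infra-solr)
upload_config() {
  local zk_servers="$1" confdir="$2" confname="$3" znode="${4:-/infra-solr}"
  # The chroot is appended once, after the last server:
  # zk1:2181,zk2:2181/infra-solr
  local zkhost="${zk_servers}${znode}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only print the command that would be run.
    echo "$ZKCLI -zkhost $zkhost -cmd upconfig -confdir $confdir -confname $confname"
  else
    "$ZKCLI" -zkhost "$zkhost" -cmd upconfig -confdir "$confdir" -confname "$confname"
  fi
}
```

Set DRY_RUN=1 first to only print the command, e.g. `upload_config zk1:2181,zk2:2181,zk3:2181 /path/collection1/myconf myconf`, then unset it to run the upload.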

5. (Optional) Create Solr collection:

After your config set is uploaded to the znode, you can create your collection based on that:

curl "http://<mysolr_url>/solr/admin/collections?action=CREATE&name=mycollectionName&numShards=3&replicationFactor=2&collection.configName=myconf&maxShardsPerNode=100"

(If Infra Solr is kerberized, add --negotiate -u : to the curl command.)
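The same request can be wrapped in a small shell function that adds the SPNEGO options only on a kerberized Infra Solr. This is a sketch: create_collection, KERBEROS and DRY_RUN are hypothetical names, wt=json is added only to get a JSON response, and with Kerberos you still need a valid ticket (kinit) before calling it.

```shell
# Hypothetical helper:
#   $1  Solr base URL, e.g. http://<mysolr_host>:<port>
#   $2  collection name, $3 config set name
#   $4  numShards (default 3), $5 replicationFactor (default 2)
create_collection() {
  local solr_base="$1" name="$2" confname="$3" shards="${4:-3}" replicas="${5:-2}"
  local url="${solr_base}/solr/admin/collections?action=CREATE&name=${name}&numShards=${shards}&replicationFactor=${replicas}&collection.configName=${confname}&maxShardsPerNode=100&wt=json"
  # Build curl's options in the positional parameters (POSIX sh has no arrays).
  set -- -sS
  if [ "${KERBEROS:-false}" = "true" ]; then
    # SPNEGO authentication using the current Kerberos ticket
    set -- "$@" --negotiate -u :
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo curl "$@" "$url"
  else
    curl "$@" "$url"
  fi
}
```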

Comments
Rising Star

Thanks for the instructions. I have an HDP 2.5 cluster and want to create (or move) all the collections on an HDFS directory instead of local disk. The configuration above updates solrconfig.xml for each collection, and that works, but is there a way to change it for all of them from the Ambari console, e.g. by updating the infra-solr-env template? Thanks in advance for your input.


How would this be set when you have multiple Solr hosts?

<str name="solr.hdfs.security.kerberos.principal">infra-solr/<hostname>@EXAMPLE.COM</str>
Contributor

Sorry for the late response, I did not notice this.
You probably need to extend SOLR_OPTS in the infra-solr-env template with -Dsolr.hdfs.security.kerberos.principal= ... (you can use `hostname -f` there), then reference that property in the XML like this:

<str name="solr.hdfs.security.kerberos.principal">${solr.hdfs.security.kerberos.principal:}</str>
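On the template side, a minimal sketch of the SOLR_OPTS extension could look like the lines below. Assumptions: the principal follows the infra-solr/<hostname>@EXAMPLE.COM pattern from the article, `hostname -f` returns the same FQDN that was used when the principal was created, and the uname -n fallback and the INFRA_SOLR_FQDN variable name are illustrative additions, not part of Ambari.

```shell
# Appended to the infra-solr-env template (sketch).
# Resolve this node's FQDN when the env script is sourced, so each Infra Solr
# host passes its own service principal to solrconfig.xml.
# uname -n is an illustrative fallback in case hostname -f fails.
INFRA_SOLR_FQDN="$(hostname -f 2>/dev/null || uname -n)"
SOLR_OPTS="$SOLR_OPTS -Dsolr.hdfs.security.kerberos.principal=infra-solr/${INFRA_SOLR_FQDN}@EXAMPLE.COM"
```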
Contributor

Sorry for the late response.
https://cwiki.apache.org/confluence/display/AMBARI/Modify+configurations
The configs.py script described there can be useful here.
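As a rough illustration of that approach, the helper below wraps configs.py to dump a config type (for example infra-solr-env) to a file and write it back after editing. It assumes the script sits at /var/lib/ambari-server/resources/scripts/configs.py and accepts the -u/-p/-l/-t/-n/-a/-c/-f options described on the wiki page; check `python configs.py --help` on your Ambari version. The ambari_config function and the AMBARI_*, CLUSTER_NAME and DRY_RUN variables are hypothetical.

```shell
CONFIGS_PY=/var/lib/ambari-server/resources/scripts/configs.py

# Hypothetical helper:
#   $1  action (get or set)
#   $2  config type, e.g. infra-solr-env
#   $3  file the configuration is saved to (get) or read from (set)
ambari_config() {
  local action="$1" config_type="$2" file="$3"
  # Credentials and host default to admin/admin on localhost:8080 (assumption).
  set -- python "$CONFIGS_PY" \
    -u "${AMBARI_USER:-admin}" -p "${AMBARI_PASSWORD:-admin}" \
    -l "${AMBARI_HOST:-localhost}" -t "${AMBARI_PORT:-8080}" \
    -n "${CLUSTER_NAME:?CLUSTER_NAME must be set}" \
    -a "$action" -c "$config_type" -f "$file"
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$@"; else "$@"; fi
}
```

A typical round trip: `ambari_config get infra-solr-env /tmp/infra-solr-env.json`, edit the content property (for example to extend SOLR_OPTS), then `ambari_config set infra-solr-env /tmp/infra-solr-env.json` and restart Infra Solr from Ambari.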