Starting with Ambari 2.4, Ambari includes a built-in service called Ambari Infra, which provides the shared Solr infrastructure used by Ranger, Atlas, and Log Search.
If real-time indexing or sub-second responses are not required, Ambari Infra Solr collections can be configured to store their indices on HDFS. To do so, follow these steps:
1. Create a folder on HDFS with the proper permissions (owned by the Infra Solr service user, infra-solr by default):
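The folder setup can be scripted. Below is a minimal Python sketch that assumes the folder /user/infra-solr, the owner infra-solr and the group hadoop. These are placeholders, not values prescribed by this guide. By default it only prints the `hdfs dfs` commands. Setting `dry_run=False` executes them, which usually requires HDFS superuser rights because only the superuser can change ownership on HDFS.

```python
import shlex
import subprocess
from typing import List

# Hypothetical placeholders: adjust to your cluster's folder, service user and group.
SOLR_HDFS_DIR = "/user/infra-solr"
SOLR_USER = "infra-solr"
SOLR_GROUP = "hadoop"


def hdfs_setup_commands(path: str, user: str, group: str) -> List[List[str]]:
    """Return the 'hdfs dfs' invocations that create the folder and hand it to Solr."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", path],               # create the folder (and parents)
        ["hdfs", "dfs", "-chown", f"{user}:{group}", path],  # make the Solr user its owner
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # chown on HDFS normally requires superuser rights (e.g. run as the hdfs user).
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(hdfs_setup_commands(SOLR_HDFS_DIR, SOLR_USER, SOLR_GROUP))
```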
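2. Configure solrconfig.xml to use solr.HdfsDirectoryFactory as the directory factory, with solr.hdfs.home pointing to the folder created in step 1:
If the NameNode runs in HA mode, use the nameservice ID instead of the NameNode hostname in the solr.hdfs.home URI. If the cluster is not secured with Kerberos, set solr.hdfs.security.kerberos.enabled to false.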
(Hint for Ranger: in HDP 2.5, solrconfig.xml is located at /usr/hdp/current/ranger-admin/contrib/solr_for_audit_setup/conf/solrconfig.xml; in HDP 2.6, you can find it in Ambari as the ranger-solr-configuration/content property.)
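The directory factory settings can be generated with Python's standard `xml.etree.ElementTree`. This is a sketch that assumes the element layout commonly used for solr.HdfsDirectoryFactory. The nameservice `mycluster`, the host names, and the keytab and principal values are hypothetical. The sketch puts either a NameNode `host:port` or an HA nameservice ID into the solr.hdfs.home URI. It emits the keytab and principal entries only when Kerberos is enabled.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def hdfs_home_uri(path: str, namenode: Optional[str] = None,
                  nameservice: Optional[str] = None) -> str:
    """Build the solr.hdfs.home URI; under NameNode HA the nameservice replaces host:port."""
    authority = nameservice if nameservice else namenode
    if not authority:
        raise ValueError("either namenode or nameservice is required")
    return f"hdfs://{authority}{path}"


def _child(parent: ET.Element, tag: str, name: str, text: str) -> ET.Element:
    """Append a typed, named Solr config entry such as <str name="...">...</str>."""
    el = ET.SubElement(parent, tag, name=name)
    el.text = text
    return el


def hdfs_directory_factory(hdfs_home: str, kerberos: bool,
                           keytab: Optional[str] = None,
                           principal: Optional[str] = None) -> ET.Element:
    """Build a <directoryFactory> element that stores Solr indices on HDFS."""
    factory = ET.Element("directoryFactory",
                         {"name": "DirectoryFactory", "class": "solr.HdfsDirectoryFactory"})
    _child(factory, "str", "solr.hdfs.home", hdfs_home)
    _child(factory, "bool", "solr.hdfs.security.kerberos.enabled",
           "true" if kerberos else "false")
    if kerberos:
        # On a secured cluster Solr needs its own keytab and principal to talk to HDFS.
        if not keytab or not principal:
            raise ValueError("keytab and principal are required when Kerberos is enabled")
        _child(factory, "str", "solr.hdfs.security.kerberos.keytabfile", keytab)
        _child(factory, "str", "solr.hdfs.security.kerberos.principal", principal)
    return factory


if __name__ == "__main__":
    # Hypothetical example: non-secured cluster with NameNode HA (nameservice "mycluster").
    elem = hdfs_directory_factory(hdfs_home_uri("/user/infra-solr", nameservice="mycluster"),
                                  kerberos=False)
    print(ET.tostring(elem, encoding="unicode"))
```

The printed element would replace the existing `<directoryFactory>` entry in the collection's solrconfig.xml.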
3. Change the lock type in solrconfig.xml to hdfs:
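The lock type change can also be scripted. This minimal Python sketch assumes the `lockType` element sits under `indexConfig`, as in the stock solrconfig.xml, where it often reads `${solr.lock.type:native}`. ElementTree discards XML comments and the XML declaration when it re-serializes a file, so a manual edit may be preferable for a hand-maintained solrconfig.xml.

```python
import xml.etree.ElementTree as ET


def set_hdfs_lock_type(solrconfig_xml: str) -> str:
    """Return solrconfig.xml content with <indexConfig>/<lockType> set to 'hdfs'.

    Caveat: ElementTree drops comments and the XML declaration on round-trip.
    """
    root = ET.fromstring(solrconfig_xml)
    index_config = root.find("indexConfig")
    if index_config is None:  # create the section if the config lacks it
        index_config = ET.SubElement(root, "indexConfig")
    lock_type = index_config.find("lockType")
    if lock_type is None:
        lock_type = ET.SubElement(index_config, "lockType")
    lock_type.text = "hdfs"  # the 'hdfs' lock type keeps the index write lock on HDFS
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical, heavily trimmed solrconfig.xml fragment.
    sample = ("<config><indexConfig><lockType>${solr.lock.type:native}</lockType>"
              "</indexConfig></config>")
    print(set_hdfs_lock_type(sample))
```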
4. (Optional) Upload the configuration set to the Infra Solr znode in ZooKeeper:
Ambari uploads the Solr configuration sets and creates the collections for Atlas, Ranger, and Log Search automatically during service startup. You can also use Infra Solr with custom collections (not recommended). In that case, you need to upload the configuration set yourself with Solr's zkcli.sh (located at /usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh). For this you need your ZooKeeper server addresses and the Infra Solr znode, which is /infra-solr by default and can be overridden with the infra-solr-env/infra_solr_znode Ambari config entry.
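The upload command can be assembled as below. This Python sketch only builds and prints the command. The ZooKeeper addresses, the local config directory and the config set name are hypothetical. It assumes the `-zkhost`, `-cmd upconfig`, `-confdir` and `-confname` options of Solr's zkcli.sh. The Infra Solr znode is appended once, after the last `host:port` pair, because that is how a ZooKeeper connect string expresses a chroot.

```python
import shlex
from typing import List, Sequence

# zkcli.sh location from the guide; every other value below is a hypothetical placeholder.
ZKCLI = "/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh"


def zk_connect_string(servers: Sequence[str], znode: str = "/infra-solr") -> str:
    """Join ZooKeeper host:port pairs and append the Infra Solr chroot znode once."""
    if not servers:
        raise ValueError("at least one ZooKeeper server is required")
    return ",".join(servers) + znode


def upconfig_command(servers: Sequence[str], confdir: str, confname: str,
                     znode: str = "/infra-solr") -> List[str]:
    """Build the zkcli.sh argv that uploads a local config set under the Infra Solr znode."""
    return [ZKCLI,
            "-zkhost", zk_connect_string(servers, znode),
            "-cmd", "upconfig",
            "-confdir", confdir,    # local directory containing solrconfig.xml, schema, ...
            "-confname", confname]  # name the config set will have in ZooKeeper


if __name__ == "__main__":
    cmd = upconfig_command(["zk1.example.com:2181", "zk2.example.com:2181"],
                           "/tmp/my_collection_conf", "my_collection_conf")
    print(shlex.join(cmd))
```

The printed command would be run on a host where Infra Solr is installed. If infra-solr-env/infra_solr_znode was overridden, pass that value as `znode`.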