As we know, many services such as Atlas (lineage), Ranger (audit logs), Log Search, and others use Ambari Infra (Solr) to index their data, so getting Ambari Infra into production and keeping it stable and available is really important. These are the key points I came up with to make that happen.

Hardware –

Try to have a minimum of 3 Ambari Infra nodes, each with at least 1-2 TB of disk for Solr data storage; the exact sizing mainly depends on how many components (Ranger, Atlas, Log Search, ...) feed data into Solr for indexing and how much data they send. A major driving factor for Solr performance is RAM. Solr requires sufficient memory for two separate things: one is the Java heap, the other is free memory for the OS disk cache. Let's say you have a Solr index size of 8GB. If your OS, Solr's Java heap, and all other running programs require 4GB of memory, then an ideal memory size for that server is at least 12GB. So how much memory do I need for Ambari Infra? This is one of those questions that has no generic answer. You want a heap that is large enough that you don't hit OOM exceptions and constant garbage collection, but small enough that you are not wasting memory or running into huge garbage collection pauses. Ideally we can start with 8GB of total memory (leaving 4GB for the disk cache), but even that might NOT be enough. The really important thing is to ensure a high hit ratio on the OS disk cache.
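As a quick sanity check, you can compare the on-disk index size with the memory left over for the OS disk cache. A minimal sketch, assuming the default Infra Solr data directory /opt/ambari_infra_solr/data (adjust this to your actual infra_solr_datadir):

# Size of the Solr index data on this node
du -sh /opt/ambari_infra_solr/data

# Memory picture: the "buff/cache" column is what the OS has available for the disk cache
free -g

If the index is far larger than the buff/cache figure, queries will keep hitting disk, which is a sign to add RAM or spread the data over more nodes.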

GC -

GC pauses are usually caused by full garbage collections, i.e. the JVM pauses all program execution to clean up memory. GC tuning is an art form, and what works for one person may not work for you.

Using the ConcurrentMarkSweep (CMS) collector with tuning parameters is a very good option for Solr, but with the latest Java 7 releases (7u72 at the time of this writing), G1 is looking like a better option if the -XX:+ParallelRefProcEnabled option is used. Information from Oracle engineers who specialize in GC indicates that the latest Java 8 will noticeably improve G1 performance over Java 7, but that has not been confirmed. Here are some ideas that hopefully you will find helpful:

  • "MaxNewSize" should not be set too low. Because the application uses caches, a low value causes temporary cache data to be promoted to the Old Generation prematurely. Once objects are moved to the Old Generation they are only cleared during a full GC, and until then they stay in the heap. In general, set "MaxNewSize" (the young generation heap size) to at least 1/6 (recommended) or 1/8 of the max heap. If the application creates a lot of short-lived temporary objects, MaxNewSize can be increased further. Example: -Xmx8192m -Xms8192m -XX:MaxNewSize=1365m
  • The throughput collector normally starts a GC cycle only when the heap is full (or reaches its max). To finish a GC cycle before the application runs out of memory, the CMS collector needs to start a cycle much earlier than the throughput collector; this is done with -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSInitiatingOccupancyOnly
  • This helps reduce long GC pauses, because the JVM proactively cleans the heap once it reaches 65% occupancy instead of waiting until it is 90% or more full. A combined example is sketched below.
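Putting the ideas above together, here is a minimal sketch of a JVM option set for the Infra Solr instance. The variable name is illustrative (Ambari manages these options through infra-solr-env.sh, and the exact property names vary by version), so treat it as a summary of the flags rather than a drop-in config:

# Illustrative only: 8GB heap, young generation ~1/6 of the heap, CMS kicking in at 65% occupancy
INFRA_SOLR_JVM_OPTS="-Xms8192m -Xmx8192m -XX:MaxNewSize=1365m \
  -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:CMSInitiatingOccupancyFraction=65"

# Alternative on recent Java 7/8: G1 with parallel reference processing
# INFRA_SOLR_JVM_OPTS="-Xms8192m -Xmx8192m -XX:+UseG1GC -XX:+ParallelRefProcEnabled"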

Zookeeper –

As we know, Solr uses Zookeeper to manage configs and coordination. Solr doesn't use Zookeeper as intensively as some other services (Kafka, service HA, ...). Still, since SolrCloud relies on Zookeeper, it can become very unstable if you have underlying performance issues that result in operations taking longer than the zkClientTimeout. Increasing that timeout can help, but addressing the underlying performance issues will yield better results. The default timeout of 30 seconds should be more than enough for a well-tuned SolrCloud. As always, we strongly recommend storing the Zookeeper data on physical disks separate from other services and the OS. Having dedicated machines when multiple services use ZK is even better, but not a requirement.
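If you do decide to raise zkClientTimeout, one way (assuming the stock solr.xml, which reads the zkClientTimeout system property) is to pass it to the Infra Solr JVM; the variable below is illustrative of where such options usually go:

# Illustrative: set the Zookeeper client timeout explicitly (milliseconds)
SOLR_OPTS="$SOLR_OPTS -DzkClientTimeout=30000"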

Availability -

Having multiple shards with replication helps keep the Solr collections available in most cases, such as nodes going down. By default most of the collections are created with 1 shard and 1 replica. We can use the following commands to split a shard or recreate a collection with multiple shards.

Take the Ranger audit log as an example: we can either split the existing shard or recreate the collection. If it is a new install or still in the initial stages, I would delete and recreate the collection.

To delete the ranger_audits collection:

http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=delete&name=ranger_audits

If you don't have the Solr UI enabled or accessible, you can use the SPNEGO principal and run the command below from the command line:

curl -i --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=delete&name=ranger_audits"

To create a new ranger_audits collection:

http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=create&name=ranger_audits&numShards=3&replicationFactor=2&collection.configName=ranger_audits

Or from the command line:

curl -i --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=create&name=ranger_audits&numShards=3&replicationFactor=2&collection.configName=ranger_audits"

You can also specify which Solr nodes the shards should be placed on with createNodeSet:

http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=create&name=ranger_audits&numShards=3&replicationFactor=2&collection.configName=ranger_audits&createNodeSet=xhadambum1p.hortonworks.com:...

NOTE: Since we are using the same collection.configName, we don't need to upload the configs for the collection again.
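To confirm how the shards and replicas ended up being placed, you can query the standard Collections API CLUSTERSTATUS action (host name as in the examples above):

curl -i --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=CLUSTERSTATUS&collection=ranger_audits"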

Split Shard

The command below splits shard1 into two shards, shard1_0 and shard1_1:

http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?collection=ranger_audit&shard=sha...
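The link above is truncated; for reference, a shard split goes through the standard Collections API SPLITSHARD action and in its simplest form looks like the sketch below (the shard name is just an example):

curl -i --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=SPLITSHARD&collection=ranger_audits&shard=shard1"

Splitting a large shard can take a while, and the parent shard is only marked inactive once the new sub-shards are up.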

Disk Space

Sometimes a high expiration (TTL) for documents can fill up the disk under heavy traffic, so configuring the right TTL can eliminate this kind of disk space alert. For example, by default ranger_audits has a 90-day TTL, and this can be changed if needed.

If you haven't used Solr audits before and haven't enabled Ranger audits to Solr via Ambari yet, it is easy to adjust the TTL configuration. By default Ranger keeps its solrconfig.xml at /usr/hdp/2.5.0.0-1245/ranger-admin/contrib/solr_for_audit_setup/conf/solrconfig.xml

So you can directly edit that solrconfig.xml and change +90DAYS to another value (the snippet below shows +60DAYS):


<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <processor>
    <str name="fieldName">_ttl_</str>
    <str name="value">+60DAYS</str>
  </processor>
  <processor>
    <int name="autoDeletePeriodSeconds">86400</int>
    <str name="ttlFieldName">_ttl_</str>
    <str name="expirationFieldName">_expire_at_</str>
  </processor>
  <processor>
    <str name="fieldName">_expire_at_</str>
  </processor>

Afterwards you can go to Ambari and enable Ranger Solr audits; the collection that gets created will use the new setting.

If you have already configured Ranger audits to Solr:

Go to one of the Ambari Infra nodes that hosts a Solr instance. You can download the solrconfig.xml from Zookeeper, or change the existing copy for the component you have.

To download

/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh -cmd getfile /infra-solr/configs/ranger_audits/solrconfig.xml solrconfig.xml -z vb-atlas-ambari.hortonworks.com:2181

Edit the downloaded solrconfig.xml and change the TTL.
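For example, a simple one-liner to flip the TTL from 90 to 60 days in the downloaded file (values as used in this walkthrough):

sed -i 's/+90DAYS/+60DAYS/' solrconfig.xml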

Upload the config back to Zookeeper

/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh -cmd putfile /infra-solr/configs/ranger_audits/solrconfig.xml solrconfig.xml -z vb-atlas-ambari.hortonworks.com:2181

Reload the config

http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=RELOAD&name=ranger_audits

Or from the command line:

curl -v --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/admin/collections?action=RELOAD&name=ranger_audits"

To verify, here is an example document after changing the TTL from +90DAYS to +60DAYS:

curl -i --negotiate -u : "http://vb-atlas-ambari.hortonworks.com:8886/solr/ranger_audits_shard1_replica1/select?q=_ttl_%3A%22%..." or, from the Solr query UI, set q to _ttl_:"+60DAYS"


{
  "responseHeader":{
    "status":0,
    "QTime":6,
    "params":{
      "q":"_ttl_:\"+60DAYS\"\n",
      "indent":"true",
      "wt":"json"}},
  "response":{"numFound":38848,"start":0,"docs":[
      {
        "id":"004fa587-c531-429a-89a6-acf947d93c39-70574",
        "access":"WRITE",
        "enforcer":"hadoop-acl",
        "repo":"vinodatlas_hadoop",
        "reqUser":"spark",
        "resource":"/spark-history/.133f95bb-655f-450f-8aea-b87288ee2748",
        "cliIP":"172.26.92.153",
        "logType":"RangerAudit",
        "result":1,
        "policy":-1,
        "repoType":1,
        "resType":"path",
        "reason":"/spark-history",
        "action":"write",
        "evtTime":"2017-02-08T23:08:08.103Z",
        "seq_num":105380,
        "event_count":1,
        "event_dur_ms":0,
        "_ttl_":"+60DAYS",
        "_expire_at_":"2017-04-09T23:08:09.406Z",
        "_version_":1558808142185234432},
      {

Comments

If you use the $ZK_HOST defined in infra-solr-env.sh, you should not need to include the /infra-solr prefix when getting the solrconfig.xml:

source /etc/ambari-infra-solr/conf/infra-solr-env.sh
/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST \
-cmd getfile /configs/ranger_audits/solrconfig.xml solrconfig.xml  

 The same applies when uploading the edited config.
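For completeness, a sketch of the corresponding upload with the same $ZK_HOST convention (paths follow the getfile example above):

source /etc/ambari-infra-solr/conf/infra-solr-env.sh
/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST \
-cmd putfile /configs/ranger_audits/solrconfig.xml solrconfig.xml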