Problem:

A time to live (TTL) was never set for the collection data on the SolrCloud server, so Ranger audit documents accumulated until they filled the disks on the Solr nodes.

Solution:

1. Delete the collection through the Solr Collections API. This is safe because the Ranger audit archive is also stored in HDFS, and it immediately freed up disk space.

http://<solr_host>:<solr_port>/solr/admin/collections?action=DELETE&name=<collection_name>
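The same call can be issued from the command line. A minimal sketch, assuming a hypothetical host `solr01.example.com`, the default Solr port, and the `ranger_audits` collection from this article:

```shell
# Hypothetical host/port/collection values; substitute your own.
SOLR_HOST=solr01.example.com
SOLR_PORT=8983
COLLECTION=ranger_audits
URL="http://${SOLR_HOST}:${SOLR_PORT}/solr/admin/collections?action=DELETE&name=${COLLECTION}"
echo "DELETE request: ${URL}"
# Uncomment to actually delete the collection:
# curl -s "${URL}"
```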

2. Configure Solr with a time to live (TTL) so that audit documents expire automatically.

a. Download each of the following configs from ZooKeeper: solrconfig.xml, schema.xml, and managed-schema.

/opt/hostname-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost <zookeeper host>:<zookeeper port> -cmd get /ranger_audits/configs/ranger_audits/solrconfig.xml >/tmp/solrconfig.xml
/opt/hostname-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost <zookeeper host>:<zookeeper port> -cmd get /ranger_audits/configs/ranger_audits/schema.xml >/tmp/schema.xml
/opt/hostname-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost <zookeeper host>:<zookeeper port> -cmd get /ranger_audits/configs/ranger_audits/managed-schema >/tmp/managed-schema

b. Add the following processors to solrconfig.xml:

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
    <processor class="solr.DefaultValueUpdateProcessorFactory">
        <str name="fieldName">_ttl_</str>
        <str name="value">+90DAYS</str>
    </processor>
    <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
        <int name="autoDeletePeriodSeconds">86400</int>
        <str name="ttlFieldName">_ttl_</str>
        <str name="expirationFieldName">_expire_at_</str>
    </processor>
    <processor class="solr.FirstFieldValueUpdateProcessorFactory">
        <str name="fieldName">_expire_at_</str>
    </processor>
    <!-- keep the chain's existing processors (e.g. RunUpdateProcessorFactory) after these -->
</updateRequestProcessorChain>

c. Add the following field definitions to schema.xml and managed-schema:

<field name="_expire_at_" type="tdate" multiValued="false" stored="true" docValues="true"/>
<field name="_ttl_" type="string" multiValued="false" indexed="true" stored="true"/>

d. Upload each edited file back to ZooKeeper:

/opt/hostname-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost <zookeeper host>:<zookeeper port> -cmd putfile /ranger_audits/configs/ranger_audits/solrconfig.xml /tmp/solrconfig.xml
/opt/hostname-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost <zookeeper host>:<zookeeper port> -cmd putfile /ranger_audits/configs/ranger_audits/schema.xml /tmp/schema.xml
/opt/hostname-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost <zookeeper host>:<zookeeper port> -cmd putfile /ranger_audits/configs/ranger_audits/managed-schema /tmp/managed-schema

3. Make sure the ranger_audits replica directories have been removed from the local Solr data directory on each node.

4. Lastly, issue the create command:

http://<host>:8983/solr/admin/collections?action=CREATE&name=ranger_audits&collection.configName=ran...
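The create call can be parameterized the same way as the delete in step 1. The host below is hypothetical, and the config name `ranger_audits` is taken from the ZooKeeper path used in step 2:

```shell
# Hypothetical host; config name matches the ZooKeeper config path above.
SOLR_HOST=solr01.example.com
COLLECTION=ranger_audits
CONFIG_NAME=ranger_audits
URL="http://${SOLR_HOST}:8983/solr/admin/collections?action=CREATE&name=${COLLECTION}&collection.configName=${CONFIG_NAME}"
echo "CREATE request: ${URL}"
# Uncomment to actually create the collection:
# curl -s "${URL}"
```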

MORE INFO:

For more general Solr information and Ambari Infra TTL details, see: https://community.hortonworks.com/articles/63853/solr-ttl-auto-purging-solr-documents-ranger-audits....
