
Problem:

A time to live (TTL) was never set for collection data on the SolrCloud server, so Ranger audit documents accumulated until they filled up the disks on the nodes.

Solution:

1. Delete the collection through the Solr API. This is safe because the Ranger audit archive is also stored in HDFS. (This step is what cleared up the disk space.)

http://<solr_host>:<solr_port>/solr/admin/collections?action=DELETE&name=collection
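The same request can be sent from a shell with curl. This is only a minimal sketch: the function names collections_api_url and delete_collection are made up for illustration, and the action and name parameters are taken from the URL above.

```shell
# Build a Collections API URL.
# $1 = Solr host, $2 = Solr port, $3 = action, $4 = collection name
collections_api_url() {
  printf 'http://%s:%s/solr/admin/collections?action=%s&name=%s\n' "$1" "$2" "$3" "$4"
}

# Send the DELETE request.
# -s hides the progress meter, -f makes curl exit non-zero on an HTTP error.
# $1 = Solr host, $2 = Solr port, $3 = collection name
delete_collection() {
  curl -sf "$(collections_api_url "$1" "$2" DELETE "$3")"
}
```

For example, delete_collection <solr_host> <solr_port> ranger_audits would drop the ranger_audits collection, so double-check the name before running it.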

2. Configure a time to live in Solr.

a. Download each of the following configs from ZooKeeper: schema.xml, solrconfig.xml, and managed-schema

/opt/hostname-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost <zookeeper host>:<zookeeper port> -cmd get /ranger_audits/configs/ranger_audits/solrconfig.xml >/tmp/solrconfig.xml
/opt/hostname-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost <zookeeper host>:<zookeeper port> -cmd get /ranger_audits/configs/ranger_audits/schema.xml >/tmp/schema.xml
/opt/hostname-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost <zookeeper host>:<zookeeper port> -cmd get /ranger_audits/configs/ranger_audits/managed-schema >/tmp/managed-schema
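The three downloads can also be wrapped in a small loop. This is a sketch under the assumption that zkcli.sh lives at the path shown above; the ZKCLI and ZK_CONFIG_PATH variables and the fetch_configs function are names invented for this example.

```shell
# Paths copied from the commands above; override them if your install differs.
ZKCLI="${ZKCLI:-/opt/hostname-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh}"
ZK_CONFIG_PATH="${ZK_CONFIG_PATH:-/ranger_audits/configs/ranger_audits}"

# Download the three config files from ZooKeeper into a local directory.
# $1 = <zookeeper host>:<zookeeper port>, $2 = existing output directory
fetch_configs() {
  for f in solrconfig.xml schema.xml managed-schema; do
    # Stop at the first failed download so a partial set is not edited by mistake.
    "$ZKCLI" -zkhost "$1" -cmd get "$ZK_CONFIG_PATH/$f" > "$2/$f" || return 1
  done
}
```

Calling fetch_configs <zookeeper host>:<zookeeper port> /tmp leaves the same three files in /tmp as the individual commands do. Keeping an untouched copy of the originals somewhere else is a cheap way to roll back later.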

b. Add the following processors to the "add-unknown-fields-to-the-schema" chain in solrconfig.xml

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
    <processor class="solr.DefaultValueUpdateProcessorFactory">
        <str name="fieldName">_ttl_</str>
        <str name="value">+90DAYS</str>
    </processor>
    <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
        <int name="autoDeletePeriodSeconds">86400</int>
        <str name="ttlFieldName">_ttl_</str>
        <str name="expirationFieldName">_expire_at_</str>
    </processor>
    <processor class="solr.FirstFieldValueUpdateProcessorFactory">
        <str name="fieldName">_expire_at_</str>
    </processor>
    <!-- the processors already in this chain stay here, unchanged -->
</updateRequestProcessorChain>
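As I read this configuration, every document without a _ttl_ value gets +90DAYS, the expiration processor turns that into an _expire_at_ timestamp, and a delete of expired documents runs every 86400 seconds (once a day). The arithmetic can be sketched as below; expire_epoch and latest_purge_epoch are hypothetical helper names, and the "latest purge" figure is only a rough upper bound that assumes the periodic delete runs on schedule.

```shell
# Epoch second at which a document expires.
# $1 = epoch second the document was indexed, $2 = TTL in days (90 for +90DAYS)
expire_epoch() {
  echo $(( $1 + $2 * 86400 ))
}

# Roughly the latest epoch second by which the periodic delete should have removed it.
# $1 = expiration epoch second, $2 = autoDeletePeriodSeconds
latest_purge_epoch() {
  echo $(( $1 + $2 ))
}
```

For a document indexed at 1577836800 (2020-01-01T00:00:00Z), expire_epoch 1577836800 90 gives 1585612800, which is 2020-03-31T00:00:00Z, and the document may stay visible for up to one more day after that.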

c. Add the following to schema.xml and managed-schema

<field name="_expire_at_" type="tdate" multiValued="false" stored="true" docValues="true"/>
<field name="_ttl_" type="string" multiValued="false" indexed="true" stored="true"/>

d. Upload each edited file to ZooKeeper.

/opt/hostname-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost <zookeeper host>:<zookeeper port> -cmd putfile /ranger_audits/configs/ranger_audits/solrconfig.xml /tmp/solrconfig.xml
/opt/hostname-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost <zookeeper host>:<zookeeper port> -cmd putfile /ranger_audits/configs/ranger_audits/schema.xml /tmp/schema.xml
/opt/hostname-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost <zookeeper host>:<zookeeper port> -cmd putfile /ranger_audits/configs/ranger_audits/managed-schema /tmp/managed-schema
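Before recreating the collection, it may be worth confirming that the edited files really landed in ZooKeeper. The sketch below re-reads a file with the same zkcli.sh get command and searches it for a string; ZKCLI, ZK_CONFIG_PATH and zk_config_contains are names assumed for this example.

```shell
# Paths copied from the commands above; override them if your install differs.
ZKCLI="${ZKCLI:-/opt/hostname-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh}"
ZK_CONFIG_PATH="${ZK_CONFIG_PATH:-/ranger_audits/configs/ranger_audits}"

# Succeeds (exit status 0) if the given config file in ZooKeeper contains the given text.
# $1 = <zookeeper host>:<zookeeper port>, $2 = file name, $3 = literal text to look for
zk_config_contains() {
  "$ZKCLI" -zkhost "$1" -cmd get "$ZK_CONFIG_PATH/$2" | grep -qF -- "$3"
}
```

For example, zk_config_contains <zookeeper host>:<zookeeper port> solrconfig.xml DocExpirationUpdateProcessorFactory should succeed after the upload, and the same check with managed-schema and _expire_at_ covers the schema change.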

3. On each node, make sure the ranger_audits replicas have been removed from the Solr directories in the local filesystem.
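A quick way to look for leftovers on a node is to list directories whose names start with the collection name. This sketch assumes the replica directories sit directly under the Solr data directory and follow the usual naming such as ranger_audits_shard1_replica1; list_leftover_replicas is a hypothetical helper, and it only lists, it does not delete anything.

```shell
# Print any ranger_audits replica directories still present on this node.
# $1 = Solr data directory on the local filesystem
list_leftover_replicas() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -name 'ranger_audits*'
}
```

If the command prints nothing, the node is clean; otherwise review each listed directory before removing it by hand.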

4. Lastly, issue the CREATE command.

http://<host>:8983/solr/admin/collections?action=CREATE&name=ranger_audits&collection.configName=ran...
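Once the collection exists again and new audit events have been indexed, a simple query can show whether documents now carry the TTL fields. This is a sketch only: ttl_check_url and check_ttl_fields are illustrative names, and the query uses the standard select handler with the two field names configured earlier.

```shell
# Build a query URL that returns one document with only its TTL fields.
# $1 = Solr host, $2 = Solr port, $3 = collection name
ttl_check_url() {
  printf 'http://%s:%s/solr/%s/select?q=*:*&fl=_ttl_,_expire_at_&rows=1&wt=json\n' "$1" "$2" "$3"
}

# Run the query; -f makes curl exit non-zero on an HTTP error.
# $1 = Solr host, $2 = Solr port, $3 = collection name
check_ttl_fields() {
  curl -sf "$(ttl_check_url "$1" "$2" "$3")"
}
```

If the configuration took effect, the returned document should include a _ttl_ of +90DAYS and an _expire_at_ date about 90 days after it was indexed.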

MORE INFO:

For more general Solr information and Ambari Infra TTL details, see: https://community.hortonworks.com/articles/63853/solr-ttl-auto-purging-solr-documents-ranger-audits....

Version history
Revision #:
2 of 2
Last update:
‎02-17-2020 09:03 AM