Solr JVM heap recommendation

Contributor

My Solr instance gets killed about once a week with an OOM error.

I tried tuning the parameters below and need some recommendations.

What values would you recommend for these parameters, based on the log provided below?

GC_TUNE="-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-XX:+CMSScavengeBeforeRemark \
-XX:PretenureSizeThreshold=64m \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled"

I would also like recommendations for the minimum and maximum heap size.

Log:

Heap after GC invocations=19 (full 1):
 par new generation   total 218496K, used 7215K [0x00000006c0000000, 0x00000006d0000000, 0x0000000700000000)
  eden space 174848K,   0% used [0x00000006c0000000, 0x00000006c0000000, 0x00000006caac0000)
  from space 43648K,  16% used [0x00000006cd560000, 0x00000006cdc6bc78, 0x00000006d0000000)
  to   space 43648K,   0% used [0x00000006caac0000, 0x00000006caac0000, 0x00000006cd560000)
 concurrent mark-sweep generation total 786432K, used 92764K [0x0000000700000000, 0x0000000730000000, 0x00000007c0000000)
 Metaspace       used 39260K, capacity 39748K, committed 40076K, reserved 1085440K
  class space    used 4267K, capacity 4422K, committed 4528K, reserved 1048576K
}
2017-11-21T14:30:19.916-0500: 117.112: Total time for which application threads were stopped: 0.0098877 seconds, Stopping threads took: 0.0000449 seconds
2017-11-21T14:30:20.916-0500: 118.112: Total time for which application threads were stopped: 0.0003058 seconds, Stopping threads took: 0.0001195 seconds
{Heap before GC invocations=19 (full 1):
 par new generation   total 218496K, used 182063K [0x00000006c0000000, 0x00000006d0000000, 0x0000000700000000)
  eden space 174848K, 100% used [0x00000006c0000000, 0x00000006caac0000, 0x00000006caac0000)
  from space 43648K,  16% used [0x00000006cd560000, 0x00000006cdc6bc78, 0x00000006d0000000)
  to   space 43648K,   0% used [0x00000006caac0000, 0x00000006caac0000, 0x00000006cd560000)
 concurrent mark-sweep generation total 786432K, used 92764K [0x0000000700000000, 0x0000000730000000, 0x00000007c0000000)
 Metaspace       used 39262K, capacity 39748K, committed 40076K, reserved 1085440K
  class space    used 4267K, capacity 4422K, committed 4528K, reserved 1048576K
2017-11-21T14:30:28.151-0500: 125.346: [GC (Allocation Failure) 2017-11-21T14:30:28.151-0500: 125.346: [ParNew
Desired survivor size 40225992 bytes, new threshold 8 (max 8)
- age   1:     136184 bytes,     136184 total
- age   2:    1010336 bytes,    1146520 total
- age   3:      40472 bytes,    1186992 total
- age   4:      73744 bytes,    1260736 total
- age   5:    2934424 bytes,    4195160 total
- age   6:      24424 bytes,    4219584 total
- age   7:     111680 bytes,    4331264 total
- age   8:      56160 bytes,    4387424 total
: 182063K->5404K(218496K), 0.0065729 secs] 274827K->98648K(1004928K), 0.0066500 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
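As a reading aid for the ParNew summary line above, here is a minimal Python sketch that pulls out the before/after figures and derives the old-generation numbers by subtraction. The function name and the returned keys are made up for illustration, and the regular expression is only written against the line format shown in this log. For this excerpt it works out to roughly 10% of the committed heap in use after the collection, at about 125 seconds of JVM uptime, so this part of the log most likely does not show the state that leads to the OOM.

```python
import re

# Matches the summary part of a ParNew record as it appears in the log above:
# "<young before>K-><young after>K(<young cap>K), <secs> secs] <heap before>K-><heap after>K(<heap cap>K)"
PARNEW_SUMMARY = re.compile(
    r"(\d+)K->(\d+)K\((\d+)K\), [\d.]+ secs\] "
    r"(\d+)K->(\d+)K\((\d+)K\)"
)


def summarize_parnew(line):
    """Return young/old/heap figures (in KB) from one ParNew summary line."""
    m = PARNEW_SUMMARY.search(line)
    if m is None:
        raise ValueError("not a ParNew summary line")
    (young_before, young_after, _young_cap,
     heap_before, heap_after, heap_cap) = map(int, m.groups())
    # Old generation usage is whatever the heap holds outside the young generation.
    old_before = heap_before - young_before
    old_after = heap_after - young_after
    return {
        "young_after_k": young_after,
        "heap_after_k": heap_after,
        "heap_capacity_k": heap_cap,      # committed capacity, not -Xmx
        "old_before_k": old_before,
        "old_after_k": old_after,
        "promoted_k": old_after - old_before,
        "heap_used_pct": 100.0 * heap_after / heap_cap,
    }


# The summary line copied from the log in this post.
LINE = (": 182063K->5404K(218496K), 0.0065729 secs] "
        "274827K->98648K(1004928K), 0.0066500 secs]")

if __name__ == "__main__":
    for key, value in summarize_parnew(LINE).items():
        print(key, value)
```

The derived old-generation usage before the collection (92764K) matches the "concurrent mark-sweep generation ... used 92764K" line in the heap dump above, which is a quick sanity check that the subtraction is reading the line correctly.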




Rising Star

Is this infra-solr or a regular Solr instance? Assuming from the tags this is infra-solr. Are you putting Ranger audit logs in Infra Solr?

Contributor

Yes, we are using infra-solr for Ranger audits.

Rising Star

This was expanded into a blog post with additional details:

https://risdenk.github.io/2017/12/18/ambari-infra-solr-ranger.html

There are two things I would check:

1) Check the field caches by navigating to the Ambari Infra Solr UI: http://HOSTNAME:8886/solr/#/ranger_audits_shard1_replica1/plugins/cache?entry=fieldCache,fieldValueC...

2) If you were to take a heap dump, you could open it in Eclipse Memory Analyzer and check the biggest heap offender.

My assumption is that the majority of the heap is being used by uninverting the _version_ field, since it is being used for sorting or other operations rather than just indexing. This isn't misuse on your part but a problem with how Ranger is using Solr. I was able to fix this by turning on DocValues for the _version_ field. I am currently working on opening a Ranger JIRA and posting to the Ranger user list to try to address this.

The change to DocValues will require you to delete the collection and recreate it. We have been running with a 4GB heap with no issues. Previously we needed a heap of almost 20GB to handle the uninverted cached values (this grew over time).
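To see whether DocValues is already enabled for _version_, one option is to ask Solr's Schema API for the field definition. The Python sketch below is only an illustration under a few assumptions: the collection is named ranger_audits (guessed from the core name in the URL above), Infra Solr is reachable over plain HTTP on port 8886, and no Kerberos or other authentication is required. The helper names are hypothetical.

```python
import json
import urllib.request


def field_url(host, collection, field="_version_", port=8886):
    """Build the Schema API URL for a single field definition."""
    # showDefaults=true asks Solr to include properties inherited from the field type.
    return (f"http://{host}:{port}/solr/{collection}"
            f"/schema/fields/{field}?wt=json&showDefaults=true")


def uses_doc_values(response):
    """True if a parsed Schema API response reports docValues for the field."""
    return bool(response.get("field", {}).get("docValues", False))


def check_field(host, collection="ranger_audits"):
    """Fetch the field definition from a live Solr and report its docValues setting.

    Assumes an unsecured HTTP endpoint; a Kerberized cluster needs extra setup.
    """
    with urllib.request.urlopen(field_url(host, collection), timeout=10) as resp:
        return uses_doc_values(json.load(resp))


if __name__ == "__main__":
    # HOSTNAME is a placeholder, as in the URL earlier in this reply.
    print(field_url("HOSTNAME", "ranger_audits"))
```

If check_field returns False, the field is still being uninverted onto the heap whenever it is sorted on, which would be consistent with the fieldCache growth described above.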