We have installed HDFS and SolrCloud on the same 3 servers, each of which has 64 GB of memory. We found that cached memory grows a lot while Solr searches are running. The worst part is that when the cache is almost as large as the free memory, SolrCloud crashes. Our guess is that HDFS is using off-heap memory and that this is why the cached memory is so large. However, what we are trying to find out is why Solr always crashes when the JVM heap is not full. Has anyone met this kind of issue?
@ao david - Please provide more details, such as the Solr log, which JVM you're using, your JVM settings, your Solr block cache settings (assuming your index is in HDFS), and what you mean by "Solr crashes". Is Solr the only thing running in your HDFS cluster?
It sounds like you're talking about OS cache and OS free memory. Solr does not use the OS cache when using HDFS; instead it caches HDFS blocks in the JVM, since the data is not necessarily co-located with the Solr server. The cache can be on-heap or off-heap, but it is all within the JVM process.
Thanks, james.jones. The Solr log shows an OOM error. We use 8 GB for the JVM heap (Xmx = Xms), and the Solr index is in HDFS. Besides Solr, we also run HBase on the same cluster.
When Solr searches are running, the OS cache grows, as shown in the Ambari memory usage UI and by the Linux command `free -m`.
At the same time, the Solr admin web UI shows physical memory rising too, up to 99%; then the OOM occurs and Solr crashes.
I have been confused by this for a long time. Is it a problem of memory size? We increased memory from 16 GB to 64 GB, but that did not help. We index 30 GB of data, about 6.5 million docs, with 3 nodes and 2 shards for the SolrCloud.
@ao david - I have not worked with Solr on HDFS much, but I know you have to allocate a lot more memory, and you need to configure the memory correctly in the JVM startup options and possibly in solrconfig.xml. With local storage, the operating system caches files, but with HDFS storage, files are cached in Java memory. By default, that is done in direct memory, and that memory has to be allocated with a JVM option such as -XX:MaxDirectMemorySize=20g. Take a look at this page: https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
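To make the sizing concrete, here is a rough sketch in Python of the arithmetic behind that setting. It assumes the figure given on the page linked above, that each block cache slab is 128 MB, so the cache needs roughly `solr.hdfs.blockcache.slab.count` x 128 MB of direct memory. The function names are made up for illustration, and the 8g and 20g values are just the numbers mentioned in this thread, not recommendations.

```python
# Rough sizing helper for the Solr HDFS block cache (illustrative only).
# Assumption: each slab is 128 MB, as stated in the "Running Solr on HDFS" page.
SLAB_BYTES = 128 * 1024 * 1024

_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_jvm_size(text: str) -> int:
    """Convert a JVM-style size such as '20g' or '512m' to bytes."""
    text = text.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def block_cache_bytes(slab_count: int) -> int:
    """Direct memory needed by a block cache with the given number of slabs."""
    return slab_count * SLAB_BYTES


def max_slabs(max_direct: str) -> int:
    """Upper bound on slabs that fit in -XX:MaxDirectMemorySize=<max_direct>."""
    return parse_jvm_size(max_direct) // SLAB_BYTES


if __name__ == "__main__":
    heap = parse_jvm_size("8g")      # -Xmx from this thread
    direct = parse_jvm_size("20g")   # example -XX:MaxDirectMemorySize value
    print("slabs that fit in 20g of direct memory:", max_slabs("20g"))
    print("heap + direct memory (GiB):", (heap + direct) / 1024 ** 3)
```

In practice you would want to leave some headroom, since the JVM uses direct memory for other things too, and heap plus direct memory has to fit next to HBase and HDFS on the same 64 GB box.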
If you are doing frequent updates, you probably should be using local storage rather than HDFS.
I don't know for sure, but HDFS may have nothing to do with it, since you said that it is the queries that run it out of memory. You might want to look at reducing the cache sizes in solrconfig.xml. A lot of factors can affect memory usage, but your problem may be related to the queries. It can depend a lot on your index size, how you are indexing fields, tokenization, document or field size, number of shards, etc. Using lots of facets, or complex facets, consumes memory. So does using wildcards and regular expressions on a very large index, or doing updates at the same time as queries.
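As a rough illustration of why cache sizes matter, here is a back-of-the-envelope sketch in Python. It relies on a common rule of thumb, which I am assuming here rather than quoting from your setup: a filterCache entry stored as a bitset can take up to about maxDoc / 8 bytes. The cache size of 512 is a hypothetical value, and the 3,250,000 docs per shard assumes your 6.5 million docs are split evenly across 2 shards.

```python
def filter_cache_upper_bound_bytes(max_doc: int, cache_size: int) -> int:
    """Worst-case heap for one core's filterCache, assuming every entry
    is a full bitset of one bit per document (about max_doc / 8 bytes)."""
    bytes_per_entry = (max_doc + 7) // 8
    return bytes_per_entry * cache_size


if __name__ == "__main__":
    docs_per_shard = 6_500_000 // 2   # assumption: docs split evenly over 2 shards
    cache_size = 512                  # hypothetical <filterCache size="..."> value
    worst_case = filter_cache_upper_bound_bytes(docs_per_shard, cache_size)
    print(f"filterCache worst case: {worst_case / 1024 ** 2:.0f} MiB per core")
```

This is only an upper bound for one cache on one core; replicas on the same node, the other caches, and faceting structures add to it, so check the numbers against your own solrconfig.xml.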