Solr vs SolrCloud

Super Collaborator

Hi All,

I am a bit confused about Solr and SolrCloud.

Why would anyone use Solr and not SolrCloud, given that SolrCloud provides all the HA functionality?

Also, since it is recommended to run Solr clusters separately from HDP clusters, does the same apply to SolrCloud?

SolrCloud works with HDFS and not Solr, if we run it in separate cluster then is there any use of using HDFS?

Thanks,

Avijeet

Accepted solution

Contributor

Good question, Avijeet, but I think there is some fundamental confusion to clear up first. Solr and SolrCloud are not separate things: Solr is the application, while SolrCloud is one mode of running it. The alternative to running Solr in SolrCloud mode is running it in standalone mode.

SolrCloud mode offers index replication, failover, load balancing, and distributed queries, coordinated by ZooKeeper together with other specialized features in Solr. In standalone mode, Solr still offers index replication and distributed queries in a master/slave model, but these activities are not coordinated by ZooKeeper; they are managed manually. Failover and load balancing must also be configured and managed entirely outside Solr with third-party tools.
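To make the "managed manually" part concrete, here is a minimal Python sketch, assuming hypothetical hosts `solr1`/`solr2` and a core/collection named `products`. In standalone mode, a query only spans several servers if the client lists every shard core itself in the `shards` request parameter; in SolrCloud mode, ZooKeeper knows where the shards live, so querying the collection by name is enough. The functions only build request URLs and do not contact a server.

```python
from urllib.parse import urlencode


def standalone_distributed_query(base_url, core, shard_cores, q):
    """Standalone mode: the client itself lists every shard core in `shards`."""
    params = {"q": q, "shards": ",".join(shard_cores)}
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


def solrcloud_query(base_url, collection, q):
    """SolrCloud mode: ZooKeeper tracks the shards, so the collection name suffices."""
    return f"{base_url}/solr/{collection}/select?{urlencode({'q': q})}"


# Hypothetical hosts and core/collection names, for illustration only.
shard_cores = ["solr1:8983/solr/products", "solr2:8983/solr/products"]
standalone_url = standalone_distributed_query(
    "http://solr1:8983", "products", shard_cores, "title:laptop"
)
cloud_url = solrcloud_query("http://solr1:8983", "products", "title:laptop")
print(standalone_url)
print(cloud_url)
```

If a shard is added or moved in standalone mode, every client building the `shards` list has to be updated by hand, which is exactly the bookkeeping SolrCloud delegates to ZooKeeper.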

When using Solr in SolrCloud mode, every index update is sent to the leader of the shard that owns the document and then forwarded to every replica of that shard, each of which indexes the document itself. For some use cases, such as particularly high indexing volumes, this is too heavyweight, and standalone mode is preferred. Others simply prefer to separate the nodes used for indexing from the nodes used for queries, which is currently only possible in standalone mode. Still others started with Solr before SolrCloud was introduced and have not yet found a compelling reason to change.
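The toy Python model below illustrates that update path. It is not Solr code: the hash is an MD5-based stand-in (Solr's default compositeId router uses MurmurHash3 and assigns hash ranges to shards), and the shard and replica names are made up. What it shows is that a single update is indexed independently by every replica of its shard, whereas in master/slave only the master indexes and the slaves copy the finished index files.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    indexed_docs: list = field(default_factory=list)

    def index(self, doc_id):
        # Each SolrCloud replica analyzes and indexes the document itself.
        self.indexed_docs.append(doc_id)


@dataclass
class Shard:
    name: str
    replicas: list  # in this sketch, replicas[0] acts as the shard leader


def shard_for(doc_id, shards):
    # Stand-in routing: modulo over an MD5 prefix, NOT Solr's MurmurHash3 hash ranges.
    h = int.from_bytes(hashlib.md5(doc_id.encode()).digest()[:4], "big")
    return shards[h % len(shards)]


def cloud_update(doc_id, shards):
    """Send the update to the owning shard's leader, which forwards it to all replicas."""
    shard = shard_for(doc_id, shards)
    leader, *followers = shard.replicas
    leader.index(doc_id)
    for replica in followers:
        replica.index(doc_id)
    return shard


# Hypothetical collection: 2 shards, 3 replicas per shard.
shards = [
    Shard(f"shard{i}", [Replica(f"shard{i}_replica{j}") for j in range(3)])
    for i in range(2)
]
owner = cloud_update("doc-42", shards)
print(owner.name, [len(r.indexed_docs) for s in shards for r in s.replicas])
```

With three replicas per shard, one incoming document costs three full indexing operations; at very high ingest rates that multiplication is what can make standalone mode more attractive.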

Regarding running Solr clusters separately: since a SolrCloud cluster is a Solr cluster, the same recommendation applies.

Regarding the last question, "SolrCloud works with HDFS and not Solr, if we run it in separate cluster then is there any use of using HDFS?", could you restate it? Solr can store indexes in HDFS in both modes of operation, which is perhaps the answer you're looking for. It's worth noting, though, that Solr has its own model for replicating indexes and does not hand this functionality off to HDFS. Even if you store your indexes in HDFS (in either mode), you still need to plan a Solr-based replication strategy, because you cannot rely on HDFS to handle it for you.
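As a rough illustration that HDFS storage is independent of the mode, this Python sketch builds a `bin/solr start` command line using the HDFS system properties described in the Solr Reference Guide's "Running Solr on HDFS" page for Solr releases of this era (`solr.directoryFactory`, `solr.lock.type`, `solr.hdfs.home`). The HDFS URL and ZooKeeper addresses are hypothetical. Note that nothing here controls how many copies of each index Solr keeps: that still comes from Solr's own replication settings, for example `replicationFactor` when creating a SolrCloud collection, or master/slave replication in standalone mode.

```python
def solr_start_args(hdfs_home, cloud=False, zk_host=None):
    """Build a `bin/solr start` argument list that keeps indexes in HDFS."""
    args = ["bin/solr", "start"]
    if cloud:
        args.append("-c")  # start in SolrCloud mode
        if zk_host:
            args += ["-z", zk_host]  # external ZooKeeper connection string
    # The same HDFS properties are used in standalone and SolrCloud mode.
    args += [
        "-Dsolr.directoryFactory=HdfsDirectoryFactory",
        "-Dsolr.lock.type=hdfs",
        f"-Dsolr.hdfs.home={hdfs_home}",
    ]
    return args


# Hypothetical HDFS namenode and ZooKeeper ensemble addresses.
HDFS_HOME = "hdfs://namenode:8020/solr"
standalone_args = solr_start_args(HDFS_HOME)
cloud_args = solr_start_args(HDFS_HOME, cloud=True, zk_host="zk1:2181,zk2:2181,zk3:2181")
print(" ".join(standalone_args))
print(" ".join(cloud_args))
```

The HDFS-related arguments are identical in both lists; only the mode flags differ, which mirrors the point that HDFS is a storage choice, not a replication or clustering model.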

Hope this helps.

Super Collaborator

Thanks @Cassandra Targett. So is it right to say there are three modes:

1. Solr standalone mode

2. master/slave

3. SolrCloud

I assume the configuration is different for each type of setup.

Contributor

When speaking about scaling capabilities of Solr, there are really only two modes:

1. Master/slave, which is one or more Solr servers running in standalone mode with what the community calls "legacy scaling" features enabled.

2. SolrCloud, which is one or more Solr servers running with ZooKeeper coordinating activity between them.

Solr doesn't make the distinction between the two very obvious: there's no single switch to turn on or off to choose a mode. Each mode is a set of configuration options that differ depending on which one you want to use.

It's important to note that master/slave is not really different from standalone mode: you can run Solr on a single server for years and then add index replication (and/or distributed queries) when you need it. Moving from standalone to SolrCloud is harder, although not impossible.
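To show what "a set of configuration options" means for master/slave, this Python sketch generates the `/replication` request handler snippets that go in `solrconfig.xml` on the master and on a slave, using the `solr.ReplicationHandler` settings from the legacy scaling documentation (`replicateAfter`, `confFiles`, `masterUrl`, `pollInterval`). The host and core names are hypothetical, and newer Solr releases use leader/follower naming for the same settings.

```python
import xml.etree.ElementTree as ET


def replication_handler(role, settings):
    """Return a solrconfig.xml <requestHandler> snippet for the given replication role."""
    handler = ET.Element(
        "requestHandler", {"name": "/replication", "class": "solr.ReplicationHandler"}
    )
    section = ET.SubElement(handler, "lst", {"name": role})  # "master" or "slave"
    for key, value in settings.items():
        ET.SubElement(section, "str", {"name": key}).text = value
    return ET.tostring(handler, encoding="unicode")


# Master: publish a new index version after each commit, plus these config files.
master_xml = replication_handler("master", {
    "replicateAfter": "commit",
    "confFiles": "schema.xml,stopwords.txt",
})
# Slave: poll the (hypothetical) master core every 20 seconds for a newer index.
slave_xml = replication_handler("slave", {
    "masterUrl": "http://solr-master:8983/solr/products/replication",
    "pollInterval": "00:00:20",
})
print(master_xml)
print(slave_xml)
```

Because this is just one more handler in each core's configuration, a single standalone server can gain replication later without any change to how it was originally set up.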

Some documentation from the Solr Reference Guide might help:

Master/Slave: https://cwiki.apache.org/confluence/display/solr/Legacy+Scaling+and+Distribution

SolrCloud: https://cwiki.apache.org/confluence/display/solr/SolrCloud

New Contributor

Do streaming expressions in Solr 8.x work in a master/slave setup, outside of SolrCloud?