Support Questions

dorio · ‎12-10-2015

We need to deploy Solr 5.2.1 on HDP 2.3.2 in a production environment (3 master nodes with HA on HDFS, YARN and Hive, 13 worker nodes, 2 edge, 2 support and 2 security nodes). Is there a "best practice" for production? This is a multi-purpose cluster on which Hive, Pig, HOYA and Spark jobs are currently running.

amcbarnett · ‎12-11-2015

For high-throughput use cases, Solr (actually Solr in Cloud mode) should run on separate nodes. However, for HDFS-based indexes you may see a slight performance degradation. You can colocate Solr with the DataNodes, but you sacrifice latency. Since you are also running Spark jobs, I would recommend running SolrCloud on a couple of additional nodes.

azeltov · ‎12-11-2015

+1 to @Ancil McBarnett. I would add that, depending on how you will be accessing Solr, you may want a load balancer in front of your cloud. Any of the Solr instances, whether it hosts a shard or a replica, can service requests on the SolrCloud.

dorio · ‎12-11-2015

@Ancil McBarnett Thanks! We need to keep the indexes on HDFS, but we also need to index files (about 500,000) stored on HDFS (PDF, EML and P7F). Following your suggestion, could we deploy Solr on all DataNodes and also on two master nodes?

@azeltov So is it correct to say that any Solr instance could service requests on HTTP port 8983 (both Solr and Banana)? Do you have any suggestions for the load balancer? Thanks a lot!

ccasano · ‎12-30-2015

@Andrea D'Orio You can point an F5 at all or any of the Solr nodes. SolrCloud is smart enough to distribute queries to the right shards and replicas, so round robin should be fine. Also, if you're using HDFS to store the indexes, then Solr needs to sit on the DataNodes or on nodes with the HDFS client.

https://doc.lucidworks.com/lucidworks-hdpsearch/2.3/Guide-Install.html
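
To make the round-robin idea concrete, here is a minimal sketch of what such a rotation does. It is not an F5 configuration, just an illustration in plain Java. The hostnames and the collection name "docs" are made up for the example; only port 8983 comes from this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin picker over Solr base URLs.
// Hostnames and the collection name used below are hypothetical.
class SolrRoundRobin {
    private final List<String> baseUrls;
    private final AtomicInteger next = new AtomicInteger(0);

    SolrRoundRobin(List<String> baseUrls) {
        if (baseUrls.isEmpty()) {
            throw new IllegalArgumentException("at least one Solr node is required");
        }
        // Defensive copy so later changes to the caller's list do not affect rotation.
        this.baseUrls = new ArrayList<>(baseUrls);
    }

    // Returns the next node in rotation; floorMod keeps the index valid
    // even after the counter overflows.
    String nextBaseUrl() {
        int i = Math.floorMod(next.getAndIncrement(), baseUrls.size());
        return baseUrls.get(i);
    }

    // Builds a select URL for a collection on the next node in rotation.
    // The query is expected to be URL-encoded already.
    String nextSelectUrl(String collection, String encodedQuery) {
        return nextBaseUrl() + "/" + collection + "/select?q=" + encodedQuery;
    }

    public static void main(String[] args) {
        SolrRoundRobin lb = new SolrRoundRobin(Arrays.asList(
            "http://worker01.example.com:8983/solr",
            "http://worker02.example.com:8983/solr"));
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.nextSelectUrl("docs", "*%3A*"));
        }
    }
}
```

Because any node can accept the request and SolrCloud routes it internally, the rotation does not need to know which node holds which shard.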

bdurai · ‎01-05-2016

If you are using SolrJ from your client, it will connect to ZooKeeper and do the load balancing for you automatically. If you are going to use SolrJ, make sure you use the CloudSolrClient class.
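
Since CloudSolrClient is pointed at ZooKeeper rather than at a Solr URL, the main thing to get right is the ZooKeeper connect string. Below is a small sketch of assembling it in plain Java. The hostnames and the "/solr" chroot are assumptions for illustration (check what your cluster actually uses); 2181 is the default ZooKeeper client port. In SolrJ 5.x, as far as I recall, the resulting string is what you would pass to the CloudSolrClient constructor, but that call is not shown here because it needs SolrJ on the classpath.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Builds a ZooKeeper connect string of the form host1:port,host2:port/chroot.
// Hostnames and the chroot used in main are hypothetical.
class ZkConnectString {

    static String build(List<String> hosts, int port, String chroot) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one ZooKeeper host is required");
        }
        // Each host gets the port appended; entries are comma-separated.
        String ensemble = hosts.stream()
            .map(h -> h + ":" + port)
            .collect(Collectors.joining(","));
        if (chroot == null || chroot.isEmpty()) {
            return ensemble;
        }
        // The chroot appears once, after the last host, and starts with "/".
        return ensemble + (chroot.startsWith("/") ? chroot : "/" + chroot);
    }

    public static void main(String[] args) {
        String zkHost = build(
            Arrays.asList("master01.example.com", "master02.example.com", "master03.example.com"),
            2181,
            "/solr");
        System.out.println(zkHost);
    }
}
```

Listing all ZooKeeper nodes, not just one, lets the client keep working if a single ZooKeeper server is down.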

Cloudera Community

Solr architecture for a production environment