Solr architecture for a production environment

Contributor

We need to deploy Solr 5.2.1 on HDP 2.3.2 in a production environment (3 master nodes with HA for HDFS, YARN and Hive; 13 worker nodes; 2 edge, 2 support and 2 security nodes). Is there a "best practice" for production? This is a multi-purpose cluster on which Hive, Pig, HOYA and Spark jobs are currently running.

1 ACCEPTED SOLUTION

Re: Solr architecture for a production environment

For high-throughput use cases, Solr (specifically Solr in Cloud mode) should run on separate, dedicated nodes. With HDFS-based indexes, however, you may see a slight performance degradation. You can co-locate Solr with the DataNodes, but you sacrifice latency. Since you are also running Spark jobs, I would recommend running SolrCloud on a couple of additional nodes.


Re: Solr architecture for a production environment

+1 to @Ancil McBarnett. I would add that, depending on how you will be accessing Solr, you may want a load balancer in front of your SolrCloud cluster. Any of the Solr instances, whether it hosts a shard or a replica, can service requests on the SolrCloud.

Re: Solr architecture for a production environment

Contributor

@Ancil McBarnett Thanks! We need to keep the indexes on HDFS, but we also need to index files stored on HDFS (about 500,000 of them: PDF, EML and P7F). Following your suggestion, could we deploy Solr on all DataNodes and also on two master nodes?

@azeltov So is it correct to say that any Solr instance can service requests on HTTP port 8983 (for both Solr and Banana)? Do you have any suggestions for the load balancer? Thanks a lot!

Re: Solr architecture for a production environment

@Andrea D'Orio You can point an F5 at all or any of the Solr nodes. SolrCloud is smart enough to distribute queries to the right shards and replicas. Round robin should be fine. Also, if you're using HDFS to store the indexes, then Solr needs to sit on the DataNodes or on nodes with the HDFS client.

https://doc.lucidworks.com/lucidworks-hdpsearch/2.3/Guide-Install.html
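
To illustrate the round-robin idea, here is a minimal sketch of what a load balancer does when any node can accept the request: it simply rotates through the Solr nodes. The host names, the collection name `documents` and the query are made up for the example; only port 8983 comes from this thread. A real F5 would also do health checks, which this sketch leaves out.

```python
from itertools import cycle
from urllib.parse import urlencode

# Hypothetical host names; substitute the nodes that actually run Solr.
SOLR_NODES = ["worker01.example.com", "worker02.example.com", "worker03.example.com"]
SOLR_PORT = 8983  # Solr HTTP port mentioned in the thread


def select_urls(nodes, collection, query, port=SOLR_PORT):
    """Yield /select URLs endlessly, rotating through the nodes in order."""
    params = urlencode({"q": query, "wt": "json"})
    for host in cycle(nodes):
        yield f"http://{host}:{port}/solr/{collection}/select?{params}"


# "documents" and the query string are assumptions for illustration only.
urls = select_urls(SOLR_NODES, "documents", "title:invoice")
first_four = [next(urls) for _ in range(4)]
for url in first_four:
    print(url)
```

The fourth URL points at the first node again, which is all that round robin means here. Whichever node receives the request, SolrCloud routes it to the right shards and replicas.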

Re: Solr architecture for a production environment

Contributor

If you are using SolrJ from your client, it will connect to ZooKeeper and do the load balancing for you automatically. If you are going to use SolrJ, make sure you use the CloudSolrClient class.
