HDP Search

Expert Contributor

Hi,

Is it recommended to run HDP Search/Solr on a separate cluster? Why is it packaged outside of HDP, which includes Hive, HBase, and so many other components?

Thanks,

Avijeet

1 ACCEPTED SOLUTION

Expert Contributor

@Avijeet Dash

HDP Search is the basic Solr package with a tested integration to HDP. Lucidworks, who are the primary contributors to Solr, package the product.

By default, Solr stores its indexes on the server's local disk, so it competes for disk resources when the Solr installation is co-located with an HDP DataNode. If you go with the SolrCloud option, you can configure HDFS as your Solr data repository. Aside from fault tolerance and high availability, this gives you the option of adding more DataNodes to your HDP cluster to handle the expected increase in disk use by SolrCloud.
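As a rough illustration of the HDFS option, here is a minimal Python sketch that assembles a `bin/solr start` command line for SolrCloud with indexes on HDFS. The three `-D` system properties (`solr.directoryFactory`, `solr.lock.type`, `solr.hdfs.home`) are the ones the Solr Reference Guide describes for releases that ship HDFS support; the helper name `solr_hdfs_start_command`, the host names, ports, and paths are all hypothetical placeholders, so check them against your own Solr version and cluster.

```python
import shlex


def solr_hdfs_start_command(hdfs_home, zk_host=None, solr_bin="bin/solr"):
    """Build the argument list to start Solr in SolrCloud mode with indexes on HDFS.

    hdfs_home: HDFS URI where Solr should keep its index data (placeholder value).
    zk_host:   optional ZooKeeper connection string; if omitted, -z is not passed.
    solr_bin:  path to the Solr control script (assumed to be bin/solr).
    """
    args = [solr_bin, "start", "-c"]  # -c starts Solr in SolrCloud mode
    if zk_host:
        args += ["-z", zk_host]
    args += [
        # Store index files in HDFS instead of on local disk.
        "-Dsolr.directoryFactory=HdfsDirectoryFactory",
        # Use the HDFS-aware index lock implementation.
        "-Dsolr.lock.type=hdfs",
        # Root HDFS directory for Solr data.
        f"-Dsolr.hdfs.home={hdfs_home}",
    ]
    return args


if __name__ == "__main__":
    # Hypothetical NameNode and ZooKeeper addresses, for illustration only.
    cmd = solr_hdfs_start_command(
        "hdfs://namenode.example.com:8020/solr",
        zk_host="zk1.example.com:2181/solr",
    )
    print(shlex.join(cmd))
```

The function only builds the command, so you can review it before running it on a node. Leaving out the three `-D` properties keeps Solr on its default local-disk storage.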

3 REPLIES

Super Collaborator

@Avijeet Dash - Terry made good points. Note that using SolrCloud does not require using HDFS. SolrCloud can also use local storage, and that is not uncommon. People sometimes misunderstand this when we don't point it out. The optimal choice of HDFS vs. local storage depends on the use case, but local storage is usually preferred over HDFS if your index has a high rate of updates/adds. SolrCloud automatically replicates your data and is fault tolerant either way, but SolrCloud on HDFS still has the advantages Terry mentioned.

Super Guru

@Avijeet Dash Solr competes for resources much as HBase does. It does not run on YARN unless you run it in Slider, so controlling its CPU and RAM usage is a challenge. Even if you could isolate RAM and CPU, you still have I/O contention, as Terry mentioned. Based on my implementation experience, I do not recommend running Solr on HDFS. Have it use fast local disk (e.g., SSD) and let it run.