SolrCloud Replication factor with index files in HDFS

Hortonworks has a tutorial that shows how to configure Solr to store index files in HDFS. Since HDFS is already a fault-tolerant file system, does this mean that with this approach we can keep a Solr replication factor of 1 for any collections (shards) that we create? It seems like a lot of redundancy to keep the default HDFS replication factor of 3 and add Solr replication on top of that.

1 ACCEPTED SOLUTION

Re: SolrCloud Replication factor with index files in HDFS

@Jeremy Dyer Solr writes all of its index and data files to HDFS with an HDFS replication factor of 1. Therefore, Solr replication should be used for redundancy. I am not sure whether there is a Solr configuration parameter to set the HDFS replication factor for Solr files.
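
To make the point above concrete, here is a minimal sketch (not a definitive recipe) of requesting Solr-level replication when creating a collection through the Collections API, plus the arithmetic for how many physical copies of each shard's index result. The host, port, and collection name are hypothetical, the helper function names are made up for illustration, and a real request would likely also need a configset (for example via `collection.configName`). The code only builds the URL; it does not contact a server.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE URL (nothing is sent over the network)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def physical_copies(solr_replication_factor, hdfs_replication):
    """Each Solr replica keeps its own index, and HDFS stores each index
    hdfs_replication times, so the copies multiply."""
    return solr_replication_factor * hdfs_replication


if __name__ == "__main__":
    # Hypothetical host and collection name, for illustration only.
    print(create_collection_url("http://localhost:8983/solr", "demo", 2, 2))
    # Solr replicationFactor=2 with HDFS replication 1: two copies per shard.
    print(physical_copies(2, 1))
    # The scenario from the question: Solr replication on top of HDFS replication 3.
    print(physical_copies(2, 3))
```

With an HDFS replication of 1, the Solr `replicationFactor` is the only source of redundancy, which is why a value greater than 1 is suggested here.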

Does that help?

Re: SolrCloud Replication factor with index files in HDFS

@Jeremy Dyer

This is a good read and may help you decide on the serving layer. If you are storing both data and index on HDFS, then I would go with a replication factor of 1.

https://community.hortonworks.com/questions/4858/solrcloud-performance-hdfs-indexdata.html#answer-48...

Re: SolrCloud Replication factor with index files in HDFS

Thanks for adding this; it is a good source. We covered a lot of replication and SolrCloud topics in there :)

Re: SolrCloud Replication factor with index files in HDFS

Update regarding the HDFS replication configuration for Solr files: there is an open Jira for this, SOLR-6305 ("Ability to set the replication factor for index files created by HDFSDirectoryFactory").
