SolrCloud Replication factor with index files in HDFS


Hortonworks has a tutorial that shows how to configure Solr to store index files in HDFS. Since HDFS is already a fault-tolerant file system, does this mean that with this approach we can keep a replication factor of 1 for any collections (shards) that we create? Keeping the default HDFS replication factor of 3 plus Solr replication on top of that sounds like a lot of redundancy.
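To make the redundancy concern concrete, here is a rough sketch of the arithmetic. It assumes that each Solr replica writes its own copy of the index to HDFS and that HDFS then replicates every block of that copy; the function name `physical_copies` is made up for illustration.

```python
def physical_copies(solr_replication_factor: int, hdfs_replication: int) -> int:
    """Number of on-disk copies of one shard's index.

    Assumption: every Solr replica stores its own index directory in HDFS,
    and HDFS replicates each of those directories independently.
    """
    if solr_replication_factor < 1 or hdfs_replication < 1:
        raise ValueError("replication factors must be >= 1")
    return solr_replication_factor * hdfs_replication


# Compare a few combinations of (Solr replication factor, HDFS replication).
for solr_rf, hdfs_rf in [(1, 3), (2, 3), (2, 1)]:
    copies = physical_copies(solr_rf, hdfs_rf)
    print(f"Solr RF={solr_rf}, HDFS replication={hdfs_rf}: {copies} copies per shard")
```

Under these assumptions, a Solr replication factor of 2 on top of an HDFS replication of 3 would hold six copies of each shard's index.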

1 ACCEPTED SOLUTION


@Jeremy Dyer Solr writes all its files to HDFS with the replication factor set to 1, so all Solr index and data files are stored with an HDFS replication of 1. You should therefore use Solr replication. I am not sure whether there is a Solr configuration parameter to set the HDFS replication for Solr files.
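As a minimal sketch of relying on Solr replication, the snippet below builds a Collections API `CREATE` request with an explicit `replicationFactor`. The base URL, the collection name, and the helper `create_collection_url` are placeholders for illustration; your cluster may also need additional parameters, such as a config set name.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str,
                          num_shards: int, replication_factor: int) -> str:
    """Build a Solr Collections API CREATE URL (hypothetical helper)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Placeholder host and collection name; replace with your own values.
url = create_collection_url("http://localhost:8983/solr", "example_collection", 2, 2)
print(url)
```

Sending a GET request to the printed URL (for example with `curl`) would ask SolrCloud to create two shards with two replicas each.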

Does that help?


4 REPLIES 4

@Jeremy Dyer

This is a good read and may help you decide on the serving layer. If you are storing both data and index on HDFS, then I would go with a replication factor of 1.

https://community.hortonworks.com/questions/4858/solrcloud-performance-hdfs-indexdata.html#answer-48...


Thanks for adding this; it is a good source. We covered a lot of replication and SolrCloud topics in there 🙂


Update regarding the HDFS replication configuration for Solr files: there is an open Jira for this, SOLR-6305 ("Ability to set the replication factor for index files created by HDFSDirectoryFactory").