Created 12-02-2015 03:28 PM
Hortonworks has a tutorial that shows how to configure Solr to store index files in HDFS. Since HDFS is already a fault-tolerant file system, does that mean that with this approach we can keep a Solr replication factor of 1 for any collections (shards) that we create? It seems like a lot of redundancy to keep the default HDFS replication factor of 3 and add Solr replication on top of that.
Created 12-02-2015 03:41 PM
@Jeremy Dyer Solr submits all its files to HDFS with the replication factor set to 1, meaning all Solr index and data files are stored with an HDFS replication of 1. Therefore, the Solr replication should be used. I am not sure if there is a Solr configuration parameter to set the HDFS replication for Solr files.
Does that help?
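For reference, here is a minimal sketch of how the Solr-side replication factor could be set when creating a collection through the Collections API `CREATE` action. The host, port, collection name, and shard count are placeholders for illustration only, and the helper `create_collection_url` is hypothetical; it just builds the request URL and does not contact a cluster.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE URL (hypothetical helper, no network call)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        # Solr-level replication: number of copies of each shard
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Placeholder values; adjust for your own cluster.
url = create_collection_url("http://localhost:8983/solr", "example_collection", 2, 2)
print(url)
```

Sending a GET request to the resulting URL against a running SolrCloud node would ask it to create the collection with two replicas per shard.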
Created 12-02-2015 03:30 PM
This is a good read and may help you decide on the serving layer. If you are storing the data and index on HDFS, then I would go with a replication factor of 1.
Created 12-02-2015 03:42 PM
Thanks for adding this, this is a good source. We covered a lot of replication and SolrCloud topics in there 🙂
Created 12-02-2015 03:51 PM
Update on the HDFS replication setting for Solr files: there is an open Jira for this, SOLR-6305 ("Ability to set the replication factor for index files created by HDFSDirectoryFactory").