Created 03-17-2017 01:54 AM
I have a customer who is running SOLR 4.10.3. Is there a cross data center replication mechanism available for this version? If not, what is the best practice to keep DR in sync?
Created 03-17-2017 02:21 PM
Cross Data Center Replication for Solr was released in Solr 6.x. It is not available in version 4.10.3.
http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf
Take a look at page 409, which covers using the ReplicationHandler to make backup copies of indexes. You can always use standard filesystem methods for performing backups, but that isn't as clean as CDCR in Solr 6.x.
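As a rough illustration of the ReplicationHandler backup mentioned above, here is a minimal Python sketch that only builds the request URL for a backup of one core. The host, port, core name, and backup location are hypothetical placeholders, and the helper name `backup_url` is made up for this example; check the `command=backup` and `location` parameters against the reference guide for your exact version before relying on it.

```python
from urllib.parse import urlencode


def backup_url(base_url, core, location):
    """Build a ReplicationHandler backup request URL for a single core.

    base_url, core, and location are placeholders supplied by the caller;
    nothing is sent to Solr here, the function only formats the URL.
    """
    query = urlencode({"command": "backup", "location": location, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/replication?{query}"


if __name__ == "__main__":
    # Hypothetical values: adjust the host, core name, and directory.
    print(backup_url("http://localhost:8983/solr", "collection1", "/backups/solr"))
```

The resulting URL can be requested with any HTTP client (curl, a browser, or a scheduled job) against each core you want to back up.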
Solr 5.x introduced the ability to back up and restore your indexes using the API. I would encourage customers to upgrade to at least Solr 5.x.
Created 03-17-2017 09:37 PM
Thanks @Michael Young
I'll ask the customer to look into ReplicationHandler. Also, when you say "use standard filesystem methods", does that mean HDFS DistCp in this case, since SOLR is running on top of HDFS?
Created 03-17-2017 09:44 PM
If Solr is storing the indexes on HDFS, then you have a fairly easy way of doing backups.
You can use HDFS snapshots to take incremental backups of the Solr index directories on HDFS and then use distcp to copy those snapshots to another HDFS cluster. That provides the ability to have local backup copies and remote backup copies.
If you don't want to take HDFS snapshots, you can simply use distcp to replicate the HDFS data to another cluster. However, you lose the easy ability to restore from a local HDFS snapshot.
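The snapshot-then-distcp approach described above could be scripted roughly as follows. This is only a sketch: the index directory, snapshot name, and remote cluster URI are hypothetical, the helper name `snapshot_backup_commands` is made up for illustration, and the function just assembles the command lines rather than running them. Note that `-allowSnapshot` is needed only once per directory and requires HDFS admin rights.

```python
def snapshot_backup_commands(index_dir, snapshot_name, remote_dir):
    """Return the HDFS commands for one snapshot-based backup run.

    The commands are returned as argument lists so they can be reviewed,
    logged, or passed to subprocess.run() by the caller.
    """
    # Snapshots are exposed under the read-only ".snapshot" subdirectory.
    snapshot_path = f"{index_dir.rstrip('/')}/.snapshot/{snapshot_name}"
    return [
        # One-time step: mark the directory as snapshottable (admin only).
        ["hdfs", "dfsadmin", "-allowSnapshot", index_dir],
        # Take a named, point-in-time snapshot: the local backup copy.
        ["hdfs", "dfs", "-createSnapshot", index_dir, snapshot_name],
        # Copy the frozen snapshot contents to the other cluster.
        ["hadoop", "distcp", snapshot_path, remote_dir],
    ]


if __name__ == "__main__":
    # Hypothetical paths and NameNode address: replace with your own.
    commands = snapshot_backup_commands(
        "/solr/collection1",
        "backup-20170317",
        "hdfs://dr-namenode:8020/backups/solr/collection1",
    )
    for cmd in commands:
        print(" ".join(cmd))
```

Copying from the snapshot path rather than the live directory means distcp reads a consistent view of the index even while Solr keeps writing to it.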