
Solr disaster recovery on CDH 5.4.8


We have a CDH 5.4.8 production (PRO) cluster and have set up CDH 5.4.8 DR machines for disaster recovery. We want the Solr instances on both clusters to stay in sync on indexes and collections, so that the DR machines can provide the same service as the PRO machines during downtime.

 

We would like answers to the following questions:

 

1. Will simply copying the PRO cluster's index and collection folders in HDFS to the DR cluster work?

2. Is there any way to keep the CDH 5.4.8 PRO and CDH 5.4.8 DR clusters always in sync on indexes and collections?

3. What is the recommended way to back up the PRO Solr indexes and collections to the DR cluster?

Accepted solution:

1. Will simply copying the PRO cluster's index and collection folders in HDFS to the DR cluster work?

Unfortunately, this will not work. The Solr index and tlog files are constantly being updated, and there is no way to ensure a consistent snapshot while Solr is running. It could be done with Solr shut down; however, the core_node directories under /solr/<collection_name> in HDFS are mapped to specific shards/replicas, so when you create the corresponding collection in DR, you would have to ensure that the core_node directories map to the same shards/replicas at collection creation time.
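One way to check that mapping is to compare the core_node-to-shard assignments in each cluster's clusterstate.json (which lives in ZooKeeper). The following is a minimal Python sketch, assuming the Solr 4.x clusterstate.json layout where each collection has "shards", each shard has "replicas", and replicas are keyed by core_node name. The function names and the sample JSON are hypothetical, not part of Solr or CDH.

```python
import json
from typing import Dict, Optional, Tuple


def core_node_to_shard(clusterstate_json: str, collection: str) -> Dict[str, str]:
    """Map each core_node name (e.g. "core_node1") to its shard name.

    Assumes the Solr 4.x clusterstate.json layout:
    {collection: {"shards": {shard: {"replicas": {core_node: {...}}}}}}
    """
    state = json.loads(clusterstate_json)
    mapping: Dict[str, str] = {}
    for shard_name, shard in state[collection]["shards"].items():
        for core_node in shard.get("replicas", {}):
            mapping[core_node] = shard_name
    return mapping


def mismatched_core_nodes(
    prod: Dict[str, str], dr: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {core_node: (prod_shard, dr_shard)} for every core_node that is
    missing on one side or assigned to a different shard."""
    diffs: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for core_node in sorted(set(prod) | set(dr)):
        if prod.get(core_node) != dr.get(core_node):
            diffs[core_node] = (prod.get(core_node), dr.get(core_node))
    return diffs


# Hypothetical example: the DR collection has core_node1 and core_node2 swapped.
prod_state = '{"coll": {"shards": {"shard1": {"replicas": {"core_node1": {}}}, "shard2": {"replicas": {"core_node2": {}}}}}}'
dr_state = '{"coll": {"shards": {"shard1": {"replicas": {"core_node2": {}}}, "shard2": {"replicas": {"core_node1": {}}}}}}'
print(mismatched_core_nodes(core_node_to_shard(prod_state, "coll"),
                            core_node_to_shard(dr_state, "coll")))
# {'core_node1': ('shard1', 'shard2'), 'core_node2': ('shard2', 'shard1')}
```

An empty result means every core_node directory belongs to the same shard in both clusters; any entry shows the (PRO, DR) shard pair that would need to be fixed before the copied data is used.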

2. Is there any way to keep the CDH 5.4.8 PRO and CDH 5.4.8 DR clusters always in sync on indexes and collections?

Prior to CDH 5.9, the best way to do this is to have your indexing jobs publish documents to both collections. As of CDH 5.9, you can back up and restore collections, either locally or to a DR cluster: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/search_backup_restore.html
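A minimal sketch of publishing to both collections in Python, assuming your indexing job can send JSON documents over HTTP to Solr's /update handler on each cluster. The host names, the commitWithin value, and the publish_to_all/http_post_json helpers are hypothetical; the post function is passed in so it can be swapped out (or tested) without a live cluster.

```python
import json
import urllib.request
from typing import Callable, Dict, List

# Hypothetical Solr base URLs for the PRO and DR clusters; replace with your own.
SOLR_URLS = [
    "http://pro-solr.example.com:8983/solr",
    "http://dr-solr.example.com:8983/solr",
]


def http_post_json(url: str, body: bytes) -> int:
    """POST a JSON body to Solr and return the HTTP status code."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def publish_to_all(
    docs: List[dict],
    collection: str,
    base_urls: List[str],
    post: Callable[[str, bytes], int] = http_post_json,
) -> Dict[str, int]:
    """Send the same batch of documents to the collection on every cluster.

    Returns {base_url: status}. A failed request raises and stops the loop,
    so the caller must decide how to retry the remaining clusters.
    """
    body = json.dumps(docs).encode("utf-8")
    results: Dict[str, int] = {}
    for base in base_urls:
        # commitWithin asks Solr to commit within 10 seconds (value is arbitrary).
        url = f"{base}/{collection}/update?commitWithin=10000"
        results[base] = post(url, body)
    return results


# Usage against real clusters (not run here):
# publish_to_all([{"id": "doc-1", "title": "example"}], "my_collection", SOLR_URLS)
```

Because the sketch stops at the first failed request, a real indexing job would also need to retry or queue documents for a cluster that is down; otherwise the two collections drift apart.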

3. What is the recommended way to back up the PRO Solr indexes and collections to the DR cluster?
If you can't upgrade to CDH 5.9, the recommended way to back up the Solr indexes is to stop the Solr service and take an HDFS snapshot or use distcp to copy the indexes to a backup location. If you need to run the same collection at the backup location, you would need to create it with the createNodeSet parameter (Solr 4.10.3) to ensure the collection gets created on the proper nodes, and you'd have to verify that the core_nodeN directories map to the same shards in clusterstate.json as in production.
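The steps above could be scripted roughly as follows. This Python sketch only builds the HDFS command lines and the Collections API CREATE URL; it does not run them. It assumes Solr has already been stopped, and the NameNode addresses, snapshot name, node names, and config name are hypothetical placeholders. Note that `hdfs dfsadmin -allowSnapshot` needs HDFS superuser rights and only has to be run once per directory.

```python
from typing import List
from urllib.parse import urlencode


def snapshot_and_copy_commands(
    collection: str, snapshot: str, src_nn: str, dst_nn: str, solr_root: str = "/solr"
) -> List[List[str]]:
    """Return argv lists that snapshot the Solr root and copy one collection.

    Run these only after the Solr service has been stopped.
    """
    # distcp reads from the read-only snapshot path, not the live directory.
    src = f"{src_nn}{solr_root}/.snapshot/{snapshot}/{collection}"
    dst = f"{dst_nn}{solr_root}/{collection}"
    return [
        ["hdfs", "dfsadmin", "-allowSnapshot", solr_root],
        ["hdfs", "dfs", "-createSnapshot", solr_root, snapshot],
        ["hadoop", "distcp", src, dst],
    ]


def create_collection_url(
    solr_base: str,
    name: str,
    num_shards: int,
    replication_factor: int,
    config_name: str,
    node_set: List[str],
) -> str:
    """Build a Collections API CREATE URL pinned to specific nodes via createNodeSet."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
        # Solr node names have the form host:port_solr.
        "createNodeSet": ",".join(node_set),
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    for cmd in snapshot_and_copy_commands(
        "my_collection", "dr-backup", "hdfs://pro-nn:8020", "hdfs://dr-nn:8020"
    ):
        print(" ".join(cmd))
    print(create_collection_url(
        "http://dr-solr1:8983/solr", "my_collection", 2, 1, "my_collection",
        ["dr-solr1:8983_solr", "dr-solr2:8983_solr"],
    ))
```

If the DR target directory already exists, distcp copies the source directory inside it rather than onto it, so check the destination path before running the copy.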

-pd
