Created on 03-19-2019 03:28 PM - edited 09-16-2022 07:14 AM
I'm running CDH 6.1 with SolrCloud on top of HDFS, hosting a number of collections. I'd like to build indexes on one cluster and, once data ingest for a collection is complete, move them to another cluster using hadoop distcp.
An issue I've run into is that when creating collections via either 'solrctl collection --create' or the API 'admin/collections?action=CREATE', the replicas aren't always named predictably. For example, on a 6-shard collection with a replicationFactor of 1, I've seen anything from...
core_node3, core_node5, core_node7, core_node9, core_node11, core_node12
...to...
core_node362, core_node363, core_node364, core_node365, core_node366, core_node367
Since these names end up in the HDFS-based dataDir/ulogDir values for each replica, I have to do a bunch of HDFS renaming to make the copied directories line up with the replica names of the target cluster's collection.
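To illustrate the renaming involved, here is a rough Python sketch that compares the CLUSTERSTATUS output of both clusters and lists which replica directories would need renaming. It assumes one replica per shard (as with replicationFactor=1) and an HDFS layout of `<hdfs_home>/<collection>/<core_node>`. The `hdfs_home` default and the function names are placeholders of mine, not anything Solr provides.

```python
def shard_to_replica(clusterstatus, collection):
    """Map each shard name to its single replica (core_node) name.

    `clusterstatus` is the parsed JSON of admin/collections?action=CLUSTERSTATUS.
    """
    shards = clusterstatus["cluster"]["collections"][collection]["shards"]
    mapping = {}
    for shard, info in shards.items():
        replicas = list(info["replicas"])
        if len(replicas) != 1:
            raise ValueError(f"{shard}: expected 1 replica, found {len(replicas)}")
        mapping[shard] = replicas[0]
    return mapping


def rename_pairs(source_status, target_status, collection, hdfs_home="/solr"):
    """Return (old_path, new_path) pairs for replica dirs whose names differ.

    Assumption: directories live at <hdfs_home>/<collection>/<core_node>.
    """
    src = shard_to_replica(source_status, collection)
    dst = shard_to_replica(target_status, collection)
    if src.keys() != dst.keys():
        raise ValueError("source and target collections have different shards")
    pairs = []
    for shard in sorted(src):
        if src[shard] != dst[shard]:
            pairs.append((
                f"{hdfs_home}/{collection}/{src[shard]}",
                f"{hdfs_home}/{collection}/{dst[shard]}",
            ))
    return pairs
```

Each pair could then be fed to `hdfs dfs -mv`. If the source and target name sets overlap (core_node5 on one side being a different shard's core_node5 on the other), the renames would have to go through temporary names first.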
I've recently started using createNodeSet=EMPTY in the collections/CREATE API, and then calling the collections/ADDREPLICA API to create my own replicas with a predictable dataDir and ulogDir. That mostly solves it. But the replica names are still unpredictable values, as shown in CLUSTERSTATUS, and now they're no longer related to the HDFS dataDir/ulogDir.
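For reference, the createNodeSet=EMPTY approach looks roughly like the sketch below, which only builds the Collections API URLs and does not send them. The base URL, the HDFS root, the per-shard directory layout (`<root>/<collection>/<shard>/data` and `/ulog`), and the function names are illustrative assumptions. The shard names assume the default shard1..shardN naming, and a real CREATE call may need further parameters such as a config name.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, collection, num_shards):
    """Build a CREATE call that registers the shards but places no replicas."""
    params = {
        "action": "CREATE",
        "name": collection,
        "numShards": num_shards,
        "replicationFactor": 1,
        "createNodeSet": "EMPTY",  # no replicas are created at this point
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def add_replica_urls(base_url, collection, num_shards, hdfs_root):
    """Build one ADDREPLICA call per shard with predictable HDFS directories."""
    urls = []
    for i in range(1, num_shards + 1):
        shard = f"shard{i}"  # assumes default shard naming
        replica_dir = f"{hdfs_root}/{collection}/{shard}"  # hypothetical layout
        params = {
            "action": "ADDREPLICA",
            "collection": collection,
            "shard": shard,
            "dataDir": f"{replica_dir}/data",
            "ulogDir": f"{replica_dir}/ulog",
        }
        urls.append(f"{base_url}/admin/collections?{urlencode(params)}")
    return urls
```

With this layout the directories are keyed by shard name rather than by core_node name, so they stay the same on both clusters regardless of what replica names get assigned.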
Is there some parameter I'm missing during ADDREPLICA that allows me to assign the replica name?
Created 03-20-2019 09:30 AM
Created 03-19-2019 04:10 PM
Created 03-20-2019 06:32 AM
If I understand correctly, I have 2 choices with the backup portion of the suggested approach:
So far, so good.
What I'm not understanding is how the named snapshot (made on ingest cluster) becomes known by the search cluster so that the restore, which needs the snapshot name as the -b option, can work. Is it possible to restore on a different cluster that has no knowledge of the original snapshot?
Additionally, the backup/restore approach seems like it might require 2x the amount of writes on the search cluster compared to just distcp'ing the data from one cluster to the other and pointing new replicas at that data.
What I'm assuming happens on the search cluster is:
Is this an incorrect understanding of how the backup/restore would work?