Solr version: 5.5.2
My project requirement is to transfer SolrCloud indexes from a Cloudera (CDH) cluster to an HDP cluster.
- The data is huge (1 billion indexed records in production), so re-indexing is not an option.
We have tried the Solr backup and restore APIs, but the data is not visible in SolrCloud. Please check whether we are missing a step in the procedure below:
1) Allowed snapshots on the collection directory (Cloudera cluster):
sudo -u hdfs hadoop dfsadmin -allowSnapshot /user/solr/CollectionName
2) Created a snapshot (Cloudera cluster):
sudo -u hdfs hadoop dfs -createSnapshot /user/solr/CollectionName/
3) Created a Solr collection on the HDP cluster with the same name, the same number of shards, and the same number of replicas.
4) Used “distcp” to transfer the snapshot to the HDP cluster:
sudo -u solr hadoop distcp hdfs://NameNodeCDH-IP:8020/user/solr/CDHCollectionName/.snapshot/s20180601-131020.000 hdfs://NameNodeHDP-IP:8020/user/solr
5) Restored the snapshot at the collection level:
sudo -u solr hadoop fs -cp /user/solr/s20180601-131020.000/* /user/solr/HDPCollectionName/
This copied the snapshot contents from /user/solr into the collection directory for each shard and replica.
OUTCOME: The HDFS directory is restored, but the data is not visible in the Solr UI; 0 records are displayed. We checked the HDFS directory using:
sudo hadoop fs -du -s -h /user/solr/HDPCollectionName/
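Since the size check only confirms that bytes arrived, it may also be worth comparing the directory layout of the copied snapshot with the layout of the collection directory that Solr created. The sketch below is only an illustration: relative_paths is a hypothetical helper (not a Hadoop or Solr command), and it assumes the usual "hadoop fs -ls -R" listing format, where the path is the last field on each line.

```shell
#!/bin/sh
# Sketch only: relative_paths is a hypothetical helper, not part of Hadoop or Solr.
# It reads "hadoop fs -ls -R" output on stdin and prints every path relative to
# the given root, sorted, so that two directory trees can be compared with diff.
relative_paths() {
  root="$1"
  awk -v root="$root" '{
    path = $NF    # the path is assumed to be the last field of each listing line
    if (index(path, root) == 1 && length(path) > length(root))
      print substr(path, length(root) + 1)
  }' | sort
}

# Example usage on the HDP cluster (paths taken from steps 4 and 5):
#   hadoop fs -ls -R /user/solr/s20180601-131020.000 \
#     | relative_paths /user/solr/s20180601-131020.000 > snapshot.txt
#   hadoop fs -ls -R /user/solr/HDPCollectionName \
#     | relative_paths /user/solr/HDPCollectionName > collection.txt
#   diff snapshot.txt collection.txt
```

If the diff shows per-core directories that exist in only one of the two trees, the copied index files may not sit where the HDP cores expect them.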

SolrCloud UI: 0 indexed documents
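The count can also be checked outside the UI with a query that returns only numFound, and files copied underneath running cores may not be picked up until the collection is reloaded, so that is one thing we could rule out. The sketch below only builds the two request URLs. SOLR_BASE and the function names are hypothetical placeholders, and the host and port are assumptions that need to be adjusted to the HDP Solr node.

```shell
#!/bin/sh
# Sketch only: SOLR_BASE and the helper names are hypothetical placeholders.
# The default host and port are assumptions; point SOLR_BASE at a real Solr node.
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"

# URL that asks a collection for its document count
# (rows=0 means no documents are returned, only numFound).
count_url() {
  printf '%s/%s/select?q=*:*&rows=0&wt=json\n' "$SOLR_BASE" "$1"
}

# URL for the Collections API RELOAD action on a collection.
reload_url() {
  printf '%s/admin/collections?action=RELOAD&name=%s\n' "$SOLR_BASE" "$1"
}

# Example usage:
#   curl "$(reload_url HDPCollectionName)"
#   curl "$(count_url HDPCollectionName)"
```

If numFound is still 0 after a reload, the cores are most likely not reading the copied index files at all.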

Are we missing something here?