
Objective

Cross Data Center Replication, commonly abbreviated as CDCR, is a new feature in SolrCloud 6.x. It enables Solr to replicate data from a source collection to one or more target collections distributed across data centers. The current version provides an active-passive disaster recovery solution for Solr: data updates, which include adds, updates, and deletes, are copied from the source collection to the target collection. This means the target collection should not be sent data updates outside of the CDCR functionality.

Prior to SolrCloud 6.x you had to manually design a strategy for replication across data centers. This tutorial will guide you through the process of enabling CDCR between two SolrCloud clusters, each with 1 server, in a Vagrant + VirtualBox environment.

NOTE: Solr 6 is being deployed as a standalone application. HDP 2.5 provides support for Solr 5.5.2 via HDPSearch, which does not include CDCR functionality.

Prerequisites

Scope

This tutorial was tested using the following environment and components:

  • Mac OS X 10.11.6 (El Capitan)
  • VirtualBox 5.1.6 (tutorial should work with any newer version)
  • Vagrant 1.8.6
  • vagrant-vbguest plugin 0.13.0
  • vagrant-hostmanager plugin 1.8.5
  • Apache Solr 6.2.1

Steps

Create Vagrant project directory

I like to create project directories. My Vagrant work goes under ~/Vagrant/<project> and my Docker work goes under ~/Docker/<project>. This makes it clear which technology each project uses and lets me use various helper scripts to automate processes. So let's create a project directory for this tutorial.

mkdir -p ~/Vagrant/solrcloud-cdcr-tutorial && cd ~/Vagrant/solrcloud-cdcr-tutorial

Create Vagrantfile

The Vagrantfile tells Vagrant how to configure your virtual machines. You can copy/paste my Vagrantfile below or use the version in the attachments area of this tutorial. Here is the content from my file:

# -*- mode: ruby -*-
# vi: set ft=ruby :
# Using YAML to load external configuration files
require 'yaml'
Vagrant.configure(2) do |config|
  # Using the hostmanager vagrant plugin to update the hosts files
  config.hostmanager.enabled = true
  config.hostmanager.manage_host = true
  config.hostmanager.manage_guest = true
  config.hostmanager.ignore_private_ip = false
  # Loading in the list of commands that should be run when the VM is provisioned.
  commands = YAML.load_file('commands.yaml')
  commands.each do |command|
    config.vm.provision :shell, inline: command
  end
  # Loading in the VM configuration information
  servers = YAML.load_file('servers.yaml')
  servers.each do |server|
    config.vm.define server['name'] do |srv|
      srv.vm.box = server['box'] # Specify the name of the Vagrant box file to use
      srv.vm.hostname = server['name'] # Set the hostname of the VM
      srv.vm.network :private_network, ip: server['ip'], :adapter => 2 # Add a second adapter with a specified IP
      srv.vm.network :forwarded_port, guest: 22, host: server['port'] # Add a port forwarding rule
      srv.vm.provision :shell, inline: "sed -i'' '/^127.0.0.1\t#{srv.vm.hostname}\t#{srv.vm.hostname}$/d' /etc/hosts" # Remove the extraneous first entry in /etc/hosts
      srv.vm.provider :virtualbox do |vb|
        vb.name = server['name'] # Name of the VM in VirtualBox
        vb.cpus = server['cpus'] # How many CPUs to allocate to the VM
        vb.memory = server['ram'] # How much memory to allocate to the VM
        vb.customize ['modifyvm', :id, '--cpuexecutioncap', '25'] # Limit the VM to 25% of available CPU
      end
    end
  end
end

Create a servers.yaml file

The servers.yaml file contains the configuration information for our VMs. You can copy/paste my servers.yaml below or use the version in the attachments area of this tutorial. Here is the content from my file:

---
- name: solr-dc01
  box: bento/centos-7.2
  cpus: 2
  ram: 2048
  ip: 192.168.56.101
  port: 10122
- name: solr-dc02
  box: bento/centos-7.2
  cpus: 2
  ram: 2048
  ip: 192.168.56.202
  port: 20222

Create commands.yaml file

The commands.yaml file contains the list of commands that should be run on each VM when they are first provisioned. This allows us to automate configuration tasks that would otherwise be tedious and/or repetitive. You can copy/paste my commands.yaml below or use the version in the attachments area of this tutorial. Here is the content from my file:

- sudo yum -y install net-tools ntp wget java-1.8.0-openjdk java-1.8.0-openjdk-devel lsof
- sudo systemctl enable ntpd && sudo systemctl start ntpd
- sudo systemctl disable firewalld && sudo systemctl stop firewalld
- sudo sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/g' /etc/sysconfig/selinux

Copy Solr release file to our Vagrant project directory

Our project directory is accessible to each of our Vagrant VMs via the /vagrant mount point. This allows us to easily access files and data located in our project directory. Instead of using scp to copy the Apache Solr release file to each of the VMs and creating duplicate files, we'll use a single copy located in our project directory.

cp ~/Downloads/solr-6.2.1.tgz .

NOTE: This assumes you are on a Mac, that you have already downloaded solr-6.2.1.tgz, and that your downloads are in the ~/Downloads directory.

Start virtual machines

Now we are ready to start our 2 virtual machines for the first time. Creating the VMs for the first time and starting them every time after that uses the same command:

vagrant up

Once the process is complete you should have 2 servers running. You can verify by looking at VirtualBox. Notice I have 2 VMs running called solr-dc01 and solr-dc02:

[Screenshot: VirtualBox Manager showing the solr-dc01 and solr-dc02 VMs running]

Connect to each virtual machine

You are able to login to each of the VMs via ssh using the vagrant ssh command. You must specify the name of the VM you want to connect to.

vagrant ssh solr-dc01

Using another terminal window, repeat this process for solr-dc02.

Extract Solr install scripts

The Solr release archive file contains an installation script. This installation script will do the following by default:

NOTE: This assumes that you downloaded Solr 6.2.1

  • Install Solr under /opt/solr-6.2.1
  • Create a symbolic link from /opt/solr to /opt/solr-6.2.1
  • Create a solr user
  • Store live data such as indexes and logs in /var/solr

On solr-dc01, run the following command:

tar xvfz /vagrant/solr-6.2.1.tgz solr-6.2.1/bin/install_solr_service.sh --strip-components=2

Repeat this process for solr-dc02

This will create a file called install_solr_service.sh in your current directory, which should be /home/vagrant.

Install Apache Solr

Now we can install Solr using the script defaults:

sudo bash ./install_solr_service.sh /vagrant/solr-6.2.1.tgz

The command above is the same as if you had specified the default settings:

sudo bash ./install_solr_service.sh /vagrant/solr-6.2.1.tgz -i /opt -d /var/solr -u solr -s solr -p 8983

After running the command, you should see something similar to this:

id: solr: no such user
Creating new user: solr
Extracting /vagrant/solr-6.2.1.tgz to /opt
Installing symlink /opt/solr -> /opt/solr-6.2.1 ...
Installing /etc/init.d/solr script ...
Installing /etc/default/solr.in.sh ...
Waiting up to 30 seconds to see Solr running on port 8983 [/]
Started Solr server on port 8983 (pid=29168). Happy searching!
Found 1 Solr nodes:
Solr process 29168 running on port 8983
{
  "solr_home":"/var/solr/data",
  "version":"6.2.1 43ab70147eb494324a1410f7a9f16a896a59bc6f - shalin - 2016-09-15 05:20:53",
  "startTime":"2016-10-31T19:46:27.997Z",
  "uptime":"0 days, 0 hours, 0 minutes, 12 seconds",
  "memory":"13.4 MB (%2.7) of 490.7 MB"}
Service solr installed.

If you run the following command, you can see the Solr process is running:

ps -ef | grep solr
solr     28980     1  0 19:49 ?        00:00:11 java -server -Xms512m -Xmx512m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/solr/logs/solr_gc.log -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/opt/solr/server -Dsolr.solr.home=/var/solr/data -Dsolr.install.dir=/opt/solr -Dlog4j.configuration=file:/var/solr/log4j.properties -Xss256k -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs -jar start.jar --module=http

Repeat this process for solr-dc02

Modify Solr service

It's more convenient to use the OS services infrastructure to manage running Solr processes than manually using scripts. The installation process creates a service script that starts Solr in single instance mode. To take advantage of CDCR, you must use SolrCloud mode. We need to make some changes to the service script for this to work.

We'll be using the embedded Zookeeper instance for our tutorial. To do this, we need a zookeeper configuration file in our /var/solr/data directory. We'll copy the default configuration file from /opt/solr/server/solr/zoo.cfg.

sudo -u solr cp /opt/solr/server/solr/zoo.cfg /var/solr/data/zoo.cfg

Now we need the /etc/init.d/solr service script to run Solr in SolrCloud mode. This is done by adding the -c parameter to the start process. When no other parameters are specified, Solr will start an embedded Zookeeper instance on the Solr port + 1000. In our case, that should be 9983 because our default Solr port is 8983. Because this file is owned by root, we'll need to use sudo.
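As a quick sanity check on that port arithmetic, the embedded Zookeeper port is simply the Solr port plus 1000:

```shell
# The embedded Zookeeper listens on the Solr port + 1000.
SOLR_PORT=8983
ZK_PORT=$((SOLR_PORT + 1000))
echo "$ZK_PORT"   # prints 9983
```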

exit
sudo vi /etc/init.d/solr

Look near the end of the file for the line:

...
case $1 in
  start|stop|restart|status)
    SOLR_CMD=$1
...

This is the section that defines the Solr command. We want to change the SOLR_CMD=$1 line to SOLR_CMD="$1 -c". This will tell Solr that it should start in cloud mode.

NOTE: In production, you would not use the embedded Zookeeper. You would update /etc/default/solr.in.sh to set the ZK_HOST variable to the production Zookeeper instances. When this variable is set, Solr will not start the embedded Zookeeper.
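For example, the ZK_HOST entry in /etc/default/solr.in.sh might look like the sketch below. The hostnames are hypothetical; substitute your own Zookeeper ensemble.

```shell
# Hypothetical production Zookeeper ensemble. When ZK_HOST is set,
# Solr starts in cloud mode and skips the embedded Zookeeper.
ZK_HOST="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
```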

So the section of your file should now look like this:

...
case $1 in
  start|stop|restart|status)
    SOLR_CMD="$1 -c"
...

Now save the file:

Press the `esc` KEY
:wq

Let's stop Solr:

sudo service solr stop

Now we can start Solr using the new script:

sudo service solr start

Once the process is started, we can check the status:

sudo service solr status
Found 1 Solr nodes:
Solr process 29426 running on port 8983
{
  "solr_home":"/var/solr/data",
  "version":"6.2.1 43ab70147eb494324a1410f7a9f16a896a59bc6f - shalin - 2016-09-15 05:20:53",
  "startTime":"2016-10-31T22:16:22.116Z",
  "uptime":"0 days, 0 hours, 0 minutes, 14 seconds",
  "memory":"30.2 MB (%6.1) of 490.7 MB",
  "cloud":{
    "ZooKeeper":"localhost:9983",
    "liveNodes":"1",
    "collections":"0"}}

As you can see, the process started successfully and there is a single cloud node running using Zookeeper on port 9983.

Repeat this process for solr-dc02.

Create Solr dc01 configuration

The solr-dc01 Solr instance will be our source collection for replication. To enable CDCR we need to make a few changes to the solrconfig.xml configuration file. We'll use the data_driven_schema_configs as a base for our configuration. We need to create two different configurations because the source collection has a slightly different configuration than the target collection.

On the solr-dc01 VM, copy the data_driven_schema_configs directory to the vagrant home directory. If you are following along, you should still be the vagrant user.

cd /home/vagrant
cp -r /opt/solr/server/solr/configsets/data_driven_schema_configs .

Edit the solrconfig.xml file:

vi data_driven_schema_configs/conf/solrconfig.xml

The first thing we are going to do is update the updateHandler definition; there is only one in the file. Find the section in the configuration file that looks like this:

<updateHandler class="solr.DirectUpdateHandler2">

We are going to change the updateLog portion of the configuration. Remember that we are using vi as the text editor, so edit using the appropriate vi commands. Change this:

    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
    </updateLog>

to this:

    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
    </updateLog>

Now we need to create a new requestHandler definition. Find the section in the configuration file that looks like this:

  <!-- A request handler that returns indented JSON by default -->
  <requestHandler name="/query" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="wt">json</str>
      <str name="indent">true</str>
    </lst>
  </requestHandler>

We are going to add our new definition just after the closing requestHandler. Add the following new definition:

  <!-- A request handler for cross data center replication -->
  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="replica">
      <str name="zkHost">192.168.56.202:9983</str>
      <str name="source">collection1</str>
      <str name="target">collection1</str>
    </lst>
    <lst name="replicator">
      <str name="threadPoolSize">8</str>
      <str name="schedule">1000</str>
      <str name="batchSize">128</str>
    </lst>
    <lst name="updateLogSynchronizer">
      <str name="schedule">1000</str>
    </lst>
  </requestHandler>

Your updated file should now look like this:

...
  <!-- A request handler that returns indented JSON by default -->
  <requestHandler name="/query" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="wt">json</str>
      <str name="indent">true</str>
    </lst>
  </requestHandler>
  <!-- A request handler for cross data center replication -->
  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="replica">
      <str name="zkHost">192.168.56.202:9983</str>
      <str name="source">collection1</str>
      <str name="target">collection1</str>
    </lst>
    <lst name="replicator">
      <str name="threadPoolSize">8</str>
      <str name="schedule">1000</str>
      <str name="batchSize">128</str>
    </lst>
    <lst name="updateLogSynchronizer">
      <str name="schedule">1000</str>
    </lst>
  </requestHandler>
...

NOTE: The zkHost value should be the IP address and port of the Zookeeper instance for the target collection. Our target collection is on solr-dc02, so this IP and port point to solr-dc02. When we create our collections in Solr, we'll use the name collection1.

Now save the file:

Press the `esc` KEY
:wq

Create Solr dc02 configuration

The solr-dc02 Solr instance will be our target collection for replication. To enable CDCR we need to make a few changes to the solrconfig.xml configuration file. As above, we'll use the data_driven_schema_configs as a base for our configuration.

On solr-dc02, copy the data_driven_schema_configs directory to the vagrant home directory. If you are following along, you should still be the vagrant user.

cd /home/vagrant
cp -r /opt/solr/server/solr/configsets/data_driven_schema_configs .

Edit the solrconfig.xml file:

vi data_driven_schema_configs/conf/solrconfig.xml

The first thing we are going to do is update the updateHandler definition; there is only one in the file. Find the section in the configuration file that looks like this:

  <updateHandler class="solr.DirectUpdateHandler2">

We are going to change the updateLog portion of the configuration. Remember that we are using vi as the text editor. Change this:

    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
    </updateLog>

to this:

    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
    </updateLog>

Now we need to create a new requestHandler definition. Find the section in the configuration file that looks like this:

  <!-- A request handler that returns indented JSON by default -->
  <requestHandler name="/query" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="wt">json</str>
      <str name="indent">true</str>
    </lst>
  </requestHandler>

We are going to add our new definition just after the closing requestHandler. Add the following new definition:

  <!-- A request handler for cross data center replication -->
  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="buffer">
      <str name="defaultState">disabled</str>
    </lst>
  </requestHandler>
  <!-- A request handler for cross data center replication -->
  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">cdcr-processor-chain</str>
    </lst>
  </requestHandler>
  <updateRequestProcessorChain name="cdcr-processor-chain">
    <processor class="solr.CdcrUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

Your updated file should now look like this:

...
  <!-- A request handler that returns indented JSON by default -->
  <requestHandler name="/query" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="wt">json</str>
      <str name="indent">true</str>
    </lst>
  </requestHandler>
  <!-- A request handler for cross data center replication -->
  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="buffer">
      <str name="defaultState">disabled</str>
    </lst>
  </requestHandler>
  <!-- A request handler for cross data center replication -->
  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">cdcr-processor-chain</str>
    </lst>
  </requestHandler>
  <updateRequestProcessorChain name="cdcr-processor-chain">
    <processor class="solr.CdcrUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>
...

Now save the file:

Press the `esc` KEY
:wq

You should see how the two configurations are different between the source and target collections.

Create Solr collection on solr-dc01 and solr-dc02

Now we should be able to create a collection using our updated configuration. Because the two configurations are different, make sure you run this command on both the solr-dc01 and solr-dc02 VMs. This creates the collections in our respective data centers.

/opt/solr/bin/solr create -c collection1 -d ./data_driven_schema_configs

NOTE: We are using the same collection name, collection1, that is referenced in the CDCR configuration.

You should see something similar to this:

/opt/solr/bin/solr create -c collection1 -d ./data_driven_schema_configs
Connecting to ZooKeeper at localhost:9983 ...
Uploading /home/vagrant/data_driven_schema_configs/conf for config collection1 to ZooKeeper at localhost:9983
Creating new collection 'collection1' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=1&replicationF...
{
  "responseHeader":{
    "status":0,
    "QTime":3684},
  "success":{"192.168.56.101:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":2546},
      "core":"collection1_shard1_replica1"}}}

Now we can verify the collection exists in the Solr admin ui via: http://192.168.56.101:8983/solr/#/~cloud

You should see something similar to this:

[Screenshot: Solr Admin UI cloud view showing collection1 with a single shard]

As you can see, there is a single collection named collection1 which has 1 shard. You can repeat this process on solr-dc02 and see something similar.

NOTE: Remember that solr-dc01 is 192.168.56.101 and solr-dc02 is 192.168.56.202.

Turn on replication

Let's first check the status of replication. Each of these curl commands interacts with the CDCR API. You can check the status of replication using the following command:

curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=STATUS'

You should see something similar to this:

curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=STATUS'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">5</int></lst><lst name="status"><str name="process">stopped</str><str name="buffer">enabled</str></lst>
</response>

You should notice the process is displayed as stopped. We want to start the replication process.

curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=START'

You should see something similar to this:

curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=START'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">41</int></lst><lst name="status"><str name="process">started</str><str name="buffer">enabled</str></lst>
</response>

You should notice the process is now started. Next we need to disable the buffer on the target collection, which buffers updates by default.

curl -XPOST 'http://192.168.56.202:8983/solr/collection1/cdcr?action=DISABLEBUFFER'

You should see something similar to this:

curl -XPOST 'http://192.168.56.202:8983/solr/collection1/cdcr?action=DISABLEBUFFER'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">7</int></lst><lst name="status"><str name="process">started</str><str name="buffer">disabled</str></lst>
</response>

You should notice the buffer is now disabled.
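Beyond STATUS, START, STOP, and the buffer actions, the CDCR API also exposes monitoring actions you can poll to watch replication health. A small sketch (the cdcr_url helper function is hypothetical; the QUEUES, OPS, and ERRORS actions are part of the CDCR API):

```shell
# Hypothetical helper that builds CDCR API URLs for a given host and action.
cdcr_url() {
  local host=$1 action=$2
  echo "http://${host}:8983/solr/collection1/cdcr?action=${action}"
}

# Monitoring actions (run each with: curl -XPOST "$(cdcr_url ...)"):
cdcr_url 192.168.56.101 QUEUES   # queue sizes of updates pending replication
cdcr_url 192.168.56.101 OPS      # average replication operations per second
cdcr_url 192.168.56.101 ERRORS   # consecutive errors hit by the replicator
```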

Add documents to source Solr collection in solr-dc01

Now we will add a couple of sample documents to collection1 in solr-dc01. Run the following command to add 2 sample documents:

curl -XPOST -H 'Content-Type: application/json' 'http://192.168.56.101:8983/solr/collection1/update' --data-binary '{
 "add" : {
  "doc" : {
   "id" : "1",
   "text_ws" : "This is document number one."
  }
 },
 "add" : {
  "doc" : {
   "id" : "2",
   "text_ws" : "This is document number two."
  }
 },
 "commit" : {}
}'

You should notice the commit command in the JSON above. That is because the default solrconfig.xml does not have automatic commits enabled. You should get a response back similar to this:

{"responseHeader":{"status":0,"QTime":362}}
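If you prefer not to commit manually, you could instead enable automatic commits in solrconfig.xml. A sketch of the relevant settings (the interval values below are illustrative; tune them for your workload):

```xml
<!-- Inside the <updateHandler> section: hard-commit every 15 seconds
     without reopening the searcher, and soft-commit every second so
     new documents become visible to searches. -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
```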

Query solr-dc01 collection

Let's query collection1 on solr-dc01 to ensure the documents are present. Run the following command:

curl -XGET 'http://192.168.56.101:8983/solr/collection1/select?q=*:*&indent=true'

You should see something similar to this:

curl -XGET 'http://192.168.56.101:8983/solr/collection1/select?q=*:*&indent=true'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <bool name="zkConnected">true</bool>
  <int name="status">0</int>
  <int name="QTime">17</int>
  <lst name="params">
    <str name="q">*:*</str>
    <str name="indent">true</str>
  </lst>
</lst>
<result name="response" numFound="2" start="0">
  <doc>
    <str name="id">1</str>
    <str name="text_ws">This is document number one.</str>
    <long name="_version_">1549823582071160832</long></doc>
  <doc>
    <str name="id">2</str>
    <str name="text_ws">This is document number two.</str>
    <long name="_version_">1549823582135123968</long></doc>
</result>
</response>

Query solr-dc02 collection

Before executing the query on solr-dc02, we need to commit the changes. As mentioned above, automatic commits are not enabled in the default solrconfig.xml. Run the following command:

curl -XPOST -H 'Content-Type: application/json' 'http://192.168.56.202:8983/solr/collection1/update' --data-binary '{
 "commit" : {}
}'

You should see a response similar to this:

{"responseHeader":{"status":0,"QTime":5}}

Now we can run our query:

curl -XGET 'http://192.168.56.202:8983/solr/collection1/select?q=*:*&indent=true'

You should see something similar to this:

curl -XGET 'http://192.168.56.202:8983/solr/collection1/select?q=*:*&indent=true'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <bool name="zkConnected">true</bool>
  <int name="status">0</int>
  <int name="QTime">17</int>
  <lst name="params">
    <str name="q">*:*</str>
    <str name="indent">true</str>
  </lst>
</lst>
<result name="response" numFound="2" start="0">
  <doc>
    <str name="id">1</str>
    <str name="text_ws">This is document number one.</str>
    <long name="_version_">1549823582071160832</long></doc>
  <doc>
    <str name="id">2</str>
    <str name="text_ws">This is document number two.</str>
    <long name="_version_">1549823582135123968</long></doc>
</result>
</response>

You should notice that you have 2 documents with the same id and text_ws content as those you pushed to solr-dc01.
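Deletes replicate the same way as adds. As a quick experiment, you could delete document 1 on the source, commit on the target, and re-run the query there to watch the document disappear. A sketch of the delete-by-id payload (the DELETE_JSON variable name is just for illustration):

```shell
# Delete-by-id payload for the /update handler; CDCR forwards
# deletes from the source to the target just like adds.
DELETE_JSON='{ "delete" : { "id" : "1" }, "commit" : {} }'
echo "$DELETE_JSON"

# To run it against the source collection on solr-dc01:
# curl -XPOST -H 'Content-Type: application/json' \
#   'http://192.168.56.101:8983/solr/collection1/update' --data-binary "$DELETE_JSON"
```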

Review

If you followed along with this tutorial, you have successfully set up cross data center replication between two SolrCloud configurations.

Some important points to keep in mind:

  • Because this is an active-passive approach, there is only a single source system. If the source system goes down, your ingest will stop as the other data center is read-only and should not have updates pushed outside of the replication process. Work is being done to make Solr CDCR active-active.
  • Cross data center communications can be a potential bottleneck. If the cross data center connection cannot sustain sufficient throughput, the target data center(s) can fall behind in replication.
  • CDCR is not intended nor optimized for bulk inserts. If you have a need to do bulk inserts, first synchronize the indexes between the data centers outside of the replication process. Then enable replication for incremental updates.

For more information, read about Cross Data Center Replication: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462

Last updated: 08-17-2019