Member since
02-09-2016
559
Posts
422
Kudos Received
98
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2808 | 03-02-2018 01:19 AM
 | 4475 | 03-02-2018 01:04 AM
 | 3026 | 08-02-2017 05:40 PM
 | 2815 | 07-17-2017 05:35 PM
 | 2072 | 07-10-2017 02:49 PM
01-28-2017
06:20 AM
For the backtick approach, you might want to try `header`.`timestamp`
08-20-2018
09:05 PM
Is this workflow similar for HDF 3.1 Sandbox? If not, kindly let us know the changes.
11-10-2016
02:35 AM
2 Kudos
Objective
If you are using Docker on CentOS 7, you may have run into issues importing the Docker version of the Hortonworks Sandbox for HDP 2.5. The default configuration of Docker on CentOS 7 limits the Docker virtual machine image to 10GB in size, while the Docker HDP sandbox is 13GB. This causes the loading process of the sandbox to fail. This tutorial will guide you through installing Docker on CentOS 7 using Vagrant and modifying the Docker configuration to move the Docker virtual machine image and increase its size. While we are using Vagrant + VirtualBox, this process should work for any CentOS install (Amazon, etc.) with small changes, since VirtualBox and the related plugins are not needed in those cases.
Prerequisites
You should already have downloaded the Docker HDP 2.5 Sandbox. Read more here: Docker HDP Sandbox
You should already have installed VirtualBox 5.1.x. Read more here: VirtualBox
You should already have installed Vagrant 1.8.6. Read more here: Vagrant
You should already have installed the vagrant-vbguest plugin. This plugin keeps the VirtualBox Guest Additions software current as you upgrade your kernel and/or VirtualBox versions. Read more here: vagrant-vbguest
You should already have installed the vagrant-hostmanager plugin. This plugin automatically manages the /etc/hosts file on your local Mac and in your virtual machines. Read more here: vagrant-hostmanager
Scope
Mac OS X 10.11.6 (El Capitan)
VirtualBox 5.1.6
Vagrant 1.8.6
vagrant-vbguest plugin 0.13.0
vagrant-hostmanager plugin 1.8.5
Steps
Create Vagrant project directory
Before we get started, determine where you want to keep your Vagrant project files. Each Vagrant project should have its own directory. I keep my Vagrant projects in my ~/Development/Vagrant directory. You should also use a helpful name for each Vagrant project directory you create.
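Before running the commands below, you may want to confirm the prerequisite tooling is in place. A quick sanity check (the version numbers in Scope are what I tested with; anything reasonably current should behave the same):
vagrant --version
VBoxManage --version
vagrant plugin list | grep -E 'vagrant-(vbguest|hostmanager)'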
cd ~/Development/Vagrant
mkdir centos7-docker
cd centos7-docker
We will be using a CentOS 7.2 Vagrant box, so I include centos7 in the Vagrant project name to differentiate it from a CentOS 6 project. The project is for Docker, so I include that in the name. Thus we have a project directory name of centos7-docker.
Create Vagrant project files
Create Vagrantfile
The Vagrantfile tells Vagrant how to configure your virtual machines. Here is my Vagrantfile:
# -*- mode: ruby -*-
# vi: set ft=ruby :
# Using yaml to load external configuration files
require 'yaml'
Vagrant.configure("2") do |config|
# Using the hostmanager vagrant plugin to update the host files
config.hostmanager.enabled = true
config.hostmanager.manage_host = true
config.hostmanager.manage_guest = true
config.hostmanager.ignore_private_ip = false
# Loading in the list of commands that should be run when the VM is provisioned.
commands = YAML.load_file('commands.yaml')
commands.each do |command|
config.vm.provision :shell, inline: command
end
# Loading in the VM configuration information
servers = YAML.load_file('servers.yaml')
servers.each do |servers|
config.vm.define servers["name"] do |srv|
srv.vm.box = servers["box"] # Specify the name of the Vagrant box file to use
srv.vm.hostname = servers["name"] # Set the hostname of the VM
srv.vm.network "private_network", ip: servers["ip"], :adapater=>2 # Add a second adapater with a specified IP
srv.vm.network :forwarded_port, guest: 22, host: servers["port"] # Add a port forwarding rule
srv.vm.provision :shell, inline: "sed -i'' '/^127.0.0.1\\t#{srv.vm.hostname}\\t#{srv.vm.hostname}$/d' /etc/hosts"
srv.vm.provider :virtualbox do |vb|
vb.name = servers["name"] # Name of the VM in VirtualBox
vb.cpus = servers["cpus"] # How many CPUs to allocate to the VM
vb.memory = servers["ram"] # How much memory to allocate to the VM
vb.customize ["modifyvm", :id, "--cpuexecutioncap", "75"] # Limit to VM to 75% of available CPU
end
end
end
end
Create a servers.yaml file
The servers.yaml file contains the configuration information for our VMs. Here is the content from my file:
---
- name: docker
box: bento/centos-7.2
cpus: 6
ram: 12288
ip: 192.168.56.100
port: 10022
NOTE: My configuration uses 6 CPUs and 12GB of memory for VirtualBox. Make sure you have enough resources to use this configuration. For the purposes of this demo, you can lower this to 2 CPUs and 4GB of memory.
Create commands.yaml file
The commands.yaml file contains the list of commands that should be run on each VM when they are first provisioned. This allows us to automate configuration tasks that would otherwise be tedious and/or repetitive. Here is the content from my file:
- "sudo yum -y install net-tools ntp wget"
- "sudo systemctl enable ntpd && sudo systemctl start ntpd"
- "sudo systemctl disable firewalld && sudo systemctl stop firewalld"
- "sudo sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/g' /etc/sysconfig/selinux" Start Virtual Machine Once you have created the 3 files in your Vagrant project directory, you are ready to start your virtual machine. Creating the vm for the first time and starting it every time after that uses the same command:
vagrant up
NOTE: During the startup process, the vagrant-hostmanager plugin will prompt you for a password. This is the sudo password for your local machine; it needs sudo to update the /etc/hosts file. Once the process is complete you should have 1 server running. You can verify by looking at the VirtualBox user interface; you should see a virtual machine called docker running.
Connect to virtual machine
You can log in to the virtual machine via ssh using the vagrant ssh command.
vagrant ssh
Install Docker in Vagrant virtual machine
Now that you are connected to the virtual machine, it's time to install Docker. First we'll create the docker.repo file:
sudo tee /etc/yum.repos.d/docker.repo <<-'EOF'
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7/
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF
Now we can install Docker:
sudo yum install docker-engine
Now we need to enable the Docker service:
sudo systemctl enable docker.service
Now we can start the Docker service:
sudo systemctl start docker
Load Docker HDP sandbox
We will attempt to load the HDP sandbox into Docker using the default Docker settings. This process will fail after the image has loaded approximately 10GB of data. First, make sure a copy of the Docker HDP sandbox file HDP_2.5_docker.tar.gz is located in your project directory. This will allow you to access the file via the /vagrant/ directory within your virtual machine. I'm doing this on a Mac and my file was downloaded to ~/Downloads . This command is run on your local computer:
cp ~/Downloads/HDP_2.5_docker.tar.gz ~/Development/Vagrant/centos7-docker/
Now we can try loading the sandbox image on the VM. You need to use sudo to interact with Docker. You should see something similar to the following:
sudo docker load < /vagrant/HDP_2.5_docker.tar.gz
b1b065555b8a: Loading layer [==================================================>] 202.2 MB/202.2 MB
0b547722f59f: Loading layer [=====================================> ] 10.36 GB/13.84 GB
ApplyLayer exit status 1 stdout: stderr: write /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64/jre/lib/rt.jar: no space left on device
As you can see above, the load command failed a little more than 10GB into the process. That is because the Docker virtual machine image defaults to 10GB in size. We can check where our Docker Root is located:
sudo docker info | grep "Docker Root"
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Docker Root Dir: /var/lib/docker
We can see from above that the default is /var/lib/docker .
Change Docker configuration
CentOS 7 uses systemd to manage services. You can read more about it here: systemd. The older method of modifying the Docker configuration via /etc/sysconfig based configuration files should be avoided. While you can get everything working with that approach, it's better to use the appropriate systemd methods. Some people modify the /etc/systemd/system/multi-user.target.wants/docker.service file to change the Docker configuration. That file can be overwritten during software updates, so it is best not to modify it. Instead you should use /etc/systemd/system/docker.service.d/docker.conf . We need to create the /etc/systemd/system/docker.service.d directory. This is where our docker.conf configuration file will be located.
sudo mkdir -p /etc/systemd/system/docker.service.d
Now we can edit our docker.conf file:
sudo vi /etc/systemd/system/docker.service.d/docker.conf
You should add the following configuration to the file:
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --graph="/mnt/docker-data" --storage-driver=overlay --storage-opt=dm.basesize=30G
The blank ExecStart= entry is not a typo; it is required to wipe the default configuration from memory. The --graph parameter specifies where the Docker image data should be located. Make sure this location has sufficient room for your expected data size. The --storage-driver parameter specifies the storage driver for Docker. The default is devicemapper, whose base device is limited to 10GB. Set it to overlay to allow for larger file sizes. You can read more about the storage drivers here: docker storage drivers. The --storage-opt parameter allows us to change the base image size. In this example I've set my base image size to 30GB. You may want to use a larger size.
Restart Docker
Now that we've updated our configuration, we need to restart the Docker daemon.
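Before restarting, you can optionally confirm that systemd sees the drop-in file; systemctl cat prints the unit file along with any drop-ins it found on disk:
sudo systemctl cat docker.service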
sudo systemctl daemon-reload
sudo systemctl restart docker
We can now check to see where our Docker Root is. You should see something similar to this:
sudo docker info | grep "Docker Root"
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Docker Root Dir: /mnt/docker-data
As you can see, our Docker Root has moved to /mnt/docker-data, which is the location we specified with the --graph parameter in our docker.conf file.
Load Docker HDP sandbox
Now that we have updated our Docker configuration and restarted the daemon, we should be able to load our HDP sandbox again. Let's run the load command again to see if it completes. You should see something similar to this:
sudo docker load < /vagrant/HDP_2.5_docker.tar.gz
b1b065555b8a: Loading layer [==================================================>] 202.2 MB/202.2 MB
0b547722f59f: Loading layer [==================================================>] 13.84 GB/13.84 GB
99d7327952e0: Loading layer [==================================================>] 234.8 MB/234.8 MB
294b1c0e07bd: Loading layer [==================================================>] 207.5 MB/207.5 MB
fd5c10f2f1a1: Loading layer [==================================================>] 387.6 kB/387.6 kB
6852ef70321d: Loading layer [==================================================>] 163 MB/163 MB
517f170bbf7f: Loading layer [==================================================>] 20.98 MB/20.98 MB
665edb80fc91: Loading layer [==================================================>] 337.4 kB/337.4 kB
Loaded image: sandbox:latest
As you can see, the load command was successful.
Review
If you successfully followed along with this tutorial, you were able to set up a CentOS 7 virtual machine using Vagrant. We updated the Docker configuration to use a different location for the Docker image data and changed the base size of that image. Once these changes were complete, we were able to successfully load the HDP sandbox image. You can read more about how to install Docker on CentOS 7 here: https://docs.docker.com/engine/installation/linux/centos/ and how to configure it here: https://docs.docker.com/engine/admin/
11-08-2016
10:48 AM
1 Kudo
Objective
Many people work exclusively from a laptop where storage is typically limited to 500GB or less. Over time, you may find your available storage space has become a regular concern. It's not uncommon to use an external hard drive to augment the available storage.
The current version of Docker for Mac (1.12.x) does not provide a configuration setting which allows users to change the location where the Docker virtual machine image is located. This means the image, which can grow up to 64GB in size by default, is located on your laptop's primary hard drive.
With the HDP 2.5 version of the Hortonworks sandbox available as a native Docker image, you may find a desire to have more room available to Docker. This tutorial will guide you through the process of moving your Docker virtual machine image to a different location, an external drive in this case. This will free up to 64GB of space on your primary laptop hard drive and let you expand the size of the Docker image file later. This tutorial is the first in a two part series.
Prerequisites
You should have already completed the following tutorial Installing Docker Version of Sandbox on Mac
You should have an external or secondary hard drive available.
Scope
Mac OS X 10.11.6 (El Capitan)
Docker for Mac 1.12.1
HDP 2.5 Docker Sandbox
Steps
Stop Docker for Mac
Before we can make any changes to the Docker virtual machine image, we need to stop Docker for Mac. There should be a Docker for Mac icon in the menu bar. You should see something similar to this:
You can also check from the command line with ps -ef | grep -i com.docker . You should see something similar to this:
ps -ef | grep -i com.docker
0 123 1 0 8:45AM ?? 0:00.01 /Library/PrivilegedHelperTools/com.docker.vmnetd
502 967 876 0 8:46AM ?? 0:00.08 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 969 967 0 8:46AM ?? 0:00.04 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 971 967 0 8:46AM ?? 0:07.96 com.docker.db --url fd:3 --git /Users/myoung/Library/Containers/com.docker.docker/Data/database
502 975 967 0 8:46AM ?? 0:03.40 com.docker.osx.hyperkit.linux
502 977 975 0 8:46AM ?? 0:00.03 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux
502 12807 967 0 9:17PM ?? 0:00.08 com.docker.osxfs --address fd:3 --connect /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --control fd:4 --volume-control fd:5 --database /Users/myoung/Library/Containers/com.docker.docker/Data/s40
502 12810 967 0 9:17PM ?? 0:00.12 com.docker.slirp --db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 --ethernet fd:3 --port fd:4 --vsock-path /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --max-connections 900
502 12811 967 0 9:17PM ?? 0:00.19 com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 12812 12811 0 9:17PM ?? 0:00.02 /Applications/Docker.app/Contents/MacOS/com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 12814 12811 0 9:17PM ?? 0:16.48 /Applications/Docker.app/Contents/MacOS/com.docker.hyperkit -A -m 12G -c 6 -u -s 0:0,hostbridge -s 31,lpc -s 2:0,virtio-vpnkit,uuid=1f629fed-1ef6-4f34-8fce-753347e3b941,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s50,macfile=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/mac.0 -s 3,virtio-blk,file:///Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2,format=qcow -s 4,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s40,tag=db -s 5,virtio-rnd -s 6,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s51,tag=port -s 7,virtio-sock,guest_cid=3,path=/Users/myoung/Library/Containers/com.docker.docker/Data,guest_forwards=2376;1525 -l com1,autopty=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty,log=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/console-ring -f kexec,/Applications/Docker.app/Contents/Resources/moby/vmlinuz64,/Applications/Docker.app/Contents/Resources/moby/initrd.img,earlyprintk=serial console=ttyS0 com.docker.driver="com.docker.driver.amd64-linux", com.docker.database="com.docker.driver.amd64-linux" ntp=gateway mobyplatform=mac -F /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/hypervisor.pid
502 13790 876 0 9:52PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 13791 13790 0 9:52PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 13793 13146 0 9:52PM ttys000 0:00.00 grep -i com.docker
Now we are going to stop Docker for Mac. Before shutting down Docker, make sure all of your containers have been stopped. Using the menu shown above, click on the Quit Docker menu option. This will stop Docker for Mac. You should notice the Docker for Mac icon is no longer visible.
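If you prefer the command line for this step, you can also ask macOS to quit the application; this is just an alternative to the menu option and assumes the app is named Docker:
osascript -e 'quit app "Docker"'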
Now let's confirm the Docker processes we saw before are no longer running:
ps -ef | grep -i com.docker
0 123 1 0 8:45AM ?? 0:00.01 /Library/PrivilegedHelperTools/com.docker.vmnetd
502 13815 13146 0 9:54PM ttys000 0:00.00 grep -i com.docker
NOTE: It may take a minute or two before Docker completely shuts down.
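If you would rather script the wait than eyeball it, a small loop like this works; it polls for the hyperkit process shown in the ps output above:
while pgrep -f com.docker.hyperkit > /dev/null; do sleep 5; done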
Backup Docker virtual machine image
Before we make any changes to the Docker virtual machine image, we should back it up. This will temporarily use more space on your laptop hard drive. Make sure you have enough room to hold two copies of the data. As mentioned before, the Docker image can be up to 64GB by default. Let's check the current size of our image using du -sh . The Docker image file is located at ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/ by default.
du -sh ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/
64G /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/
In my case, my image size is 64GB. You need to be sure you have room for 2 copies of the com.docker.driver.amd64-linux directory. Now we'll make a copy of our image:
cd ~/Library/Containers/com.docker.docker/Data/
cp -r com.docker.driver.amd64-linux com.docker.driver.amd64-linux.backup
This copy serves as our backup of the image.
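If you want a quick sanity check that the backup is complete, comparing the directory sizes is usually enough (checksumming a 64GB image would take considerably longer):
du -sh com.docker.driver.amd64-linux com.docker.driver.amd64-linux.backup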
Copy Docker virtual machine image to external drive
Now we can make a copy of our image on our external hard drive. I have a 1TB SSD mounted at /Volumes/Samsung . I am going to store my Docker virtual machine image in /Volumes/Samsung/Docker/image . You should store the image in a location that makes sense for you.
cp -r com.docker.driver.amd64-linux /Volumes/Samsung/Docker/image/
This process will take a few minutes. It will take longer if you are not using an SSD. Let's confirm the directory now exists on the external hard drive.
ls -la /Volumes/Samsung/Docker/image/
total 0
drwxr-xr-x 3 myoung staff 102 Nov 3 17:08 .
drwxr-xr-x 11 myoung staff 374 Nov 3 17:03 ..
drwxr-xr-x@ 11 myoung staff 374 Nov 7 21:53 com.docker.driver.amd64-linux
You can also check the size:
du -sh /Volumes/Samsung/Docker/image/
64G /Volumes/Samsung/Docker/image/
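As an aside, if you ever need to redo or resume this copy, rsync (which ships with macOS) can show progress and skip files that already match:
rsync -aP ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux /Volumes/Samsung/Docker/image/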
Create symbolic link for Docker virtual machine image
Now that we have a copy of the Docker image on the external hard drive, we will use a symbolic link from the image directory on the laptop hard drive to the image directory on the external hard drive. Before creating the link, we need to remove the current image directory on our laptop hard drive:
rm -rf ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux
Now let's create the symbolic link using the ln -s command. The syntax is ln -s <target> <link name> . In this case, the target is the image directory on the external drive and the link name is the original image path on the internal drive.
ln -s /Volumes/Samsung/Docker/image/com.docker.driver.amd64-linux ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux
We can confirm the link was created:
ls -la ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux
lrwxr-xr-x 1 myoung staff 59 Nov 3 17:05 /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux -> /Volumes/Samsung/Docker/image/com.docker.driver.amd64-linux
Restart Docker for Mac
Now we can restart Docker for Mac. This is done by running the application from the Applications folder in the Finder. You should see something similar to this:
Double-click on the Docker application to start it. You should notice the Docker for Mac icon is now back in the main menu bar. You can also check via ps -ef | grep -i com.docker . You should see something similar to this:
ps -ef | grep -i com.docker
0 123 1 0 8:45AM ?? 0:00.01 /Library/PrivilegedHelperTools/com.docker.vmnetd
502 14476 14465 0 10:42PM ?? 0:00.03 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 14479 14476 0 10:42PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 14480 14476 0 10:42PM ?? 0:00.29 com.docker.db --url fd:3 --git /Users/myoung/Library/Containers/com.docker.docker/Data/database
502 14481 14476 0 10:42PM ?? 0:00.08 com.docker.osxfs --address fd:3 --connect /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --control fd:4 --volume-control fd:5 --database /Users/myoung/Library/Containers/com.docker.docker/Data/s40
502 14482 14476 0 10:42PM ?? 0:00.04 com.docker.slirp --db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 --ethernet fd:3 --port fd:4 --vsock-path /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --max-connections 900
502 14483 14476 0 10:42PM ?? 0:00.05 com.docker.osx.hyperkit.linux
502 14484 14476 0 10:42PM ?? 0:00.08 com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 14485 14483 0 10:42PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux
502 14486 14484 0 10:42PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 14488 14484 0 10:42PM ?? 0:07.90 /Applications/Docker.app/Contents/MacOS/com.docker.hyperkit -A -m 12G -c 6 -u -s 0:0,hostbridge -s 31,lpc -s 2:0,virtio-vpnkit,uuid=1f629fed-1ef6-4f34-8fce-753347e3b941,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s50,macfile=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/mac.0 -s 3,virtio-blk,file:///Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2,format=qcow -s 4,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s40,tag=db -s 5,virtio-rnd -s 6,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s51,tag=port -s 7,virtio-sock,guest_cid=3,path=/Users/myoung/Library/Containers/com.docker.docker/Data,guest_forwards=2376;1525 -l com1,autopty=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty,log=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/console-ring -f kexec,/Applications/Docker.app/Contents/Resources/moby/vmlinuz64,/Applications/Docker.app/Contents/Resources/moby/initrd.img,earlyprintk=serial console=ttyS0 com.docker.driver="com.docker.driver.amd64-linux", com.docker.database="com.docker.driver.amd64-linux" ntp=gateway mobyplatform=mac -F /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/hypervisor.pid
502 14559 14465 0 10:46PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 14560 14559 0 10:46PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 14562 13146 0 10:46PM ttys000 0:00.00 grep -i com.docker
You should notice the Docker processes are running again. You can also check the timestamp of files in the Docker image directory on the external hard drive:
ls -la /Volumes/Samsung/Docker/image/com.docker.driver.amd64-linux
total 134133536
drwxr-xr-x@ 12 myoung staff 408 Nov 7 22:42 .
drwxr-xr-x 3 myoung staff 102 Nov 3 17:08 ..
-rw-r--r-- 1 myoung staff 68676222976 Nov 7 22:45 Docker.qcow2
-rw-r--r-- 1 myoung staff 65536 Nov 7 22:42 console-ring
-rw-r--r-- 1 myoung staff 5 Nov 7 22:42 hypervisor.pid
-rw-r--r-- 1 myoung staff 0 Aug 24 16:06 lock
drwxr-xr-x 67 myoung staff 2278 Nov 5 22:00 log
-rw-r--r-- 1 myoung staff 17 Nov 7 22:42 mac.0
-rw-r--r-- 1 myoung staff 36 Aug 24 16:06 nic1.uuid
-rw-r--r-- 1 myoung staff 5 Nov 7 22:42 pid
-rw-r--r-- 1 myoung staff 59619 Nov 7 22:42 syslog
lrwxr-xr-x 1 myoung staff 12 Nov 7 22:42 tty -> /dev/ttys001
You should notice the timestamp of the Docker.qcow2 file has been updated, which means Docker is now using this location for its image file.
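Another way to confirm Docker is using the new location is to ask which process has the image file open; lsof is built into macOS:
sudo lsof /Volumes/Samsung/Docker/image/com.docker.driver.amd64-linux/Docker.qcow2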
Start a Docker container
You should attempt to start a Docker container to make sure everything is working. You can start the HDP sandbox via docker start sandbox if you've already installed it as listed in the prerequisites; a minimal check is shown below. If everything works, you can delete the backup.
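A minimal smoke test might look like this; it assumes the sandbox container from the prerequisite tutorial, but any container serves the same purpose:
docker start sandbox
docker ps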
Delete Docker backup image
Now that everything is working using the new location, we can remove our backup.
rm -rf ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux.backup
Review
If you successfully followed along with this tutorial, we were able to move our Docker for Mac virtual machine image to an external hard drive. This should free up to 64GB of space on your laptop hard drive. Look for part two in the series to learn how to increase the size of your Docker image.
11-10-2016
01:53 PM
@Marcia Hon I wrote an HCC article that walks you through the process of increasing the base size of the CentOS 7 docker vm image. This is the preferred method for making these changes. https://community.hortonworks.com/content/kbentry/65714/how-to-modify-the-default-docker-configuration-on.html
11-04-2016
07:24 PM
Good catch. Tutorial has been updated to provide more links.
11-01-2016
08:55 PM
6 Kudos
Objective
Cross Data Center Replication, commonly abbreviated as CDCR, is a new feature found in SolrCloud 6.x. It enables Solr to replicate data from one source collection to one or more target collections distributed between data centers. The current version provides an active-passive disaster recovery solution for Solr. Data updates, which include adds, updates, and deletes, are copied from the source collection to the target collection. This means the target collection should not be sent data updates outside of the CDCR functionality. Prior to SolrCloud 6.x you had to manually design a strategy for replication across data centers. This tutorial will guide you through the process of enabling CDCR between two SolrCloud clusters, each with 1 server, in a Vagrant + VirtualBox environment.
NOTE: Solr 6 is being deployed as a standalone application. HDP 2.5 provides support for Solr 5.5.2 via HDPSearch, which does not include CDCR functionality.
Prerequisites
You should have already installed the following:
VirtualBox 5.1.6 (VirtualBox)
Vagrant 1.8.6 (Vagrant)
Vagrant plugin vagrant-vbguest 0.13.x (vagrant-vbguest)
Vagrant plugin vagrant-hostmanager 1.8.5 (vagrant-hostmanager)
You should have already downloaded the Apache Solr 6.2.1 release (Apache Solr 6.2.1)
Scope
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6 (El Capitan)
VirtualBox 5.1.6 (tutorial should work with any newer version)
Vagrant 1.8.6
vagrant-vbguest plugin 0.13.0
vagrant-hostmanager plugin 1.8.5
Apache Solr 6.2.1
Steps
Create Vagrant project directory
I like to create project directories. My Vagrant work goes under ~/Vagrant/<project> and my Docker work goes under ~/Docker/<project> . This allows me to clearly identify which technology is associated with each project and to use various helper scripts to automate processes. So let's create a project directory for this tutorial.
mkdir -p ~/Vagrant/solrcloud-cdcr-tutorial && cd ~/Vagrant/solrcloud-cdcr-tutorial
Create Vagrantfile
The Vagrantfile tells Vagrant how to configure your virtual machines. You can copy/paste my Vagrantfile below or use the version in the attachments area of this tutorial. Here is the content from my file:
# -*- mode: ruby -*-
# vi: set ft=ruby :
# Using yaml to load external configuration files
require 'yaml'
Vagrant.configure(2) do |config|
# Using the hostmanager vagrant plugin to update the host files
config.hostmanager.enabled = true
config.hostmanager.manage_host = true
config.hostmanager.manage_guest = true
config.hostmanager.ignore_private_ip = false
# Loading in the list of commands that should be run when the VM is provisioned.
commands = YAML.load_file('commands.yaml')
commands.each do |command|
config.vm.provision :shell, inline: command
end
# Loading in the VM configuration information
servers = YAML.load_file('servers.yaml')
servers.each do |servers|
config.vm.define servers["name"] do |srv|
srv.vm.box = servers["box"] # Specify the name of the Vagrant box file to use
srv.vm.hostname = servers["name"] # Set the hostname of the VM
srv.vm.network "private_network", ip: servers["ip"], :adapter=>2 # Add a second adapter with a specified IP
srv.vm.network :forwarded_port, guest: 22, host: servers["port"] # Add a port forwarding rule
srv.vm.provision :shell, inline: "sed -i'' '/^127.0.0.1\\t#{srv.vm.hostname}\\t#{srv.vm.hostname}$/d' /etc/hosts" # Remove the extraneous first entry in /etc/hosts
srv.vm.provider :virtualbox do |vb|
vb.name = servers["name"] # Name of the VM in VirtualBox
vb.cpus = servers["cpus"] # How many CPUs to allocate to the VM
vb.memory = servers["ram"] # How much memory to allocate to the VM
vb.customize ["modifyvm", :id, "--cpuexecutioncap", "25"] # Limit the VM to 25% of available CPU
end
end
end
end
Create a servers.yaml file
The servers.yaml file contains the configuration information for our VMs. You can copy/paste my servers.yaml below or use the version in the attachments area of this tutorial. Here is the content from my file:
---
- name: solr-dc01
box: bento/centos-7.2
cpus: 2
ram: 2048
ip: 192.168.56.101
port: 10122
- name: solr-dc02
box: bento/centos-7.2
cpus: 2
ram: 2048
ip: 192.168.56.202
port: 20222
Create commands.yaml file
The commands.yaml file contains the list of commands that should be run on each VM when they are first provisioned. This allows us to automate configuration tasks that would otherwise be tedious and/or repetitive. You can copy/paste my commands.yaml below or use the version in the attachments area of this tutorial. Here is the content from my file:
- sudo yum -y install net-tools ntp wget java-1.8.0-openjdk java-1.8.0-openjdk-devel lsof
- sudo systemctl enable ntpd && sudo systemctl start ntpd
- sudo systemctl disable firewalld && sudo systemctl stop firewalld
- sudo sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/g' /etc/sysconfig/selinux
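With all three files in place, you can optionally confirm that Vagrant parses them and sees both machines before booting anything; at this point both VMs should be listed as "not created":
vagrant status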
Copy Solr release file to our Vagrant project directory
Our project directory is accessible to each of our Vagrant VMs via the /vagrant mount point. This allows us to easily access files and data located in our project directory. Instead of using scp to copy the Apache Solr release file to each of the VMs and creating duplicate files, we'll use a single copy located in our project directory.
cp ~/Downloads/solr-6.2.1.tgz .
NOTE: This assumes you are on a Mac and your downloads are in the ~/Downloads directory.
Start virtual machines
Now we are ready to start our 2 virtual machines for the first time. Creating the VMs for the first time and starting them every time after that uses the same command:
vagrant up
Once the process is complete you should have 2 servers running. You can verify by looking at VirtualBox; I have 2 VMs running called solr-dc01 and solr-dc02.
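If you prefer the command line to the VirtualBox UI, VBoxManage (installed with VirtualBox) can list the running VMs:
VBoxManage list runningvms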
Connect to each virtual machine
You can log in to each of the VMs via ssh using the vagrant ssh command. You must specify the name of the VM you want to connect to.
vagrant ssh solr-dc01
Using another terminal window, repeat this process for solr-dc02 .
Extract Solr install scripts
The Solr release archive contains an installation script. By default, this installation script will do the following (this assumes you downloaded Solr 6.2.1):
Install Solr under /opt/solr-6.2.1
Create a symbolic link between /opt/solr and /opt/solr-6.2.1
Create a solr user
Store live data such as indexes and logs in /var/solr
On solr-dc01 , run the following command:
tar xvfz /vagrant/solr-6.2.1.tgz solr-6.2.1/bin/install_solr_service.sh --strip-components=2
This will create a file called install_solr_service.sh in your current directory, which should be /home/vagrant . Repeat this process for solr-dc02 .
Install Apache Solr
Now we can install Solr using the script defaults:
sudo bash ./install_solr_service.sh /vagrant/solr-6.2.1.tgz
The command above is the same as if you had specified the default settings:
sudo bash ./install_solr_service.sh /vagrant/solr-6.2.1.tgz -i /opt -d /var/solr -u solr -s solr -p 8983
After running the command, you should see something similar to this:
id: solr: no such user
Creating new user: solr
Extracting /vagrant/solr-6.2.1.tgz to /opt
Installing symlink /opt/solr -> /opt/solr-6.2.1 ...
Installing /etc/init.d/solr script ...
Installing /etc/default/solr.in.sh ...
Waiting up to 30 seconds to see Solr running on port 8983 [/]
Started Solr server on port 8983 (pid=29168). Happy searching!
Found 1 Solr nodes:
Solr process 29168 running on port 8983
{
"solr_home":"/var/solr/data",
"version":"6.2.1 43ab70147eb494324a1410f7a9f16a896a59bc6f - shalin - 2016-09-15 05:20:53",
"startTime":"2016-10-31T19:46:27.997Z",
"uptime":"0 days, 0 hours, 0 minutes, 12 seconds",
"memory":"13.4 MB (%2.7) of 490.7 MB"}
Service solr installed.
If you run the following command, you can see the Solr process is running:
ps -ef | grep solr
solr 28980 1 0 19:49 ? 00:00:11 java -server -Xms512m -Xmx512m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/solr/logs/solr_gc.log -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/opt/solr/server -Dsolr.solr.home=/var/solr/data -Dsolr.install.dir=/opt/solr -Dlog4j.configuration=file:/var/solr/log4j.properties -Xss256k -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs -jar start.jar --module=http
Repeat this process for solr-dc02 .
Modify Solr service
It's more convenient to use the OS services infrastructure to manage running Solr processes than to manage them manually with scripts. The installation process creates a service script that starts Solr in single-instance mode. To take advantage of CDCR, you must use SolrCloud mode, so we need to make some changes to the service script. We'll be using the embedded Zookeeper instance for this tutorial. To do this, we need a Zookeeper configuration file in our /var/solr/data directory. We'll copy the default configuration file from /opt/solr/server/solr/zoo.cfg :
sudo -u solr cp /opt/solr/server/solr/zoo.cfg /var/solr/data/zoo.cfg
Now we need the /etc/init.d/solr service script to run Solr in SolrCloud mode. This is done by adding the -c parameter to the start command. When no other parameters are specified, Solr will start an embedded Zookeeper instance on the Solr port + 1000; in our case that is 9983 because our default Solr port is 8983 . Because this file is owned by root, we'll need to use sudo.
exit
sudo vi /etc/init.d/solr
Look near the end of the file for the line:
...
case $1 in
start|stop|restart|status)
SOLR_CMD=$1
...
This is the section that defines the Solr command. We want to change the SOLR_CMD=$1 line to look like this: SOLR_CMD="$1 -c" . This tells Solr that it should start in cloud mode.
NOTE: In production, you would not use the embedded Zookeeper. You would update /etc/default/solr.in.sh to set the ZK_HOST variable to the production Zookeeper instances. When this variable is set, Solr will not start the embedded Zookeeper.
So the section of your file should now look like this:
...
case $1 in
start|stop|restart|status)
SOLR_CMD="$1 -c"
...
Now save the file: press the ESC key, then type :wq and press ENTER.
Let's stop Solr:
sudo service solr stop
Now we can start Solr using the new script:
sudo service solr start
Once the process is started, we can check the status:
sudo service solr status
Found 1 Solr nodes:
Solr process 29426 running on port 8983
{
"solr_home":"/var/solr/data",
"version":"6.2.1 43ab70147eb494324a1410f7a9f16a896a59bc6f - shalin - 2016-09-15 05:20:53",
"startTime":"2016-10-31T22:16:22.116Z",
"uptime":"0 days, 0 hours, 0 minutes, 14 seconds",
"memory":"30.2 MB (%6.1) of 490.7 MB",
"cloud":{
"ZooKeeper":"localhost:9983",
"liveNodes":"1",
"collections":"0"}}
As you can see, the process started successfully and there is a single cloud node running using Zookeeper on port 9983 . Repeat this process for solr-dc02 .
Create Solr dc01 configuration
The solr-dc01 Solr instance will be our source collection for replication. To enable CDCR we need to make a few changes to the solrconfig.xml configuration file. We'll use the data_driven_schema_configs configset as a base for our configuration. We need to create two different configurations because the source collection has a slightly different configuration than the target collection. On the solr-dc01 VM, copy the data_driven_schema_configs directory to the vagrant home directory. If you are following along, you should still be the vagrant user.
cd /home/vagrant
cp -r /opt/solr/server/solr/configsets/data_driven_schema_configs .
Edit the solrconfig.xml file:
vi data_driven_schema_configs/conf/solrconfig.xml
The first thing we are going to do is update the updateHandler definition; there is only one in the file. Find the section in the configuration file that looks like this:
<updateHandler class="solr.DirectUpdateHandler2">
We are going to change the updateLog portion of the configuration. Remember that we are using vi as the text editor, so edit using the appropriate vi commands. Change this:
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
<int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
to this:
<updateLog class="solr.CdcrUpdateLog">
<str name="dir">${solr.ulog.dir:}</str>
<int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
Now we need to create a new requestHandler definition. Find the section in the configuration file that looks like this:
<!-- A request handler that returns indented JSON by default -->
<requestHandler name="/query" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="wt">json</str>
<str name="indent">true</str>
</lst>
</requestHandler>
We are going to add our new definition just after the closing requestHandler tag. Add the following new definition:
<!-- A request handler for cross data center replication -->
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
<lst name="replica">
<str name="zkHost">192.168.56.202:9983</str>
<str name="source">collection1</str>
<str name="target">collection1</str>
</lst>
<lst name="replicator">
<str name="threadPoolSize">8</str>
<str name="schedule">1000</str>
<str name="batchSize">128</str>
</lst>
<lst name="updateLogSynchronizer">
<str name="schedule">1000</str>
</lst>
</requestHandler>
Your updated file should now look like this:
...
<!-- A request handler that returns indented JSON by default -->
<requestHandler name="/query" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="wt">json</str>
<str name="indent">true</str>
</lst>
</requestHandler>
<!-- A request handler for cross data center replication -->
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
<lst name="replica">
<str name="zkHost">192.168.56.202:9983</str>
<str name="source">collection1</str>
<str name="target">collection1</str>
</lst>
<lst name="replicator">
<str name="threadPoolSize">8</str>
<str name="schedule">1000</str>
<str name="batchSize">128</str>
</lst>
<lst name="updateLogSynchronizer">
<str name="schedule">1000</str>
</lst>
</requestHandler>
...
NOTE: The zkHost line should have the IP address and port of the Zookeeper instance for the target collection. Our target collection is on solr-dc02 , so this IP and port point to solr-dc02. When we create our collections in Solr, we'll use the name collection1 .
Now save the file: press the ESC key, then type :wq and press ENTER.
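Before moving on, you can check that the edited file is still well-formed XML; xmllint comes from the libxml2 package, which should be present on the CentOS 7 base image (if it isn't, yum install libxml2 provides it):
xmllint --noout data_driven_schema_configs/conf/solrconfig.xml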
Create Solr dc02 configuration
The solr-dc02 Solr instance will be our target collection for replication. To enable CDCR we need to make a few changes to the solrconfig.xml configuration file. As above, we'll use the data_driven_schema_configs as a base for our configuration. On solr-dc02 , copy the data_driven_schema_configs directory to the vagrant home directory. If you are following along, you should still be the vagrant user.
cd /home/vagrant
cp -r /opt/solr/server/solr/configsets/data_driven_schema_configs .
Edit the solrconfig.xml file:
vi data_driven_schema_configs/conf/solrconfig.xml
The first thing we are going to do is update the updateHandler definition; there is only one in the file. Find the section in the configuration file that looks like this:
<updateHandler class="solr.DirectUpdateHandler2">
We are going to change the updateLog portion of the configuration. Remember that we are using vi as the text editor. Change this:
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
<int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
to this:
<updateLog class="solr.CdcrUpdateLog">
<str name="dir">${solr.ulog.dir:}</str>
<int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
Now we need to create a new requestHandler definition. Find the section in the configuration file that looks like this:
<!-- A request handler that returns indented JSON by default -->
<requestHandler name="/query" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="wt">json</str>
<str name="indent">true</str>
</lst>
</requestHandler>
We are going to add our new definition just after the closing requestHandler tag. Add the following new definition:
<!-- A request handler for cross data center replication -->
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
<lst name="buffer">
<str name="defaultState">disabled</str>
</lst>
</requestHandler>
<!-- A request handler for cross data center replication -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="update.chain">cdcr-processor-chain</str>
</lst>
</requestHandler>
<updateRequestProcessorChain name="cdcr-processor-chain">
<processor class="solr.CdcrUpdateProcessorFactory"/>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
Your updated file should now look like this:
...
<!-- A request handler that returns indented JSON by default -->
<requestHandler name="/query" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="wt">json</str>
<str name="indent">true</str>
</lst>
</requestHandler>
<!-- A request handler for cross data center replication -->
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
<lst name="buffer">
<str name="defaultState">disabled</str>
</lst>
</requestHandler>
<!-- A request handler for cross data center replication -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="update.chain">cdcr-processor-chain</str>
</lst>
</requestHandler>
<updateRequestProcessorChain name="cdcr-processor-chain">
<processor class="solr.CdcrUpdateProcessorFactory"/>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
...
Now save the file: press the ESC key, then type :wq and press ENTER.
You should see how the two configurations differ between the source and target collections.
Create Solr collection on solr-dc01 and solr-dc02
Now we should be able to create a collection using our updated configuration. Because the two configurations are different, make sure you run this command on both the solr-dc01 and solr-dc02 VMs. This creates the collections in our respective data centers.
/opt/solr/bin/solr create -c collection1 -d ./data_driven_schema_configs
NOTE: We are using the collection name collection1 , which is the name the CDCR request handler configuration references. You should see something similar to this:
/opt/solr/bin/solr create -c collection1 -d ./data_driven_schema_configs
Connecting to ZooKeeper at localhost:9983 ...
Uploading /home/vagrant/data_driven_schema_configs/conf for config collection1 to ZooKeeper at localhost:9983
Creating new collection 'collection1' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=collection1
{
"responseHeader":{
"status":0,
"QTime":3684},
"success":{"192.168.56.101:8983_solr":{
"responseHeader":{
"status":0,
"QTime":2546},
"core":"collection1_shard1_replica1"}}}
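As a quick check from the shell, the standard Collections API can also confirm the new collection exists; CLUSTERSTATUS lists all collections and their shards:
curl 'http://192.168.56.101:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json'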
Now we can verify the collection exists in the Solr admin UI via: http://192.168.56.101:8983/solr/#/~cloud
As you can see, there is a single collection named collection1 which has 1 shard. You can repeat this process on solr-dc02 and see something similar. NOTE: Remember that solr-dc01 is 192.168.56.101 and solr-dc02 is 192.168.56.202.
Turn on replication
Let's first check the status of replication. Each of these curl commands interacts with the CDCR API. You can check the status of replication using the following command:
curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=STATUS'
You should see something similar to this:
curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=STATUS'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">5</int></lst><lst name="status"><str name="process">stopped</str><str name="buffer">enabled</str></lst>
</response>
You should notice the process is displayed as stopped . We want to start the replication process:
curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=START'
You should see something similar to this:
curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=START'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">41</int></lst><lst name="status"><str name="process">started</str><str name="buffer">enabled</str></lst>
</response>
You should notice the process is now started . Now we need to disable the buffer on the target collection, which buffers the updates by default:
curl -XPOST 'http://192.168.56.202:8983/solr/collection1/cdcr?action=DISABLEBUFFER'
You should see something similar to this:
curl -XPOST 'http://192.168.56.202:8983/solr/collection1/cdcr?action=DISABLEBUFFER'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">7</int></lst><lst name="status"><str name="process">started</str><str name="buffer">disabled</str></lst>
</response>
You should notice the buffer is now disabled .
Add documents to source Solr collection in solr-dc01
Now we will add a couple of sample documents to collection1 in solr-dc01. Run the following command to add 2 sample documents:
curl -XPOST -H 'Content-Type: application/json' 'http://192.168.56.101:8983/solr/collection1/update' --data-binary '{
"add" : {
"doc" : {
"id" : "1",
"text_ws" : "This is document number one."
}
},
"add" : {
"doc" : {
"id" : "2",
"text_ws" : "This is document number two."
}
},
"commit" : {}
}'
You should notice the commit command in the JSON above. That is because the default solrconfig.xml does not have automatic commits enabled. You should get a response back similar to this:
{"responseHeader":{"status":0,"QTime":362}}
Query solr-dc01 collection
Let's query collection1 on solr-dc01 to ensure the documents are present. Run the following command:
curl -XGET 'http://192.168.56.101:8983/solr/collection1/select?q=*:*&indent=true'
You should see something similar to this:
curl -XGET 'http://192.168.56.101:8983/solr/collection1/select?q=*:*&indent=true'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<bool name="zkConnected">true</bool>
<int name="status">0</int>
<int name="QTime">17</int>
<lst name="params">
<str name="q">*:*</str>
<str name="indent">true</str>
</lst>
</lst>
<result name="response" numFound="2" start="0">
<doc>
<str name="id">1</str>
<str name="text_ws">This is document number one.</str>
<long name="_version_">1549823582071160832</long></doc>
<doc>
<str name="id">2</str>
<str name="text_ws">This is document number two.</str>
<long name="_version_">1549823582135123968</long></doc>
</result>
</response>
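Before querying the target, you can also look at the CDCR monitoring actions on the source; QUEUES and ERRORS are additional actions exposed by the same CDCR request handler used above and show how much is pending and whether any updates failed:
curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=QUEUES'
curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=ERRORS'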
Query solr-dc02 collection
Before executing the query on solr-dc02 , we need to commit the changes. As mentioned above, automatic commits are not enabled in the default solrconfig.xml . Run the following command:
curl -XPOST -H 'Content-Type: application/json' 'http://192.168.56.202:8983/solr/collection1/update' --data-binary '{
"commit" : {}
}'
You should see a response similar to this:
{"responseHeader":{"status":0,"QTime":5}}
Now we can run our query:
curl -XGET 'http://192.168.56.202:8983/solr/collection1/select?q=*:*&indent=true'
You should see something similar to this:
curl -XGET 'http://192.168.56.202:8983/solr/collection1/select?q=*:*&indent=true'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<bool name="zkConnected">true</bool>
<int name="status">0</int>
<int name="QTime">17</int>
<lst name="params">
<str name="q">*:*</str>
<str name="indent">true</str>
</lst>
</lst>
<result name="response" numFound="2" start="0">
<doc>
<str name="id">1</str>
<str name="text_ws">This is document number one.</str>
<long name="_version_">1549823582071160832</long></doc>
<doc>
<str name="id">2</str>
<str name="text_ws">This is document number two.</str>
<long name="_version_">1549823582135123968</long></doc>
</result>
</response>
You should notice that you have 2 documents, which have the same id and text_ws content as you pushed to solr-dc01.
Review
If you followed along with this tutorial, you have successfully set up cross data center replication between two SolrCloud configurations. Some important points to keep in mind:
Because this is an active-passive approach, there is only a single source system. If the source system goes down, your ingest will stop, as the other data center is read-only and should not have updates pushed outside of the replication process. Work is being done to make Solr CDCR active-active.
Cross data center communication can be a potential bottleneck. If the cross data center connection cannot sustain sufficient throughput, the target data center(s) can fall behind in replication.
CDCR is not intended nor optimized for bulk inserts. If you need to do bulk inserts, first synchronize the indexes between the data centers outside of the replication process, then enable replication for incremental updates.
For more information, read about Cross Data Center Replication: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462
10-25-2016
03:34 PM
1 Kudo
I reimported the VirtualBox VM and the default password works. However, the security constraint requires you to change the password.
10-14-2016
03:16 PM
Thank you for your response. That was the problem. I was using the Hortonworks documentation which does not show the version part of the URL: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_security/content/ranger_rest_api_get_policy.html