11-10-2016
02:35 AM
Objective
If you are using Docker on CentOS 7, you may have run into issues importing the Docker version of the Hortonworks Sandbox for HDP 2.5. The default configuration of Docker on CentOS 7 limits your Docker virtual machine image to 10GB in size. The Docker HDP sandbox is 13GB in size. This will cause the loading process of the sandbox to fail.
This tutorial will guide you through the process of installing Docker on CentOS 7 using Vagrant and modifying the Docker configuration to move the Docker virtual machine image and increase its size. While we are using Vagrant + VirtualBox, this process should work for any install of CentOS (Amazon, etc.) with small changes, because VirtualBox and the related plugins are not needed there.
Prerequisites
You should already have downloaded the Docker HDP 2.5 Sandbox. Read more here: Docker HDP Sandbox
You should already have installed VirtualBox 5.1.x. Read more here: VirtualBox
You should already have installed Vagrant 1.8.6. Read more here: Vagrant
You should already have installed the vagrant-vbguest plugin. This plugin will keep the VirtualBox Guest Additions software current as you upgrade your kernel and/or VirtualBox versions. Read more here: vagrant-vbguest
You should already have installed the vagrant-hostmanager plugin. This plugin will automatically manage the /etc/hosts file on your local Mac and in your virtual machines. Read more here: vagrant-hostmanager
Scope
Mac OS X 10.11.6 (El Capitan)
VirtualBox 5.1.6
Vagrant 1.8.6
vagrant-vbguest plugin 0.13.0
vagrant-hostmanager plugin 1.8.5
Steps
Create Vagrant project directory
Before we get started, determine where you want to keep your Vagrant project files. Each Vagrant project should have its own directory. I keep my Vagrant projects in my ~/Development/Vagrant directory. You should also use a helpful name for each Vagrant project directory you create.
cd ~/Development/Vagrant
mkdir centos7-docker
cd centos7-docker
We will be using a CentOS 7.2 Vagrant box, so I include centos7 in the Vagrant project name to differentiate it from a CentOS 6 project. The project is for Docker, so I include that in the name. Thus we have a project directory name of centos7-docker.
Create Vagrant project files
Create Vagrantfile
The Vagrantfile tells Vagrant how to configure your virtual machines. Here is my Vagrantfile:
# -*- mode: ruby -*-
# vi: set ft=ruby :
# Using yaml to load external configuration files
require 'yaml'
Vagrant.configure("2") do |config|
  # Using the hostmanager vagrant plugin to update the host files
  config.hostmanager.enabled = true
  config.hostmanager.manage_host = true
  config.hostmanager.manage_guest = true
  config.hostmanager.ignore_private_ip = false
  # Loading in the list of commands that should be run when the VM is provisioned.
  commands = YAML.load_file('commands.yaml')
  commands.each do |command|
    config.vm.provision :shell, inline: command
  end
  # Loading in the VM configuration information
  servers = YAML.load_file('servers.yaml')
  servers.each do |server|
    config.vm.define server["name"] do |srv|
      srv.vm.box = server["box"] # Specify the name of the Vagrant box file to use
      srv.vm.hostname = server["name"] # Set the hostname of the VM
      srv.vm.network "private_network", ip: server["ip"], adapter: 2 # Add a second adapter with a specified IP
      srv.vm.network :forwarded_port, guest: 22, host: server["port"] # Add a port forwarding rule
      # Remove the loopback entry for the hostname from /etc/hosts
      srv.vm.provision :shell, inline: "sed -i'' '/^127.0.0.1\\t#{srv.vm.hostname}\\t#{srv.vm.hostname}$/d' /etc/hosts"
      srv.vm.provider :virtualbox do |vb|
        vb.name = server["name"] # Name of the VM in VirtualBox
        vb.cpus = server["cpus"] # How many CPUs to allocate to the VM
        vb.memory = server["ram"] # How much memory to allocate to the VM
        vb.customize ["modifyvm", :id, "--cpuexecutioncap", "75"] # Limit the VM to 75% of available CPU
      end
    end
  end
end
Create a servers.yaml file
The servers.yaml file contains the configuration information for our VMs. Here is the content from my file:
---
- name: docker
  box: bento/centos-7.2
  cpus: 6
  ram: 12288
  ip: 192.168.56.100
  port: 10022
NOTE: My configuration uses 6 CPUs and 12GB of memory for VirtualBox. Make sure you have enough resources to use this configuration. For the purposes of this demo, you can lower this to 2 CPUs and 4GB of memory.
Create commands.yaml file
The commands.yaml file contains the list of commands that should be run on each VM when they are first provisioned. This allows us to automate configuration tasks that would otherwise be tedious and/or repetitive. Here is the content from my file:
- "sudo yum -y install net-tools ntp wget"
- "sudo systemctl enable ntpd && sudo systemctl start ntpd"
- "sudo systemctl disable firewalld && sudo systemctl stop firewalld"
- "sudo sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/g' /etc/sysconfig/selinux"
Start Virtual Machine
Once you have created the 3 files in your Vagrant project directory, you are ready to start your virtual machine. Creating the VM for the first time and starting it every time after that uses the same command:
vagrant up
NOTE: During the startup process, the vagrant-hostmanager plugin will prompt you for a password. This is the sudo password for your local machine. It needs to use sudo to update the /etc/hosts file.
Once the process is complete you should have 1 server running. You can verify by looking at the VirtualBox user interface; you should have a virtual machine called docker running.
Connect to virtual machine
You are able to log in to the virtual machine via ssh using the vagrant ssh command.
vagrant ssh
Install Docker in Vagrant virtual machine
Now that you are connected to the virtual machine, it's time to install Docker. First we'll create the docker.repo file:
sudo tee /etc/yum.repos.d/docker.repo <<-'EOF'
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7/
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF
Now we can install Docker:
sudo yum install docker-engine
Now we need to enable the Docker service:
sudo systemctl enable docker.service
Now we can start the Docker service:
sudo systemctl start docker
Load Docker HDP sandbox
We will attempt to load the HDP sandbox into Docker using the default Docker settings. This process will fail after the image has loaded approximately 10GB of data. First, make sure a copy of the Docker HDP sandbox file HDP_2.5_docker.tar.gz is located in your project directory. This will allow you to access the file via the /vagrant/ directory within your virtual machine. I'm doing this on a Mac and my file was downloaded to ~/Downloads. This command is run on your local computer:
cp ~/Downloads/HDP_2.5_docker.tar.gz ~/Development/Vagrant/centos7-docker/
Now we can try loading the sandbox image on the VM. You need to use sudo to interact with Docker. You should see something similar to the following:
sudo docker load < /vagrant/HDP_2.5_docker.tar.gz
b1b065555b8a: Loading layer [==================================================>] 202.2 MB/202.2 MB
0b547722f59f: Loading layer [=====================================> ] 10.36 GB/13.84 GB
ApplyLayer exit status 1 stdout: stderr: write /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64/jre/lib/rt.jar: no space left on device
As you can see above, the load command failed and it did so a little more than 10GB into the process. That is because the Docker virtual machine image defaults to 10GB in size. We can check to see where our Docker Root is located:
sudo docker info | grep "Docker Root"
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Docker Root Dir: /var/lib/docker
We can see from above the default is /var/lib/docker.
Change Docker configuration
CentOS 7 uses systemd to manage services. You can read more about it here: systemd. The older methods of modifying Docker configuration using /etc/sysconfig based configuration files should be avoided. While you can get everything working using this approach, it's better to use the appropriate systemd methods.
Some people have modified the /etc/systemd/system/multi-user.target.wants/docker.service file to make changes to the Docker configuration. This file can be overwritten during software updates, so it is best not to modify it. Instead you should use /etc/systemd/system/docker.service.d/docker.conf.
We need to create an /etc/systemd/system/docker.service.d directory. This is where our docker.conf configuration file will be located.
sudo mkdir -p /etc/systemd/system/docker.service.d
Now we can edit our docker.conf file:
sudo vi /etc/systemd/system/docker.service.d/docker.conf
You should add the following configuration to the file:
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --graph="/mnt/docker-data" --storage-driver=overlay --storage-opt=dm.basesize=30G
The blank ExecStart= entry is not a typo. It is required to clear the ExecStart command inherited from the packaged unit file; without it, systemd sees two ExecStart entries and refuses to start the service.
The --graph parameter specifies the directory where Docker stores its images and containers (the Docker Root Dir). Make sure this location has sufficient room for your expected data size.
The --storage-driver parameter specifies the storage driver for Docker. The default here is devicemapper, whose base device size defaults to 10GB. Set it to overlay to allow for larger file sizes. You can read more about the storage drivers here: docker storage drivers.
The --storage-opt parameter allows us to change the base image size. In this example I've set my base image size to 30GB. You may want to use a larger size. Note that dm.basesize is a devicemapper option, so the size it sets matters mainly if you stay on the devicemapper driver.
Restart Docker
Now that we've updated our configuration, we need to restart the Docker daemon.
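The docker.conf drop-in described above can also be written without opening an editor, which helps when the steps are scripted. The following is a minimal sketch, not part of the original procedure: write_docker_dropin is a hypothetical helper name, and the directory and data path are passed in as arguments so nothing is assumed beyond the flags already shown.

```shell
#!/usr/bin/env bash
# Sketch: write the systemd drop-in for Docker without using an editor.
# write_docker_dropin is a hypothetical helper, not a Docker or systemd command.
write_docker_dropin() {
  local dropin_dir="$1"   # e.g. /etc/systemd/system/docker.service.d
  local data_dir="$2"     # e.g. /mnt/docker-data
  mkdir -p "$dropin_dir" || return 1
  # The empty ExecStart= line clears the packaged command before the new one is set.
  cat > "$dropin_dir/docker.conf" <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --graph="$data_dir" --storage-driver=overlay --storage-opt=dm.basesize=30G
EOF
}

# Example call, run from a root shell because /etc/systemd/system is root-owned:
#   write_docker_dropin /etc/systemd/system/docker.service.d /mnt/docker-data
```

Whichever way docker.conf is written, the next step is to reload systemd and restart Docker: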
sudo systemctl daemon-reload
sudo systemctl restart docker
We can now check to see where our Docker Root is. You should see something similar to this:
sudo docker info | grep "Docker Root"
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Docker Root Dir: /mnt/docker-data
As you can see, our Docker Root has moved to /mnt/docker-data, which is the location we specified with the --graph parameter in our docker.conf file.
Load Docker HDP sandbox
Now that we have updated our Docker configuration and restarted the daemon, we should be able to load our HDP sandbox again. Let's run the load command to see if it completes. You should see something similar to this:
sudo docker load < /vagrant/HDP_2.5_docker.tar.gz
b1b065555b8a: Loading layer [==================================================>] 202.2 MB/202.2 MB
0b547722f59f: Loading layer [==================================================>] 13.84 GB/13.84 GB
99d7327952e0: Loading layer [==================================================>] 234.8 MB/234.8 MB
294b1c0e07bd: Loading layer [==================================================>] 207.5 MB/207.5 MB
fd5c10f2f1a1: Loading layer [==================================================>] 387.6 kB/387.6 kB
6852ef70321d: Loading layer [==================================================>] 163 MB/163 MB
517f170bbf7f: Loading layer [==================================================>] 20.98 MB/20.98 MB
665edb80fc91: Loading layer [==================================================>] 337.4 kB/337.4 kB
Loaded image: sandbox:latest
As you can see, the load command was successful.
Review
If you successfully followed along with this tutorial, we were able to set up a CentOS 7 virtual machine using Vagrant. We updated the Docker configuration to use a different location for the virtual machine image and changed the base size of that image. Once these changes were complete, we were able to successfully load the HDP sandbox image.
You can read more about how to install Docker on CentOS 7 here https://docs.docker.com/engine/installation/linux/centos/ and configure it here https://docs.docker.com/engine/admin/
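The docker info | grep "Docker Root" check used in this tutorial can be wrapped in a small function when you want the value itself, for example to compare it against the path given to --graph. This is a sketch only: docker_info_field is a hypothetical helper name, and it simply reads docker info output on standard input.

```shell
#!/usr/bin/env bash
# Sketch: pull one "Key: value" field out of `docker info` output.
# docker_info_field is a hypothetical helper; it reads the output on stdin
# and prints everything after "<key>: " for the first matching line.
docker_info_field() {
  local key="$1"
  awk -v k="$key" '{
    line = $0
    sub(/^[ \t]+/, "", line)          # ignore leading indentation
    prefix = k ": "
    if (index(line, prefix) == 1) {   # line starts with "<key>: "
      print substr(line, length(prefix) + 1)
      exit
    }
  }'
}

# Example call on the VM:
#   sudo docker info 2>/dev/null | docker_info_field "Docker Root Dir"
```

If the printed path is still /var/lib/docker after the restart, the drop-in file was most likely not picked up.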
11-08-2016
10:48 AM
Objective
Many people work exclusively from a laptop where storage space is typically limited to 500GB of space or less. Over time, you may find your available storage space has become a regular concern. It's not uncommon to use an external hard drive to augment available storage space.
The current version of Docker for Mac (1.12.x) does not provide a configuration setting which allows users to change the location where the Docker virtual machine image is located. This means the image, which can grow up to 64GB in size by default, is located on your laptop's primary hard drive.
With the HDP 2.5 version of the Hortonworks sandbox available as a native Docker image, you may find a desire to have more room available to Docker. This tutorial will guide you through the process of moving your Docker virtual machine image to a different location, an external drive in this case. This will free up to 64GB of space on your primary laptop hard drive and let you expand the size of the Docker image file later. This tutorial is the first in a two part series.
Prerequisites
You should have already completed the following tutorial Installing Docker Version of Sandbox on Mac
You should have an external or secondary hard drive available.
Scope
Mac OS X 10.11.6 (El Capitan)
Docker for Mac 1.12.1
HDP 2.5 Docker Sandbox
Steps
Stop Docker for Mac
Before we can make any changes to the Docker virtual machine image, we need to stop Docker for Mac. There should be a Docker for Mac icon in the menu bar. You should see something similar to this:
You can also check from the command line with ps -ef | grep -i com.docker. You should see something similar to this:
ps -ef | grep -i com.docker
0 123 1 0 8:45AM ?? 0:00.01 /Library/PrivilegedHelperTools/com.docker.vmnetd
502 967 876 0 8:46AM ?? 0:00.08 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 969 967 0 8:46AM ?? 0:00.04 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 971 967 0 8:46AM ?? 0:07.96 com.docker.db --url fd:3 --git /Users/myoung/Library/Containers/com.docker.docker/Data/database
502 975 967 0 8:46AM ?? 0:03.40 com.docker.osx.hyperkit.linux
502 977 975 0 8:46AM ?? 0:00.03 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux
502 12807 967 0 9:17PM ?? 0:00.08 com.docker.osxfs --address fd:3 --connect /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --control fd:4 --volume-control fd:5 --database /Users/myoung/Library/Containers/com.docker.docker/Data/s40
502 12810 967 0 9:17PM ?? 0:00.12 com.docker.slirp --db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 --ethernet fd:3 --port fd:4 --vsock-path /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --max-connections 900
502 12811 967 0 9:17PM ?? 0:00.19 com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 12812 12811 0 9:17PM ?? 0:00.02 /Applications/Docker.app/Contents/MacOS/com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 12814 12811 0 9:17PM ?? 0:16.48 /Applications/Docker.app/Contents/MacOS/com.docker.hyperkit -A -m 12G -c 6 -u -s 0:0,hostbridge -s 31,lpc -s 2:0,virtio-vpnkit,uuid=1f629fed-1ef6-4f34-8fce-753347e3b941,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s50,macfile=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/mac.0 -s 3,virtio-blk,file:///Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2,format=qcow -s 4,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s40,tag=db -s 5,virtio-rnd -s 6,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s51,tag=port -s 7,virtio-sock,guest_cid=3,path=/Users/myoung/Library/Containers/com.docker.docker/Data,guest_forwards=2376;1525 -l com1,autopty=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty,log=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/console-ring -f kexec,/Applications/Docker.app/Contents/Resources/moby/vmlinuz64,/Applications/Docker.app/Contents/Resources/moby/initrd.img,earlyprintk=serial console=ttyS0 com.docker.driver="com.docker.driver.amd64-linux", com.docker.database="com.docker.driver.amd64-linux" ntp=gateway mobyplatform=mac -F /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/hypervisor.pid
502 13790 876 0 9:52PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 13791 13790 0 9:52PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 13793 13146 0 9:52PM ttys000 0:00.00 grep -i com.docker
Now we are going to stop Docker for Mac. Before shutting down Docker, make sure all of your containers have been stopped. Using the menu shown above, click on the Quit Docker menu option. This will stop Docker for Mac. You should notice the Docker for Mac icon is no longer visible.
Now let's confirm the Docker processes we saw before are no longer running:
ps -ef | grep -i com.docker
0 123 1 0 8:45AM ?? 0:00.01 /Library/PrivilegedHelperTools/com.docker.vmnetd
502 13815 13146 0 9:54PM ttys000 0:00.00 grep -i com.docker
NOTE: It may take a minute or two before Docker completely shuts down.
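Because shutdown can take a while, the ps check above can be turned into a small wait loop. The sketch below is an assumption-laden convenience, not part of Docker for Mac: count_docker_procs and wait_for_docker_exit are hypothetical helper names. The counter ignores the grep command itself and the privileged com.docker.vmnetd helper, which the listing above shows still running after Docker has quit.

```shell
#!/usr/bin/env bash
# Sketch: count com.docker processes in `ps -ef` output read from stdin.
# Lines for the grep command itself and for the privileged vmnetd helper
# are not counted, since both appear even when Docker for Mac is stopped.
count_docker_procs() {
  awk 'tolower($0) ~ /com\.docker/ &&
       $0 !~ /grep -i com\.docker/ &&
       $0 !~ /PrivilegedHelperTools\/com\.docker\.vmnetd/ { n++ }
       END { print n + 0 }'
}

# Poll every 5 seconds until no Docker processes remain.
# The default of 24 tries gives Docker about two minutes to shut down.
wait_for_docker_exit() {
  local tries="${1:-24}"
  while [ "$tries" -gt 0 ]; do
    [ "$(ps -ef | count_docker_procs)" -eq 0 ] && return 0
    sleep 5
    tries=$((tries - 1))
  done
  return 1
}

# Example call:
#   wait_for_docker_exit && echo "Docker for Mac has stopped"
```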
Backup Docker virtual machine image
Before we make any changes to the Docker virtual machine image, we should back it up. This will temporarily use more space on your laptop hard drive. Make sure you have enough room to hold two copies of the data. As mentioned before, the Docker image can be up to 64GB by default. Let's check the current size of our image using du -sh . The Docker image file is located at ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/ by default.
du -sh ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/
64G /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/
In my case, my image size is 64GB. You need to be sure you have room for 2 copies of the com.docker.driver.amd64-linux directory. Now we'll make a copy of our image:
cd ~/Library/Containers/com.docker.docker/Data/
cp -r com.docker.driver.amd64-linux com.docker.driver.amd64-linux.backup
This copy serves as our backup of the image.
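Since the backup needs room for a second copy of the directory, a pre-check like the following could be run before the cp command above. It is a sketch under simple assumptions: the helper names are hypothetical, sizes are compared in 1K blocks as reported by du -sk and df -Pk, and the volume's device name is assumed to contain no spaces.

```shell
#!/usr/bin/env bash
# Sketch: check that a volume has room for one more copy of a directory.
# All three function names are hypothetical helpers for this tutorial.

# Size of a directory tree in 1K blocks.
dir_size_kb() { du -sk "$1" | awk '{ print $1 }'; }

# Free space, in 1K blocks, on the volume that holds the given path.
# df -P prints one header line and one data line; column 4 is "Available".
free_space_kb() { df -Pk "$1" | awk 'NR == 2 { print $4 }'; }

# Succeeds when the destination volume can hold another copy of the source.
has_room_for_copy() {
  local src="$1" dest_volume="$2"
  [ "$(free_space_kb "$dest_volume")" -gt "$(dir_size_kb "$src")" ]
}

# Example call for the backup step:
#   data=~/Library/Containers/com.docker.docker/Data
#   has_room_for_copy "$data/com.docker.driver.amd64-linux" "$data" || echo "not enough free space"
```

The same check works for the external drive by passing its mount point, for example /Volumes/Samsung, as the second argument.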
Copy Docker virtual machine image to external drive
Now we can make a copy of our image on our external hard drive. I have a 1TB SSD mounted at /Volumes/Samsung . I am going to store my Docker virtual machine image in /Volumes/Samsung/Docker/image . You should store the image in a location that makes sense for you.
cp -r com.docker.driver.amd64-linux /Volumes/Samsung/Docker/image/
This process will take a few minutes. It will take longer if you are not using an SSD. Let's confirm the directory now exists on the external hard drive.
ls -la /Volumes/Samsung/Docker/image/
total 0
drwxr-xr-x 3 myoung staff 102 Nov 3 17:08 .
drwxr-xr-x 11 myoung staff 374 Nov 3 17:03 ..
drwxr-xr-x@ 11 myoung staff 374 Nov 7 21:53 com.docker.driver.amd64-linux
You can also check the size:
du -sh /Volumes/Samsung/Docker/image/
64G /Volumes/Samsung/Docker/image/
Create symbolic link for Docker virtual machine image
Now that we have a copy of the Docker image on the external hard drive, we will use a symbolic link from the image directory on the laptop hard drive to the image directory on the external hard drive. Before creating the link, we need to remove the current image directory on our laptop hard drive:
rm -rf ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux
Now let's create the symbolic link. We will use the ln -s command. The syntax is ln -s <target> <link name>. In this case, the target is the existing directory on the external drive and the link name is the original location on the internal drive.
ln -s /Volumes/Samsung/Docker/image/com.docker.driver.amd64-linux ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux
We can confirm the link was created:
ls -la ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux
lrwxr-xr-x 1 myoung staff 59 Nov 3 17:05 /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux -> /Volumes/Samsung/Docker/image/com.docker.driver.amd64-linux
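The copy, remove, and link steps above can be bundled into one function that refuses to delete the original unless the copy succeeded. This is a sketch, not the procedure the tutorial ran: relocate_with_symlink is a hypothetical helper name, and it uses cp -R, the POSIX spelling of the recursive flag, where the tutorial used cp -r.

```shell
#!/usr/bin/env bash
# Sketch: move a directory to another volume and leave a symlink behind.
# relocate_with_symlink is a hypothetical helper for this tutorial.
relocate_with_symlink() {
  local src="${1%/}"          # existing directory (trailing slash stripped)
  local dest_parent="${2%/}"  # directory on the external drive
  local dest="$dest_parent/$(basename "$src")"

  # Only act on a real directory, and never overwrite an existing copy.
  [ -d "$src" ] && [ ! -L "$src" ] || { echo "not a real directory: $src" >&2; return 1; }
  [ ! -e "$dest" ] || { echo "already exists: $dest" >&2; return 1; }

  mkdir -p "$dest_parent" || return 1
  cp -R "$src" "$dest_parent/" || return 1   # copy first ...
  rm -rf "$src" || return 1                  # ... remove only after the copy succeeded
  ln -s "$dest" "$src"                       # ln -s <target> <link name>
}

# Example call, with Docker for Mac stopped:
#   relocate_with_symlink ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux /Volumes/Samsung/Docker/image
```

Keeping a separate backup, as the tutorial does, is still worthwhile, because the function removes the original as soon as cp reports success.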
Restart Docker for Mac
Now we can restart Docker for Mac. This is done by running the application from the Applications folder in the Finder. You should see something similar to this:
Double-click on the Docker application to start it. You should notice the Docker for Mac icon is now back in the main menu bar. You can also check via ps -ef | grep -i com.docker . You should see something similar to this:
ps -ef | grep -i com.docker
0 123 1 0 8:45AM ?? 0:00.01 /Library/PrivilegedHelperTools/com.docker.vmnetd
502 14476 14465 0 10:42PM ?? 0:00.03 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 14479 14476 0 10:42PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 14480 14476 0 10:42PM ?? 0:00.29 com.docker.db --url fd:3 --git /Users/myoung/Library/Containers/com.docker.docker/Data/database
502 14481 14476 0 10:42PM ?? 0:00.08 com.docker.osxfs --address fd:3 --connect /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --control fd:4 --volume-control fd:5 --database /Users/myoung/Library/Containers/com.docker.docker/Data/s40
502 14482 14476 0 10:42PM ?? 0:00.04 com.docker.slirp --db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 --ethernet fd:3 --port fd:4 --vsock-path /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --max-connections 900
502 14483 14476 0 10:42PM ?? 0:00.05 com.docker.osx.hyperkit.linux
502 14484 14476 0 10:42PM ?? 0:00.08 com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 14485 14483 0 10:42PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux
502 14486 14484 0 10:42PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 14488 14484 0 10:42PM ?? 0:07.90 /Applications/Docker.app/Contents/MacOS/com.docker.hyperkit -A -m 12G -c 6 -u -s 0:0,hostbridge -s 31,lpc -s 2:0,virtio-vpnkit,uuid=1f629fed-1ef6-4f34-8fce-753347e3b941,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s50,macfile=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/mac.0 -s 3,virtio-blk,file:///Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2,format=qcow -s 4,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s40,tag=db -s 5,virtio-rnd -s 6,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s51,tag=port -s 7,virtio-sock,guest_cid=3,path=/Users/myoung/Library/Containers/com.docker.docker/Data,guest_forwards=2376;1525 -l com1,autopty=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty,log=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/console-ring -f kexec,/Applications/Docker.app/Contents/Resources/moby/vmlinuz64,/Applications/Docker.app/Contents/Resources/moby/initrd.img,earlyprintk=serial console=ttyS0 com.docker.driver="com.docker.driver.amd64-linux", com.docker.database="com.docker.driver.amd64-linux" ntp=gateway mobyplatform=mac -F /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/hypervisor.pid
502 14559 14465 0 10:46PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 14560 14559 0 10:46PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 14562 13146 0 10:46PM ttys000 0:00.00 grep -i com.docker
You should notice the Docker processes are running again. You can also check the timestamp of files in the Docker image directory on the external hard drive:
ls -la /Volumes/Samsung/Docker/image/com.docker.driver.amd64-linux
total 134133536
drwxr-xr-x@ 12 myoung staff 408 Nov 7 22:42 .
drwxr-xr-x 3 myoung staff 102 Nov 3 17:08 ..
-rw-r--r-- 1 myoung staff 68676222976 Nov 7 22:45 Docker.qcow2
-rw-r--r-- 1 myoung staff 65536 Nov 7 22:42 console-ring
-rw-r--r-- 1 myoung staff 5 Nov 7 22:42 hypervisor.pid
-rw-r--r-- 1 myoung staff 0 Aug 24 16:06 lock
drwxr-xr-x 67 myoung staff 2278 Nov 5 22:00 log
-rw-r--r-- 1 myoung staff 17 Nov 7 22:42 mac.0
-rw-r--r-- 1 myoung staff 36 Aug 24 16:06 nic1.uuid
-rw-r--r-- 1 myoung staff 5 Nov 7 22:42 pid
-rw-r--r-- 1 myoung staff 59619 Nov 7 22:42 syslog
lrwxr-xr-x 1 myoung staff 12 Nov 7 22:42 tty -> /dev/ttys001
You should notice the timestamp of the Docker.qcow2 file has been updated which means Docker is now using this location for its image file.
Start a Docker container
You should attempt to start a Docker container to make sure everything is working fine. You can start the HDP sandbox via docker start sandbox if you've already installed it as listed in the prerequisites. If everything is working fine, you can delete the backup.
Delete Docker backup image
Now that everything is working using the new location, we can remove our backup.
rm -rf ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux.backup
Review
If you successfully followed along with this tutorial, we were able to move our Docker for Mac virtual machine image to an external hard drive. This should free up to 64GB of space on your laptop hard drive. Look for part two in the series to learn how to increase the size of your Docker image.
11-04-2016
07:24 PM
Good catch. Tutorial has been updated to provide more links.
11-04-2016
02:29 AM
Objective
In many organizations, "search" is a common requirement for a user-friendly means of accessing data. When people think of "search", they often think of Google. Many organizations use Solr as their enterprise search engine. It is commonly used to power public website search from within the site itself. Organizations will often build custom user interfaces to tailor queries to meet their internal or external end-user needs. In most of these scenarios, users are shielded from the complexity of the Solr query syntax.
Solr has a long list of features and capabilities; you can read more here Apache Solr Features. Solr 6 has a new feature which allows you to submit SQL queries via JDBC. This opens up new ways to interact with Solr. Using Zeppelin with SQL is now possible because of this new feature. This should make you more productive because you can use a language syntax with which you are already familiar: SQL!
This tutorial will guide you through the process of updating the Zeppelin JDBC interpreter configuration to enable submitting SQL queries to Solr via JDBC. We'll use the Hortonworks HDP 2.5 Docker sandbox and Apache Solr 6.2.1.
NOTE: Solr 6 is being deployed as a standalone application within the sandbox. HDP 2.5 ships with Solr 5.5.2 via HDPSearch which does not include the JDBC SQL functionality.
Prerequisites
You should have already completed the following tutorial Installing Docker Version of Sandbox on Mac
You should have already downloaded Apache Solr 6.2.1: Apache Solr 6.2.1
Scope
Mac OS X 10.11.6 (El Capitan)
Docker for Mac 1.12.1
HDP 2.5 Docker Sandbox
Apache Solr 6.2.1
Steps
Start Sandbox
If you completed the tutorial listed in the prerequisites, then you should be ready to start up your Docker sandbox container.
docker start sandbox
NOTE: If your container is still running from performing the other tutorial, you do not need to start it again.
Once the container is started, you need to login:
ssh -p 2222 root@localhost
Now you can start the services
/etc/init.d/startup_scripts start
NOTE: This process will take several minutes.
Create Solr user in the sandbox
We will be running the Solr process as the solr user. Let's create that user in our sandbox:
useradd -d /home/solr -s /bin/bash -U solr
Copy Solr archive file to sandbox
You should already have the Solr archive file downloaded. We will use scp to copy the file to the sandbox. You should do this in another terminal window as your current window should be logged into the sandbox. From your Mac run the following command:
scp -P 2222 ~/Downloads/solr-6.2.1.tgz root@localhost:/root/
NOTE: The ssh and scp commands use different parameters to specify the port and it's easy to confuse them. The ssh command uses -p to specify the port. The scp command uses -P to specify the port.
In my case, the Solr file was downloaded to ~/Downloads. Your location may be different.
Extract the Solr archive file
We'll run Solr out of the /opt/ directory. This makes things a bit cleaner than using the installation script, which places some files in /var.
cd /opt
tar xvfz /root/solr-6.2.1.tgz
Now we need to give the solr user ownership over the directory.
chown -R solr:solr /opt/solr-6.2.1/
Install JDK 8
Solr 6.x requires JDK 8, which is not on the current version of the sandbox. You will need to install it before you can run Solr.
yum install java-1.8.0-openjdk-devel
Start Solr
Now that Solr is installed, we can start up a SolrCloud instance. The Solr start script provides a handy way to start a 2 node SolrCloud cluster. The -e flag tells Solr to start the cloud example. The -noprompt flag tells Solr to use default values.
cd /opt/solr-6.2.1
bin/solr start -e cloud -noprompt
Welcome to the SolrCloud example!
Starting up 2 Solr nodes for your example SolrCloud cluster.
Creating Solr home directory /opt/solr-6.2.1/example/cloud/node1/solr
Cloning /opt/solr-6.2.1/example/cloud/node1 into
/opt/solr-6.2.1/example/cloud/node2
Starting up Solr on port 8983 using command:
bin/solr start -cloud -p 8983 -s "example/cloud/node1/solr"
Waiting up to 30 seconds to see Solr running on port 8983 [\]
Started Solr server on port 8983 (pid=4952). Happy searching!
Starting up Solr on port 7574 using command:
bin/solr start -cloud -p 7574 -s "example/cloud/node2/solr" -z localhost:9983
Waiting up to 30 seconds to see Solr running on port 7574 [|]
Started Solr server on port 7574 (pid=5175). Happy searching!
Connecting to ZooKeeper at localhost:9983 ...
Uploading /opt/solr-6.2.1/server/solr/configsets/data_driven_schema_configs/conf for config gettingstarted to ZooKeeper at localhost:9983
Creating new collection 'gettingstarted' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=gettingstarted&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=gettingstarted
{
"responseHeader":{
"status":0,
"QTime":28324},
"success":{
"192.168.56.151:8983_solr":{
"responseHeader":{
"status":0,
"QTime":17801},
"core":"gettingstarted_shard1_replica1"},
"192.168.56.151:7574_solr":{
"responseHeader":{
"status":0,
"QTime":18096},
"core":"gettingstarted_shard1_replica2"}}}
Enabling auto soft-commits with maxTime 3 secs using the Config API
POSTing request to Config API: http://localhost:8983/solr/gettingstarted/config
{"set-property":{"updateHandler.autoSoftCommit.maxTime":"3000"}}
Successfully set-property updateHandler.autoSoftCommit.maxTime to 3000
SolrCloud example running, please visit: http://localhost:8983/solr
As you can see from the output, we have 2 Solr instances. One instance is listening on port 8983 and the other is listening on 7574 . They are using an embedded Zookeeper instance for coordination and it is listening on port 9983 . If we were going to production, we would use the HDP cluster Zookeeper instance for more reliability.
Index sample data
Now that our SolrCloud cluster is running, we can index sample data into the cluster. We'll execute our SQL queries against this data. Fortunately, Solr ships with a number of example data sets. For this tutorial, we'll index XML data that contains sample product information.
bin/post -c gettingstarted example/exampledocs/*.xml
This command posts the xml documents in the specified path. The -c option defines which collection to use. The command we used previously to create the SolrCloud cluster automatically created a gettingstarted collection using the data_driven_schema_configs configuration. This configuration is what we call schemaless because the fields are dynamically added to the collection. Without dynamic fields, you have to explicitly define every field you want to have in your collection.
You should see something like this:
bin/post -c gettingstarted example/exampledocs/*.xml
/usr/lib/jvm/java/bin/java -classpath /opt/solr-6.2.1/dist/solr-core-6.2.1.jar -Dauto=yes -Dc=gettingstarted -Ddata=files org.apache.solr.util.SimplePostTool example/exampledocs/gb18030-example.xml example/exampledocs/hd.xml example/exampledocs/ipod_other.xml example/exampledocs/ipod_video.xml example/exampledocs/manufacturers.xml example/exampledocs/mem.xml example/exampledocs/money.xml example/exampledocs/monitor2.xml example/exampledocs/monitor.xml example/exampledocs/mp500.xml example/exampledocs/sd500.xml example/exampledocs/solr.xml example/exampledocs/utf8-example.xml example/exampledocs/vidcard.xml
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/gettingstarted/update.
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file gb18030-example.xml (application/xml) to [base]
POSTing file hd.xml (application/xml) to [base]
POSTing file ipod_other.xml (application/xml) to [base]
POSTing file ipod_video.xml (application/xml) to [base]
POSTing file manufacturers.xml (application/xml) to [base]
POSTing file mem.xml (application/xml) to [base]
POSTing file money.xml (application/xml) to [base]
POSTing file monitor2.xml (application/xml) to [base]
POSTing file monitor.xml (application/xml) to [base]
POSTing file mp500.xml (application/xml) to [base]
POSTing file sd500.xml (application/xml) to [base]
POSTing file solr.xml (application/xml) to [base]
POSTing file utf8-example.xml (application/xml) to [base]
POSTing file vidcard.xml (application/xml) to [base]
14 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/gettingstarted/update.
Time spent: 0:00:02.379
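Because the collection is schemaless, the fields were created as the documents were posted. One way to see which fields now exist is Solr's Schema API. The helper below is a sketch; the function name is made up for illustration, and it assumes Solr is listening on localhost:8983 as in the output above.

```shell
# Hypothetical helper: list the fields defined in a collection's schema
# by calling the Schema API endpoint /schema/fields.
solr_fields() {
  collection=$1
  curl -s "http://localhost:8983/solr/${collection}/schema/fields?wt=json"
}
```

Running solr_fields gettingstarted should return a JSON list that includes fields such as name, price, and inStock from the sample documents.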
Query Solr data
Now we can use curl to run a test query against Solr. The following command will query the gettingstarted collection for all documents. It also returns the results as JSON instead of the default XML.
curl -XGET 'http://localhost:8983/solr/gettingstarted/select?q=*:*&wt=json&indent=true'
You should see something like this:
curl -XGET 'http://localhost:8983/solr/gettingstarted/select?q=*:*&wt=json&indent=true'
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":11,
"params":{
"q":"*:*",
"indent":"true",
"wt":"json"}},
"response":{"numFound":32,"start":0,"maxScore":1.0,"docs":[
{
"id":"GB18030TEST",
"name":["Test with some GB18030 encoded characters"],
"features":["No accents here",
"这是一个功能",
"This is a feature (translated)",
"这份文件是很有光泽",
"This document is very shiny (translated)"],
"price":[0.0],
"inStock":[true],
"_version_":1550023359021973504},
{
"id":"IW-02",
"name":["iPod & iPod Mini USB 2.0 Cable"],
"manu":["Belkin"],
"manu_id_s":"belkin",
"cat":["electronics",
"connector"],
"features":["car power adapter for iPod, white"],
"weight":[2.0],
"price":[11.5],
"popularity":[1],
"inStock":[false],
"store":["37.7752,-122.4232"],
"manufacturedate_dt":"2006-02-14T23:55:59Z",
"_version_":1550023359918505984},
{
"id":"MA147LL/A",
"name":["Apple 60 GB iPod with Video Playback Black"],
"manu":["Apple Computer Inc."],
"manu_id_s":"apple",
"cat":["electronics",
"music"],
"features":["iTunes, Podcasts, Audiobooks",
"Stores up to 15,000 songs, 25,000 photos, or 150 hours of video",
"2.5-inch, 320x240 color TFT LCD display with LED backlight",
"Up to 20 hours of battery life",
"Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video",
"Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level indication"],
"includes":["earbud headphones, USB cable"],
"weight":[5.5],
"price":[399.0],
"popularity":[10],
"inStock":[true],
"store":["37.7752,-100.0232"],
"manufacturedate_dt":"2005-10-12T08:00:00Z",
"_version_":1550023360204767232},
{
"id":"adata",
"compName_s":"A-Data Technology",
"address_s":"46221 Landing Parkway Fremont, CA 94538",
"_version_":1550023360573865984},
{
"id":"asus",
"compName_s":"ASUS Computer",
"address_s":"800 Corporate Way Fremont, CA 94539",
"_version_":1550023360584351744},
{
"id":"belkin",
"compName_s":"Belkin",
"address_s":"12045 E. Waterfront Drive Playa Vista, CA 90094",
"_version_":1550023360586448896},
{
"id":"maxtor",
"compName_s":"Maxtor Corporation",
"address_s":"920 Disc Drive Scotts Valley, CA 95066",
"_version_":1550023360587497472},
{
"id":"TWINX2048-3200PRO",
"name":["CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail"],
"manu":["Corsair Microsystems Inc."],
"manu_id_s":"corsair",
"cat":["electronics",
"memory"],
"features":["CAS latency 2, 2-3-3-6 timing, 2.75v, unbuffered, heat-spreader"],
"price":[185.0],
"popularity":[5],
"inStock":[true],
"store":["37.7752,-122.4232"],
"manufacturedate_dt":"2006-02-13T15:26:37Z",
"payloads":["electronics|6.0 memory|3.0"],
"_version_":1550023360602177536},
{
"id":"VS1GB400C3",
"name":["CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail"],
"manu":["Corsair Microsystems Inc."],
"manu_id_s":"corsair",
"cat":["electronics",
"memory"],
"price":[74.99],
"popularity":[7],
"inStock":[true],
"store":["37.7752,-100.0232"],
"manufacturedate_dt":"2006-02-13T15:26:37Z",
"payloads":["electronics|4.0 memory|2.0"],
"_version_":1550023360647266304},
{
"id":"VDBDB1A16",
"name":["A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM"],
"manu":["A-DATA Technology Inc."],
"manu_id_s":"corsair",
"cat":["electronics",
"memory"],
"features":["CAS latency 3, 2.7v"],
"popularity":[0],
"inStock":[true],
"store":["45.18414,-93.88141"],
"manufacturedate_dt":"2006-02-13T15:26:37Z",
"payloads":["electronics|0.9 memory|0.1"],
"_version_":1550023360648314880}]
}}
By default Solr will return the top 10 documents. If you look at the top of the results, you will notice there are 32 documents in our collection.
...
"response":{"numFound":32,"start":0,"maxScore":1.0,"docs":[
...
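To read the remaining documents, the standard start and rows query parameters can be used to page through the results. The sketch below assumes the same host, port, and collection as above; the function names are hypothetical.

```shell
# Hypothetical helper: build a select URL for one page of results.
# start = offset of the first document, rows = page size.
solr_page_url() {
  collection=$1
  start=$2
  rows=$3
  printf 'http://localhost:8983/solr/%s/select?q=*:*&wt=json&indent=true&start=%s&rows=%s\n' \
    "$collection" "$start" "$rows"
}

# Number of pages needed for num_found documents at a given page size (rounds up).
solr_page_count() {
  num_found=$1
  rows=$2
  echo $(( (num_found + rows - 1) / rows ))
}
```

With 32 documents and a page size of 10, solr_page_count 32 10 gives 4 pages, and curl "$(solr_page_url gettingstarted 10 10)" would fetch the second one.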
Modify Zeppelin JDBC interpreter
Now we need to modify the existing JDBC interpreter in Zeppelin. By default, this interpreter will work with Hive, Postgres and Phoenix. We will be adding Solr to the configuration.
Open the Zeppelin UI. You can either use the link in Ambari or directly via http://localhost:9995 . You should see something like this:
Click on the user menu in the upper right. You are logged into Zeppelin as anonymous . You should see a menu like this:
Click on the Interpreter link. You should see something like this:
You should see the jdbc interpreter near the top of the list. If you don't, you can either scroll down or use the built-in search feature at the top of the page. Click on the edit button for the jdbc interpreter. You will notice the screen changes to allow you to add new properties or modify existing ones. You should see something like this:
Scroll down until you see the empty entry line. You should see something like this:
We need to add 3 properties/values here.
solr.url jdbc:solr://localhost:9983?collection=gettingstarted
solr.user solr
solr.driver org.apache.solr.client.solrj.io.sql.DriverImpl
Why are we using port 9983 ? That is because we are in SolrCloud mode. We are pointing to the Zookeeper instance. If one of the nodes goes down, Zookeeper will know and direct us to a node that is working.
Add each of these properties and click the + button after each entry. You should now have 3 new properties in your list:
Now we need to add an artifact to the Dependencies section. It's just below the properties. We are going to add the following:
org.apache.solr:solr-solrj:6.2.1
Click the + button. You should see something like this:
Now click the blue Save button to save the changes.
Create a new notebook
Now that we have our JDBC interpreter updated, we are going to create a new notebook. Click the Notebook drop down menu in the upper left. You should see something like this:
Click the + Create a new note link. You should see something like this:
Give the notebook the name Solr JDBC , then click the Create Note button.
You should see something like this:
We can query Solr using a prefix for jdbc like %jdbc(solr) . The name in parentheses refers to the prefix of the property names in the JDBC interpreter we set up. If you recall, there were properties like:
solr.url
phoenix.url
hive.url
psql.url
Our prefix is solr . Create the following query as the first note:
%jdbc(solr)
select name, price, inStock from gettingstarted
Now click the run arrow icon. This will run the query against Solr and return results if our configuration is correct. You should see something like this:
Now add another note below our first one with the following query:
%jdbc(solr)
select name, price, inStock from gettingstarted where inStock = false
You should see something like this:
And finally add one more note below our second one with the following query:
%jdbc(solr)
select price, count(*) from gettingstarted group by price order by price desc
You should see something like this:
As you can see, it was easy to run simple queries and more complex aggregations using pure SQL. For comparison, here is a native Solr query similar to our second note (note that it filters on inStock:true , whereas the note used inStock = false ):
curl -XGET 'http://localhost:8983/solr/gettingstarted/select?fl=price,name,inStock&indent=on&q=inStock:true&wt=json'
If you ran this command in the terminal, you should see something like this:
curl -XGET 'http://localhost:8983/solr/gettingstarted/select?fl=price,name,inStock&indent=on&q=inStock:true&wt=json'
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":16,
"params":{
"q":"inStock:true",
"indent":"on",
"fl":"price,name,inStock",
"wt":"json"}},
"response":{"numFound":17,"start":0,"maxScore":0.2578291,"docs":[
{
"name":["Test with some GB18030 encoded characters"],
"price":[0.0],
"inStock":[true]},
{
"name":["Apple 60 GB iPod with Video Playback Black"],
"price":[399.0],
"inStock":[true]},
{
"name":["CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail"],
"price":[185.0],
"inStock":[true]},
{
"name":["CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail"],
"price":[74.99],
"inStock":[true]},
{
"name":["A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM"],
"inStock":[true]},
{
"name":["One Dollar"],
"inStock":[true]},
{
"name":["One British Pound"],
"inStock":[true]},
{
"name":["Dell Widescreen UltraSharp 3007WFP"],
"price":[2199.0],
"inStock":[true]},
{
"name":["Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133"],
"price":[92.0],
"inStock":[true]},
{
"name":["Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300"],
"price":[350.0],
"inStock":[true]}]
}}
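The SQL statements used in the Zeppelin notes can also be sent to Solr from the terminal. Solr 6 exposes a /sql request handler that accepts the statement in a stmt parameter. The following is a sketch under the assumption that the handler is available on the collection at localhost:8983; the function name is hypothetical.

```shell
# Hypothetical helper: send a SQL statement to a collection's /sql handler.
# --data-urlencode takes care of spaces and symbols in the statement.
solr_sql() {
  collection=$1
  stmt=$2
  curl -s --data-urlencode "stmt=${stmt}" "http://localhost:8983/solr/${collection}/sql"
}
```

For example, solr_sql gettingstarted 'select name, price, inStock from gettingstarted where inStock = false' would run the statement from the second note without going through Zeppelin.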
Now here is the query for the aggregations:
curl -XGET 'http://localhost:8983/solr/gettingstarted/select?facet.field=price&facet=on&fl=price&indent=on&q=*:*&wt=json'
Which do you find easier to use? My guess is the SQL syntax. 😉
Review
If you successfully followed along with this tutorial, you installed Solr and ran it in SolrCloud mode, indexed some sample XML documents, updated the Zeppelin interpreter configuration to support Solr JDBC queries, created a notebook, and ran a few queries against Solr using SQL. Finally, we saw the comparatively more complex native Solr query syntax. You can read more here:
Solr SQL: https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface
Zeppelin + Solr JDBC: https://cwiki.apache.org/confluence/display/solr/Solr+JDBC+-+Apache+Zeppelin
11-01-2016
08:55 PM
6 Kudos
Objective
Cross Data Center Replication, commonly abbreviated as CDCR, is a new feature found in SolrCloud 6.x. This feature enables Solr to replicate data from one source collection to one or more target collections distributed between data centers. The current version provides an active-passive disaster recovery solution for Solr. Data updates, which include adds, updates, and deletes, are copied from the source collection to the target collection. This means the target collection should not be sent data updates outside of the CDCR functionality. Prior to SolrCloud 6.x you had to manually design a strategy for replication across data centers.
This tutorial will guide you through the process of enabling CDCR between two SolrCloud clusters, each with 1 server, in a Vagrant + VirtualBox environment.
NOTE: Solr 6 is being deployed as a standalone application. HDP 2.5 provides support for Solr 5.5.2 via HDPSearch, which does not include CDCR functionality.
Prerequisites
You should have already installed the following:
VirtualBox 5.1.6 (VirtualBox)
Vagrant 1.8.6 (Vagrant)
Vagrant plugin vagrant-vbguest 0.13.x (vagrant-vbguest)
Vagrant plugin vagrant-hostmanager 1.8.5 (vagrant-hostmanager)
You should have already downloaded the Apache Solr 6.2.1 release (Apache Solr 6.2.1)
Scope
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6 (El Capitan)
VirtualBox 5.1.6 (tutorial should work with any newer version)
Vagrant 1.8.6
vagrant-vbguest plugin 0.13.0
vagrant-hostmanager plugin 1.8.5
Apache Solr 6.2.1
Steps
Create Vagrant project directory
I like to create project directories. My Vagrant work goes under ~/Vagrant/<project> and my Docker work goes under ~/Docker/<project> . This allows me to clearly identify which technology is associated with the projects and allows me to use various helper scripts to automate processes, etc. So let's create a project directory for this tutorial.
mkdir -p ~/Vagrant/solrcloud-cdcr-tutorial && cd ~/Vagrant/solrcloud-cdcr-tutorial
Create Vagrantfile
The Vagrantfile tells Vagrant how to configure your virtual machines. You can copy/paste my Vagrantfile below or use the version in the attachments area of this tutorial. Here is the content from my file:
# -*- mode: ruby -*-
# vi: set ft=ruby :
# Using yaml to load external configuration files
require 'yaml'
Vagrant.configure(2) do |config|
  # Using the hostmanager vagrant plugin to update the host files
  config.hostmanager.enabled = true
  config.hostmanager.manage_host = true
  config.hostmanager.manage_guest = true
  config.hostmanager.ignore_private_ip = false
  # Loading in the list of commands that should be run when the VM is provisioned.
  commands = YAML.load_file('commands.yaml')
  commands.each do |command|
    config.vm.provision :shell, inline: command
  end
  # Loading in the VM configuration information
  servers = YAML.load_file('servers.yaml')
  servers.each do |server|
    config.vm.define server["name"] do |srv|
      srv.vm.box = server["box"] # Specify the name of the Vagrant box file to use
      srv.vm.hostname = server["name"] # Set the hostname of the VM
      srv.vm.network "private_network", ip: server["ip"], :adapter=>2 # Add a second adapter with a specified IP
      srv.vm.network :forwarded_port, guest: 22, host: server["port"] # Add a port forwarding rule
      srv.vm.provision :shell, inline: "sed -i'' '/^127.0.0.1\t#{srv.vm.hostname}\t#{srv.vm.hostname}$/d' /etc/hosts" # Remove the extraneous first entry in /etc/hosts
      srv.vm.provider :virtualbox do |vb|
        vb.name = server["name"] # Name of the VM in VirtualBox
        vb.cpus = server["cpus"] # How many CPUs to allocate to the VM
        vb.memory = server["ram"] # How much memory to allocate to the VM
        vb.customize ["modifyvm", :id, "--cpuexecutioncap", "25"] # Limit the VM to 25% of available CPU
      end
    end
  end
end
Create a servers.yaml file
The servers.yaml file contains the configuration information for our VMs. You can copy/paste my servers.yaml below or use the version in the attachments area of this tutorial. Here is the content from my file:
---
- name: solr-dc01
  box: bento/centos-7.2
  cpus: 2
  ram: 2048
  ip: 192.168.56.101
  port: 10122
- name: solr-dc02
  box: bento/centos-7.2
  cpus: 2
  ram: 2048
  ip: 192.168.56.202
  port: 20222
Create commands.yaml file
The commands.yaml file contains the list of commands that should be run on each VM when they are first provisioned. This allows us to automate configuration tasks that would otherwise be tedious and/or repetitive. You can copy/paste my commands.yaml below or use the version in the attachments area of this tutorial. Here is the content from my file:
- sudo yum -y install net-tools ntp wget java-1.8.0-openjdk java-1.8.0-openjdk-devel lsof
- sudo systemctl enable ntpd && sudo systemctl start ntpd
- sudo systemctl disable firewalld && sudo systemctl stop firewalld
- sudo sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/g' /etc/sysconfig/selinux
Copy Solr release file to our Vagrant project directory
Our project directory is accessible to each of our Vagrant VMs via the /vagrant mount point. This allows us to easily access files and data located in our project directory. Instead of using scp to copy the Apache Solr release file to each of the VMs and creating duplicate files, we'll use a single copy located in our project directory.
cp ~/Downloads/solr-6.2.1.tgz .
NOTE: This assumes you are on a Mac and your downloads are in the ~/Downloads directory.
Start virtual machines
Now we are ready to start our 2 virtual machines for the first time. Creating the VMs for the first time and starting them every time after that uses the same command:
vagrant up
Once the process is complete you should have 2 servers running. You can verify by looking at VirtualBox. Notice I have 2 VMs running called solr-dc01 and solr-dc02:
Connect to each virtual machine
You are able to log in to each of the VMs via ssh using the vagrant ssh command. You must specify the name of the VM you want to connect to.
vagrant ssh solr-dc01
Using another terminal window, repeat this process for solr-dc02 .
Extract Solr install scripts
The Solr release archive file contains an installation script. This installation script will do the following by default:
Install Solr under /opt/solr-6.2.1
Create a symbolic link between /opt/solr and /opt/solr-6.2.1
Create a solr user
Store live data such as indexes, logs, etc. in /var/solr
NOTE: This assumes that you downloaded Solr 6.2.1.
On solr-dc01 , run the following command:
tar xvfz /vagrant/solr-6.2.1.tgz solr-6.2.1/bin/install_solr_service.sh --strip-components=2
Repeat this process for solr-dc02 . This will create a file called install_solr_service.sh in your current directory, which should be /home/vagrant .
Install Apache Solr
Now we can install Solr using the script defaults:
sudo bash ./install_solr_service.sh /vagrant/solr-6.2.1.tgz
The command above is the same as if you had specified the default settings:
sudo bash ./install_solr_service.sh /vagrant/solr-6.2.1.tgz -i /opt -d /var/solr -u solr -s solr -p 8983
After running the command, you should see something similar to this:
id: solr: no such user
Creating new user: solr
Extracting /vagrant/solr-6.2.1.tgz to /opt
Installing symlink /opt/solr -> /opt/solr-6.2.1 ...
Installing /etc/init.d/solr script ...
Installing /etc/default/solr.in.sh ...
Waiting up to 30 seconds to see Solr running on port 8983 [/]
Started Solr server on port 8983 (pid=29168). Happy searching!
Found 1 Solr nodes:
Solr process 29168 running on port 8983
{
"solr_home":"/var/solr/data",
"version":"6.2.1 43ab70147eb494324a1410f7a9f16a896a59bc6f - shalin - 2016-09-15 05:20:53",
"startTime":"2016-10-31T19:46:27.997Z",
"uptime":"0 days, 0 hours, 0 minutes, 12 seconds",
"memory":"13.4 MB (%2.7) of 490.7 MB"}
Service solr installed.
If you run the following command, you can see the Solr process is running:
ps -ef | grep solr
solr 28980 1 0 19:49 ? 00:00:11 java -server -Xms512m -Xmx512m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/solr/logs/solr_gc.log -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/opt/solr/server -Dsolr.solr.home=/var/solr/data -Dsolr.install.dir=/opt/solr -Dlog4j.configuration=file:/var/solr/log4j.properties -Xss256k -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs -jar start.jar --module=http
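Several steps in this tutorial have to be repeated on both VMs. As an optional shortcut, a command can be run on each VM from the host with vagrant ssh -c . The loop below is a sketch; the function name is hypothetical, and the VM names are the ones defined in servers.yaml.

```shell
# Hypothetical helper: run one command on both tutorial VMs from the host.
# Must be called from the Vagrant project directory.
on_each_vm() {
  for vm in solr-dc01 solr-dc02; do
    echo "== ${vm} =="
    vagrant ssh "$vm" -c "$1"
  done
}
```

For example, on_each_vm 'sudo service solr status' would report the Solr status of both VMs in one go.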
Repeat this process for solr-dc02 .
Modify Solr service
It's more convenient to use the OS services infrastructure to manage running Solr processes than manually using scripts. The installation process creates a service script that starts Solr in single instance mode. To take advantage of CDCR, you must use SolrCloud mode. We need to make some changes to the service script for this to work. We'll be using the embedded Zookeeper instance for our tutorial. To do this, we need a Zookeeper configuration file in our /var/solr/data directory. We'll copy the default configuration file from /opt/solr/server/solr/zoo.cfg .
sudo -u solr cp /opt/solr/server/solr/zoo.cfg /var/solr/data/zoo.cfg
Now we need the /etc/init.d/solr service script to run Solr in SolrCloud mode. This is done by adding the -c parameter to the start process. When no other parameters are specified, Solr will start an embedded Zookeeper instance on the Solr port + 1000. In our case, that should be 9983 because our default Solr port is 8983 . Because this file is owned by root, we'll need to use sudo.
exit
sudo vi /etc/init.d/solr
Look near the end of the file for these lines:
...
case "$1" in
start|stop|restart|status)
SOLR_CMD="$1"
...
This is the section that defines the Solr command. We want to change the SOLR_CMD="$1" line to look like this: SOLR_CMD="$1 -c" . This will tell Solr that it should start in cloud mode.
NOTE: In production, you would not use the embedded Zookeeper. You would update /etc/default/solr.in.sh to set the ZK_HOST variable to the production Zookeeper instances. When this variable is set, Solr will not start the embedded Zookeeper.
So the section of your file should now look like this:
...
case "$1" in
start|stop|restart|status)
SOLR_CMD="$1 -c"
...
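If you would rather not edit the script interactively, the same change could be made with sed. This is a sketch, not part of the Solr distribution: the function name is hypothetical, and it assumes the assignment in your copy of the script sits at the end of its line as SOLR_CMD="$1" (with or without the quotes). Check the file afterwards.

```shell
# Hypothetical helper: append -c to the SOLR_CMD assignment in the service script.
# $1 is the path to the script; it defaults to the location used in this tutorial.
enable_cloud_mode() {
  script=${1:-/etc/init.d/solr}
  tmp=$(mktemp) || return 1
  # \{0,1\} makes the surrounding quotes optional in the match
  sed 's/SOLR_CMD="\{0,1\}\$1"\{0,1\}$/SOLR_CMD="$1 -c"/' "$script" > "$tmp" &&
    cat "$tmp" > "$script"   # overwrite in place so ownership and mode are kept
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

The function needs write access to the script, so run it from a root shell, or try it first on a copy, for example cp /etc/init.d/solr /tmp/solr.init followed by enable_cloud_mode /tmp/solr.init .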
Now save the file:
Press the `esc` key
:wq
Let's stop Solr:
sudo service solr stop
Now we can start Solr using the new script:
sudo service solr start
Once the process is started, we can check the status:
sudo service solr status
Found 1 Solr nodes:
Solr process 29426 running on port 8983
{
"solr_home":"/var/solr/data",
"version":"6.2.1 43ab70147eb494324a1410f7a9f16a896a59bc6f - shalin - 2016-09-15 05:20:53",
"startTime":"2016-10-31T22:16:22.116Z",
"uptime":"0 days, 0 hours, 0 minutes, 14 seconds",
"memory":"30.2 MB (%6.1) of 490.7 MB",
"cloud":{
"ZooKeeper":"localhost:9983",
"liveNodes":"1",
"collections":"0"}}
As you can see, the process started successfully and there is a single cloud node running using Zookeeper on port 9983 . Repeat this process for solr-dc02 .
Create Solr dc01 configuration
The solr-dc01 Solr instance will host our source collection for replication. To enable CDCR we need to make a few changes to the solrconfig.xml configuration file. We'll use the data_driven_schema_configs as a base for our configuration. We need to create two different configurations because the source collection has a slightly different configuration than the target collection. On the solr-dc01 VM, copy the data_driven_schema_configs directory to the vagrant home directory. If you are following along, you should still be the vagrant user.
cd /home/vagrant
cp -r /opt/solr/server/solr/configsets/data_driven_schema_configs .
Edit the solrconfig.xml file:
vi data_driven_schema_configs/conf/solrconfig.xml
The first thing we are going to do is update the updateHandler definition; there is only one in the file. Find the section in the configuration file that looks like this:
<updateHandler class="solr.DirectUpdateHandler2">
We are going to change the updateLog portion of the configuration. Remember that we are using vi as the text editor, so edit using the appropriate vi commands. Change this:
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
to this:
<updateLog class="solr.CdcrUpdateLog">
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
Now we need to create a new requestHandler definition. Find the section in the configuration file that looks like this:
<!-- A request handler that returns indented JSON by default -->
<requestHandler name="/query" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
  </lst>
</requestHandler>
We are going to add our new definition just after the closing requestHandler . Add the following new definition:
<!-- A request handler for cross data center replication -->
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">192.168.56.202:9983</str>
    <str name="source">collection1</str>
    <str name="target">collection1</str>
  </lst>
  <lst name="replicator">
    <str name="threadPoolSize">8</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>
  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>
</requestHandler>
Your updated file should now look like this:
...
<!-- A request handler that returns indented JSON by default -->
<requestHandler name="/query" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
  </lst>
</requestHandler>
<!-- A request handler for cross data center replication -->
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">192.168.56.202:9983</str>
    <str name="source">collection1</str>
    <str name="target">collection1</str>
  </lst>
  <lst name="replicator">
    <str name="threadPoolSize">8</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>
  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>
</requestHandler>
...
NOTE: The zkHost line should have the IP address and port of the Zookeeper instance of the target collection. Our target collection is on solr-dc02 , so this IP and port are pointing to solr-dc02. When we create our collections in Solr, we'll use the name collection1 .
Now save the file:
Press the `esc` key
:wq
Create Solr dc02 configuration
The solr-dc02 Solr instance will host our target collection for replication. To enable CDCR we need to make a few changes to the solrconfig.xml configuration file. As above, we'll use the data_driven_schema_configs as a base for our configuration. On solr-dc02 , copy the data_driven_schema_configs directory to the vagrant home directory. If you are following along, you should still be the vagrant user.
cd /home/vagrant
cp -r /opt/solr/server/solr/configsets/data_driven_schema_configs .
Edit the solrconfig.xml file:
vi data_driven_schema_configs/conf/solrconfig.xml
The first thing we are going to do is update the updateHandler definition; there is only one in the file. Find the section in the configuration file that looks like this:
<updateHandler class="solr.DirectUpdateHandler2">
We are going to change the updateLog portion of the configuration. Remember that we are using vi as the text editor. Change this:
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
to this:
<updateLog class="solr.CdcrUpdateLog">
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
Now we need to create a new requestHandler definition. Find the section in the configuration file that looks like this:
<!-- A request handler that returns indented JSON by default -->
<requestHandler name="/query" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
  </lst>
</requestHandler>
We are going to add our new definition just after the closing requestHandler . Add the following new definition:
<!-- A request handler for cross data center replication -->
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="buffer">
    <str name="defaultState">disabled</str>
  </lst>
</requestHandler>
<!-- A request handler for cross data center replication -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">cdcr-processor-chain</str>
  </lst>
</requestHandler>
<updateRequestProcessorChain name="cdcr-processor-chain">
  <processor class="solr.CdcrUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
Your updated file should now look like this:
...
<!-- A request handler that returns indented JSON by default -->
<requestHandler name="/query" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
  </lst>
</requestHandler>
<!-- A request handler for cross data center replication -->
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="buffer">
    <str name="defaultState">disabled</str>
  </lst>
</requestHandler>
<!-- A request handler for cross data center replication -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">cdcr-processor-chain</str>
  </lst>
</requestHandler>
<updateRequestProcessorChain name="cdcr-processor-chain">
  <processor class="solr.CdcrUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
...
Now save the file:
Press the `esc` key
:wq
You should see how the two configurations are different between the source and target collections.
Create Solr collection on solr-dc01 and solr-dc02
Now we should be able to create a collection using our updated configuration. Because the two configurations are different, make sure you run this command on both the solr-dc01 and solr-dc02 VMs. This creates the collections in our respective data centers.
/opt/solr/bin/solr create -c collection1 -d ./data_driven_schema_configs
NOTE: We are using the same collection name that has CDCR enabled in the configuration.
You should see something similar to this:
/opt/solr/bin/solr create -c collection1 -d ./data_driven_schema_configs
Connecting to ZooKeeper at localhost:9983 ...
Uploading /home/vagrant/data_driven_schema_configs/conf for config collection1 to ZooKeeper at localhost:9983
Creating new collection 'collection1' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=collection1
{
  "responseHeader":{
    "status":0,
    "QTime":3684},
  "success":{"192.168.56.101:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":2546},
      "core":"collection1_shard1_replica1"}}}
Now we can verify the collection exists in the Solr admin UI via: http://192.168.56.101:8983/solr/#/~cloud
You should see something similar to this:
As you can see, there is a single collection named collection1 which has 1 shard. You can repeat this process on solr-dc02 and see something similar.
NOTE: Remember that solr-dc01 is 192.168.56.101 and solr-dc02 is 192.168.56.202.
Turn on replication
Let's first check the status of replication. Each of these curl commands interacts with the CDCR API of the collection. You can check the status of replication using the following command:
curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=STATUS'
You should see something similar to this:
curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=STATUS'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">5</int></lst><lst name="status"><str name="process">stopped</str><str name="buffer">enabled</str></lst>
</response>
You should notice the process is displayed as stopped. We want to start the replication process.
curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=START'
You should see something similar to this:
curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=START'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">41</int></lst><lst name="status"><str name="process">started</str><str name="buffer">enabled</str></lst>
</response>
You should notice the process is now started. Now we need to disable the buffer on the target collection, which buffers updates by default.
curl -XPOST 'http://192.168.56.202:8983/solr/collection1/cdcr?action=DISABLEBUFFER'
You should see something similar to this:
curl -XPOST 'http://192.168.56.202:8983/solr/collection1/cdcr?action=DISABLEBUFFER'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">7</int></lst><lst name="status"><str name="process">started</str><str name="buffer">disabled</str></lst>
</response>
You should notice the buffer is now disabled.
Add documents to source Solr collection in solr-dc01
Now we will add a couple of sample documents to collection1 in solr-dc01. Run the following command to add 2 sample documents:
curl -XPOST -H 'Content-Type: application/json' 'http://192.168.56.101:8983/solr/collection1/update' --data-binary '{
  "add" : {
    "doc" : {
      "id" : "1",
      "text_ws" : "This is document number one."
    }
  },
  "add" : {
    "doc" : {
      "id" : "2",
      "text_ws" : "This is document number two."
    }
  },
  "commit" : {}
}'
You should notice the commit command in the JSON above. That is because the default solrconfig.xml does not have automatic commits enabled. You should get a response back similar to this:
{"responseHeader":{"status":0,"QTime":362}}
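After pushing documents it can be useful to re-check that replication is still running. The sketch below wraps the CDCR calls from the "Turn on replication" step in two small helpers. The function names cdcr_url and cdcr_process_state are hypothetical, port 8983 is taken from the commands above, and the parser assumes the response has the single-line XML shape shown earlier.

```shell
#!/bin/sh
# cdcr_url (hypothetical helper): build the CDCR API URL for a host,
# collection, and action such as STATUS, START, or DISABLEBUFFER.
cdcr_url() {
    host="$1"
    collection="$2"
    action="$3"
    printf 'http://%s:8983/solr/%s/cdcr?action=%s\n' "$host" "$collection" "$action"
}

# cdcr_process_state (hypothetical helper): read a CDCR STATUS response on
# stdin and print the value of its "process" field, for example "started".
cdcr_process_state() {
    sed -n 's/.*<str name="process">\([^<]*\)<\/str>.*/\1/p'
}

# Example, using the solr-dc01 address from this tutorial:
#   curl -s -XPOST "$(cdcr_url 192.168.56.101 collection1 STATUS)" | cdcr_process_state
```

If the helper prints stopped on the source, rerun the START action before expecting documents to appear on the target.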
Query solr-dc01 collection
Let's query collection1 on solr-dc01 to ensure the documents are present. Run the following command:
curl -XGET 'http://192.168.56.101:8983/solr/collection1/select?q=*:*&indent=true'
You should see something similar to this:
curl -XGET 'http://192.168.56.101:8983/solr/collection1/select?q=*:*&indent=true'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <bool name="zkConnected">true</bool>
  <int name="status">0</int>
  <int name="QTime">17</int>
  <lst name="params">
    <str name="q">*:*</str>
    <str name="indent">true</str>
  </lst>
</lst>
<result name="response" numFound="2" start="0">
  <doc>
    <str name="id">1</str>
    <str name="text_ws">This is document number one.</str>
    <long name="_version_">1549823582071160832</long></doc>
  <doc>
    <str name="id">2</str>
    <str name="text_ws">This is document number two.</str>
    <long name="_version_">1549823582135123968</long></doc>
</result>
</response>
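When all you want is the document count, reading the whole XML response is more than you need. The following sketch pulls out only the numFound attribute. The function name num_found is hypothetical, and the parser assumes the XML response format shown above.

```shell
#!/bin/sh
# num_found (hypothetical helper): read a Solr XML query response on stdin
# and print the value of the numFound attribute of the result element.
num_found() {
    sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p'
}

# Example against the source data center; rows=0 asks Solr to return the
# count without the documents themselves:
#   curl -s 'http://192.168.56.101:8983/solr/collection1/select?q=*:*&rows=0' | num_found
```

Running the same command against 192.168.56.202, after the target has been committed as described in the next section, should print the same number when replication has caught up.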
Query solr-dc02 collection
Before executing the query on solr-dc02, we need to commit the changes. As mentioned above, automatic commits are not enabled in the default solrconfig.xml. Run the following command:
curl -XPOST -H 'Content-Type: application/json' 'http://192.168.56.202:8983/solr/collection1/update' --data-binary '{
  "commit" : {}
}'
You should see a response similar to this:
{"responseHeader":{"status":0,"QTime":5}}
Now we can run our query:
curl -XGET 'http://192.168.56.202:8983/solr/collection1/select?q=*:*&indent=true'
You should see something similar to this:
curl -XGET 'http://192.168.56.202:8983/solr/collection1/select?q=*:*&indent=true'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <bool name="zkConnected">true</bool>
  <int name="status">0</int>
  <int name="QTime">17</int>
  <lst name="params">
    <str name="q">*:*</str>
    <str name="indent">true</str>
  </lst>
</lst>
<result name="response" numFound="2" start="0">
  <doc>
    <str name="id">1</str>
    <str name="text_ws">This is document number one.</str>
    <long name="_version_">1549823582071160832</long></doc>
  <doc>
    <str name="id">2</str>
    <str name="text_ws">This is document number two.</str>
    <long name="_version_">1549823582135123968</long></doc>
</result>
</response>
You should notice that you have 2 documents, which have the same id and text_ws content as you pushed to solr-dc01.
Review
If you followed along with this tutorial, you have successfully set up cross data center replication between two SolrCloud configurations. Some important points to keep in mind:
Because this is an active-passive approach, there is only a single source system. If the source system goes down, your ingest will stop, as the other data center is read-only and should not have updates pushed outside of the replication process. Work is being done to make Solr CDCR active-active.
Cross data center communications can be a potential bottleneck. If the cross data center connection cannot sustain sufficient throughput, the target data center(s) can fall behind in replication.
CDCR is neither intended nor optimized for bulk inserts. If you need to do bulk inserts, first synchronize the indexes between the data centers outside of the replication process. Then enable replication for incremental updates.
For more information, read about Cross Data Center Replication: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462
10-26-2016
01:39 AM
Did you install the Vagrant plugin "vagrant-hostmanager"? It is listed as a requirement at the top of the tutorial.
10-08-2016
04:18 PM
4 Kudos
Objective
Given the limited resources available in a virtualized sandbox, you may choose to turn specific services on or off. You may choose to enable or disable security, such as Kerberos. Depending on your scenario, you may have a need to switch between these configurations frequently. For reproducible demos, you likely do not want to make these changes between one demo and the next. If you are like me, you may want to have different copies of HDP sandboxes to cover different demo scenarios.
With VirtualBox or VMWare sandboxes, you can easily import or clone a sandbox to have multiple, distinct copies. Each copy is unique with no sharing of configuration or data. However, this approach is not quite as intuitive when using the Docker sandbox. If you tried to create multiple containers on a Docker image thinking they would be separate copies, you likely have found they are not completely separate!
This tutorial will guide you through the process of using a single sandbox image, with multiple containers, without sharing the sandbox HDP configurations by mapping the container's /hadoop directory to distinct paths within the Docker VM.
This tutorial is a continuation of this one: HCC Article
Prerequisites
You should have already completed this tutorial: HCC Article
Scope
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6
HDP 2.5 on Hortonworks Sandbox (Docker Version)
Docker for Mac 1.12.1
Steps
Identify where container storage is located
The create container command, docker run, which was run in the previous tutorial, specifies a mount of -v hadoop:/hadoop. This tells Docker to create the container with a mount of /hadoop backed by hadoop. Because hadoop is a plain name rather than an absolute path, Docker treats it as a named volume and stores it in a location of its own choosing. We are trying to figure out where this is.
To see what storage mounts our Docker container has, we can use the docker inspect command. If you followed my tutorial, we created the container and gave it the name sandbox.
$ docker inspect sandbox
In the output of this command you want to look for the Mounts section. You should see something similar to this:
...
"Mounts": [
{
"Name": "hadoop",
"Source": "/var/lib/docker/volumes/hadoop/_data",
"Destination": "/hadoop",
"Driver": "local",
"Mode": "z",
"RW": true,
"Propagation": "rprivate"
}
],
...
From this output we can see that /hadoop is pointing to /var/lib/docker/volumes/hadoop/_data. So let's see what's in that location.
$ ls /var/lib/docker/volumes/hadoop/_data
ls: /var/lib/docker/volumes/hadoop/_data: No such file or directory
The directory doesn't exist. Why is this? The latest version of Docker for Mac uses HyperKit as the virtualization layer. Previous versions used VirtualBox as the virtualization layer. Both versions use a common VM to run all of the containers. So the Source path is not on the Mac itself; rather, it is on the VM.
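As an aside, the Source path does not have to be picked out of the docker inspect output by eye. The following is a minimal sketch that extracts it with sed. The function name mount_sources is hypothetical, and the parser assumes the pretty-printed JSON layout shown in the Mounts section above.

```shell
#!/bin/sh
# mount_sources (hypothetical helper): read pretty-printed `docker inspect`
# JSON on stdin and print every "Source" path found, one per line.
mount_sources() {
    sed -n 's/.*"Source": *"\([^"]*\)".*/\1/p'
}

# Example, using the container name from the previous tutorial:
#   docker inspect sandbox | mount_sources
```

Docker can also print just this section with docker inspect --format '{{json .Mounts}}' sandbox, which avoids text parsing when you only need the raw JSON.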
So let's connect to the Docker VM to see if the directory exists there. The following command will start a temporary container based on an Alpine Linux image that mounts the Docker VM's root directory as /vm-root and then does an ls -latr on it.
$ docker run --rm -it -v /:/vm-root alpine:edge ls -latr /vm-root/var/lib/docker/volumes/
You should see something similar to this:
$ docker run --rm -it -v /:/vm-root alpine:edge ls -latr /vm-root/var/lib/docker/volumes/
total 88
drwx--x--x 10 root root 4096 Aug 24 20:07 ..
drwxr-xr-x 3 root root 4096 Sep 19 21:25 9ab350e3947fc409819cc0924401d863fe84f5c45ea4243bcecf3e91a0741068
drwxr-xr-x 3 root root 4096 Sep 20 15:51 330351a101d34c3f0ed4f4ee7c3ef4277754a2cadd68d711e8e871aa09280e39
drwxr-xr-x 3 root root 4096 Sep 25 18:03 hadoop
drwxr-xr-x 3 root root 4096 Sep 28 21:13 ae64ecf489ceac45866a35b3babdf4773f67ba555acc5d45b1d52f9f305a964f
drwxr-xr-x 3 root root 4096 Sep 28 23:03 088a11867381704183ac9116ad3da0513c03885665e9e03049432363d2884d1e
drwxr-xr-x 3 root root 4096 Sep 28 23:17 f6f28886b2f50f72c52081dc2e9339678b9ecf4910564e14531c3ca6c8791974
drwxr-xr-x 3 root root 4096 Oct 5 13:45 c6825d9c9c6933549a446bf45924db641b65a632c18da662b15a109dc46b5f15
drwxr-xr-x 3 root root 4096 Oct 5 13:48 6ea352c744531d4c53e699df5eafde40100e4935c7398917714ed33ee7fe5f73
drwxr-xr-x 3 root root 4096 Oct 5 13:49 151490435ffcd759c266049b24cf3a18759c5fd3e26f1a05357973e318a8b117
drwxr-xr-x 3 root root 4096 Oct 5 13:50 a0575116e211d35d94ee648822a1bf035c708f90bf7e9620061753a3f34be150
-rw------- 1 root root 65536 Oct 7 18:46 metadata.db
drwx------ 14 root root 4096 Oct 7 18:46 .
Your output will not look exactly the same. The volume ids listed will be different and you may not have the same number of volumes. However, you should see the hadoop directory in your output. Let's take a quick look inside it by modifying our previous Docker command:
$ docker run --rm -it -v /:/vm-root alpine:edge ls -latr /vm-root/var/lib/docker/volumes/hadoop/_data
You should see something similar to this:
$ docker run --rm -it -v /:/vm-root alpine:edge ls -latr /vm-root/var/lib/docker/volumes/hadoop/_data
total 36
drwxr-xr-x 3 516 501 4096 Sep 13 10:54 zookeeper
drwxr-xr-x 3 513 501 4096 Sep 13 10:56 mapreduce
drwxr-xr-x 5 506 501 4096 Sep 13 10:56 hdfs
drwxr-xr-x 5 520 501 4096 Sep 13 10:58 yarn
drwxr-xr-x 3 506 501 4096 Sep 13 10:59 oozie
drwxr-xr-x 5 518 501 4096 Sep 13 11:02 falcon
drwxr-xr-x 3 root root 4096 Sep 25 18:03 ..
drwxr-xr-x 9 506 501 4096 Sep 28 20:36 .
drwxr-xr-x 7 510 501 4096 Oct 5 21:37 storm
As you can see, this is where the container is storing the data for the /hadoop mount. The problem is that this mount is the same for every container that runs that image using the run command we provided before.
We are going to modify how we create our containers so they each have a separate /hadoop mount.
Create a new project directory
I like to create project directories. My Vagrant work goes under ~/Vagrant/<project> and my Docker work goes under ~/Docker/<project>. This allows me to clearly identify which technology or tool is associated with each project and allows me to use various helper scripts to automate processes, etc. So let's create a project directory for a notional Atlas demo.
$ mkdir -p ~/Docker/atlas-demo1 && cd ~/Docker/atlas-demo1
Create the project helper files
To make it easy to switch between containers and projects, I like to create 4 helper scripts. You can copy/paste the scripts as described below, or you can download them from the attachments section of this article.
create-container.sh
The first script is used to create the container: create-container.sh. In this script we'll be using a docker run command similar to the one used in the previous tutorial. However, we are going to modify the mounts so they are no longer shared. The key change is that we grab the basename of our current project directory and use that name as our volume name instead of the hard-coded hadoop.
We are also using the basename of our project directory for the --name of the container. In this case, the basename is atlas-demo1. The last change you should notice is we have added a second -v flag. This addition mounts our local project directory to /mount within the container. This makes it really easy to copy data back and forth between our local directory and the container.
Edit the create-container.sh file:
vi create-container.sh
Copy and paste the following into your file:
#!/bin/bash
export CUR_DIR="$(pwd)"
export PROJ_DIR="$(basename "$CUR_DIR")"
docker run -v `pwd`:/mount -v ${PROJ_DIR}:/hadoop --name ${PROJ_DIR} --hostname "sandbox.hortonworks.com" --privileged -d -p 6080:6080 -p 9090:9090 -p 9000:9000 -p 8000:8000 -p 8020:8020 -p 42111:42111 -p 10500:10500 -p 16030:16030 -p 8042:8042 -p 8040:8040 -p 2100:2100 -p 4200:4200 -p 4040:4040 -p 8050:8050 -p 9996:9996 -p 9995:9995 -p 8080:8080 -p 8088:8088 -p 8886:8886 -p 8889:8889 -p 8443:8443 -p 8744:8744 -p 8888:8888 -p 8188:8188 -p 8983:8983 -p 1000:1000 -p 1100:1100 -p 11000:11000 -p 10001:10001 -p 15000:15000 -p 10000:10000 -p 8993:8993 -p 1988:1988 -p 5007:5007 -p 50070:50070 -p 19888:19888 -p 16010:16010 -p 50111:50111 -p 50075:50075 -p 50095:50095 -p 18080:18080 -p 60000:60000 -p 8090:8090 -p 8091:8091 -p 8005:8005 -p 8086:8086 -p 8082:8082 -p 60080:60080 -p 8765:8765 -p 5011:5011 -p 6001:6001 -p 6003:6003 -p 6008:6008 -p 1220:1220 -p 21000:21000 -p 6188:6188 -p 61888:61888 -p 2181:2181 -p 2222:22 sandbox /usr/sbin/sshd -D
Now save your file with
:wq!
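To see what names the script will derive before running it, the basename logic can be tried on its own. This is a minimal sketch. The function names project_name and preview_create are hypothetical, and preview_create only prints the volume and name flags, it does not call docker.

```shell
#!/bin/sh
# project_name (hypothetical helper): print the last path component of a
# project directory, which the script uses as volume and container name.
project_name() {
    basename "$1"
}

# preview_create (hypothetical helper): print, without running anything,
# the -v and --name flags create-container.sh would derive for a directory.
preview_create() {
    name="$(project_name "$1")"
    printf '%s\n' "-v $1:/mount -v $name:/hadoop --name $name"
}

# Example:
#   preview_create "$HOME/Docker/atlas-demo1"
```

Because both the volume and the container take their name from the directory, two projects with the same directory name in different parent directories would still collide. Keeping project directory names unique avoids that.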
start-container.sh
The second script is used to start the container after it has been created. You start a container by using the docker start <container> command where container is either the name or id. Instead of having to remember what the container name is, we'll have the script figure that out for us.
Edit the start-container.sh file:
vi start-container.sh
Copy and paste the following into your file:
#!/bin/bash
export CUR_DIR="$(pwd)"
export PROJ_DIR="$(basename "$CUR_DIR")"
docker start ${PROJ_DIR}
Now save your file with
:wq!
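If start-container.sh is run in a project whose container was never created, docker start fails with an error. A guarded variant could check for the container first. The sketch below is one way to do that. The function names container_exists and start_project are hypothetical and are not part of the scripts in this article.

```shell
#!/bin/sh
# container_exists (hypothetical helper): succeed if a container with
# exactly this name exists, whether it is running or stopped.
container_exists() {
    docker ps -a --format '{{.Names}}' | grep -qxF "$1"
}

# start_project (hypothetical helper): start the container named after the
# given project directory, or say what to do if it was never created.
start_project() {
    name="$(basename "$1")"
    if container_exists "$name"; then
        docker start "$name"
    else
        echo "No container named $name; run ./create-container.sh first" >&2
        return 1
    fi
}

# Example, from inside a project directory:
#   start_project "$(pwd)"
```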
stop-container.sh
The third script is used to stop the container after it has been created. You stop a container by using the docker stop <container> command where container is either the name or id. Instead of having to remember what the container name is, we'll have the script figure that out for us.
Edit the stop-container.sh file:
vi stop-container.sh
Copy and paste the following into your file:
#!/bin/bash
export CUR_DIR="$(pwd)"
export PROJ_DIR="$(basename "$CUR_DIR")"
docker stop ${PROJ_DIR}
Now save your file with
:wq!
ssh-container.sh
The fourth script is used to ssh into the container. The container maps the local host port 2222 to the container port 22 via the -p 2222:22 flag in the create-container.sh script. Admittedly the ssh command to connect is simple. However, this script means I don't have to think about it very much.
Edit the ssh-container.sh file:
vi ssh-container.sh
Copy and paste the following into your file:
#!/bin/bash
ssh -p 2222 root@localhost
Now save your file with
:wq!
Create the atlas-demo1 container
Now that we have our helper scripts ready to go, let's create the container for our notional Atlas demo.
$ cd ~/Docker/atlas-demo1
$ ./create-container.sh
You should see something similar to the following:
$ ./create-container.sh
9366e0b23a72ea53581647e174b50e5d24ec08a217c1bf3591491ad74ab18028
The output of the docker run command is the unique container id for our
atlas-demo1 container. You can verify the container is running with the docker ps command:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9366e0b23a72 sandbox "/usr/sbin/sshd -D" 55 seconds ago Up 53 seconds 0.0.0.0:1000->1000/tcp, 0.0.0.0:1100->1100/tcp, 0.0.0.0:1220->1220/tcp, 0.0.0.0:1988->1988/tcp, 0.0.0.0:2100->2100/tcp, 0.0.0.0:2181->2181/tcp, 0.0.0.0:4040->4040/tcp, 0.0.0.0:4200->4200/tcp, 0.0.0.0:5007->5007/tcp, 0.0.0.0:5011->5011/tcp, 0.0.0.0:6001->6001/tcp, 0.0.0.0:6003->6003/tcp, 0.0.0.0:6008->6008/tcp, 0.0.0.0:6080->6080/tcp, 0.0.0.0:6188->6188/tcp, 0.0.0.0:8000->8000/tcp, 0.0.0.0:8005->8005/tcp, 0.0.0.0:8020->8020/tcp, 0.0.0.0:8040->8040/tcp, 0.0.0.0:8042->8042/tcp, 0.0.0.0:8050->8050/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:8082->8082/tcp, 0.0.0.0:8086->8086/tcp, 0.0.0.0:8088->8088/tcp, 0.0.0.0:8090-8091->8090-8091/tcp, 0.0.0.0:8188->8188/tcp, 0.0.0.0:8443->8443/tcp, 0.0.0.0:8744->8744/tcp, 0.0.0.0:8765->8765/tcp, 0.0.0.0:8886->8886/tcp, 0.0.0.0:8888-8889->8888-8889/tcp, 0.0.0.0:8983->8983/tcp, 0.0.0.0:8993->8993/tcp, 0.0.0.0:9000->9000/tcp, 0.0.0.0:9090->9090/tcp, 0.0.0.0:9995-9996->9995-9996/tcp, 0.0.0.0:10000-10001->10000-10001/tcp, 0.0.0.0:10500->10500/tcp, 0.0.0.0:11000->11000/tcp, 0.0.0.0:15000->15000/tcp, 0.0.0.0:16010->16010/tcp, 0.0.0.0:16030->16030/tcp, 0.0.0.0:18080->18080/tcp, 0.0.0.0:19888->19888/tcp, 0.0.0.0:21000->21000/tcp, 0.0.0.0:42111->42111/tcp, 0.0.0.0:50070->50070/tcp, 0.0.0.0:50075->50075/tcp, 0.0.0.0:50095->50095/tcp, 0.0.0.0:50111->50111/tcp, 0.0.0.0:60000->60000/tcp, 0.0.0.0:60080->60080/tcp, 0.0.0.0:61888->61888/tcp, 0.0.0.0:2222->22/tcp atlas-demo1
You should notice the shortened version of the container id is listed as 9366e0b23a72. It is the first 12 characters of the full id printed by our create-container.sh command. Your container id value will be different. You should also notice the name of the container is listed as atlas-demo1.
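The relationship between the two ids is easy to check at the command line. This small sketch truncates a full id to the 12 characters that docker ps shows. The function name short_id is hypothetical.

```shell
#!/bin/sh
# short_id (hypothetical helper): print the first 12 characters of a full
# container id, which is the form `docker ps` displays by default.
short_id() {
    printf '%s\n' "$1" | cut -c1-12
}

# Example, using the id printed by create-container.sh above:
#   short_id 9366e0b23a72ea53581647e174b50e5d24ec08a217c1bf3591491ad74ab18028
```

Going the other way, docker ps --no-trunc prints the full ids if you need them.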
When you create a container with docker run, Docker starts it for you. That means you can connect to it without having to run the start-container.sh script. After the container has been stopped, you will need to run start-container.sh to bring it up, NOT create-container.sh.
Connect to the atlas-demo1 container
Now that the container is started, we can connect to it. We can use our new helper script
ssh-container.sh to make it easy:
$ ./ssh-container.sh
You should be prompted for a password. The default password on the sandbox is hadoop. The first time you log into a new container you will be prompted to change the password. You should see something similar to this:
$ ./ssh-container.sh
root@localhost's password:
You are required to change your password immediately (root enforced)
Last login: Thu Sep 22 11:35:09 2016 from 172.17.0.1
Changing password for root.
(current) UNIX password:
New password:
Retype new password:
For demo purposes, I temporarily change it to something new like trymenow and then change it back to hadoop.
[root@sandbox ~]# passwd
Changing password for user root.
New password:
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.
Verify container mounts
Let's verify our container mounts. You do this with the
df command:
[root@sandbox ~]# df -h
Filesystem Size Used Avail Use% Mounted on
none 60G 32G 25G 57% /
tmpfs 5.9G 0 5.9G 0% /dev
tmpfs 5.9G 0 5.9G 0% /sys/fs/cgroup
/dev/vda2 60G 32G 25G 57% /hadoop
/dev/vda2 60G 32G 25G 57% /etc/resolv.conf
/dev/vda2 60G 32G 25G 57% /etc/hostname
/dev/vda2 60G 32G 25G 57% /etc/hosts
shm 64M 8.0K 64M 1% /dev/shm
osxfs 233T 33T 201T 15% /Users/myoung/Documents/Docker/atlas-demo1
The first thing you should notice is the last entry. My local project directory is mounted as osxfs. Let's ls that directory to see what's there:
[root@sandbox ~]# ls -la /Users/myoung/Documents/Docker/atlas-demo1
total 300
drwxr-xr-x 12 root root 408 Oct 7 22:52 .
drwxr-xr-x 3 root root 4096 Oct 7 22:57 ..
-rwxrwxr-x 1 root root 1199 Oct 7 23:31 create-container.sh
-rwxrwxr-x 1 root root 40 Oct 7 22:52 ssh-container.sh
-rwxrwxr-x 1 root root 96 Oct 7 22:48 start-container.sh
-rwxrwxr-x 1 root root 95 Oct 7 22:48 stop-container.sh
You should see the 4 helper scripts we created. If I want to easily make data available to the container, all I have to do is copy the data to my project directory.
Start the sandbox processes
When the container starts up, it doesn't automatically start the sandbox processes. You can do that by running /etc/init.d/startup_script start. You should see something similar to this:
[root@sandbox ~]# /etc/init.d/startup_script start
Starting tutorials... [ Ok ]
Starting startup_script...
Starting HDP ...
Starting mysql [ OK ]
Starting Flume [ OK ]
Starting Postgre SQL [ OK ]
Starting Ranger-admin [WARNINGS]
find: failed to restore initial working directory: Permission denied
Starting data node [ OK ]
Starting name node [ OK ]
Safe mode is OFF
Starting Oozie [ OK ]
Starting Ranger-usersync [ OK ]
Starting Zookeeper nodes [ OK ]
Starting NFS portmap [ OK ]
Starting Hdfs nfs [ OK ]
Starting Hive server [ OK ]
Starting Hiveserver2 [ OK ]
Starting Ambari server [ OK ]
Starting Ambari agent [ OK ]
Starting Node manager [ OK ]
Starting Yarn history server [ OK ]
Starting Webhcat server [ OK ]
Starting Spark [ OK ]
Starting Mapred history server [ OK ]
Starting Zeppelin [ OK ]
Starting Resource manager [ OK ]
Safe mode is OFF
Starting sandbox...
/etc/init.d/startup_script: line 97: /proc/sys/kernel/hung_task_timeout_secs: No such file or directory
Starting shellinaboxd: [ OK ]
NOTE: You can ignore any warnings or errors that are displayed.
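The startup script can take a while, and Ambari may not answer on port 8080 right away. The following sketch polls a URL until it responds. The function name wait_for_url and the default of 30 attempts spaced 10 seconds apart are assumptions, not values from the sandbox documentation.

```shell
#!/bin/sh
# wait_for_url (hypothetical helper): poll a URL until curl can connect and
# get a response, or give up after a number of attempts.
#   $1 = URL, $2 = max attempts (default 30), $3 = seconds between tries
wait_for_url() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-10}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        if curl -s -o /dev/null "$url"; then
            echo "ready after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "gave up after $attempts attempt(s)" >&2
    return 1
}

# Example, run on the Mac rather than inside the container:
#   wait_for_url http://localhost:8080
```

Note that this treats any HTTP response as ready, so it only tells you the web server is listening, not that every HDP service has finished starting.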
Now the sandbox processes are running and you can access the Ambari interface via
http://localhost:8080 . Log in with the raj_ops username and password. You should see something similar to this:
Enable HBase
We are going to start the HBase service and turn off maintenance mode. We want to compare this sandbox with another one we will start later to show the services are different.
Click on the HBase service. The HBase summary page will be displayed. Click the Services button and select the Start menu option. You should see something similar to this:
A confirmation dialog will be displayed. Check the Turn Off Maintenance Mode for HBase option and then click the green Confirm Start button.
The Background Operation Running dialog will be displayed. You should see something similar to this:
You can click the green
OK button.
Once HBase is running, you should see something similar to this:
You should notice that HBase is running and is no longer in maintenance mode.
Upload file to HDFS home directory
We are going to upload a file to the user home directory on HDFS. As mentioned in the previous section, we want to compare this sandbox with another to show the directories are different.
Click on the Ambari Views menu in the upper right menu. A drop down menu will be displayed. You should see something similar to this:
Click on the
Files View option. You should see something similar to this:
We are going to navigate to our user home directory. We are logged in as
raj_ops . So click on the user folder, then the raj_ops folder. You should see something similar to this:
Now we are going to upload a file. Click on the blue
Upload button. You should see something similar to this:
Click the cloud-arrow icon. You should see a file dialog box that looks similar to this:
You should be in your project directory. If you are not, navigate to that location until you see the project helper files we created. We are going to upload the start-container.sh script. Select the file and then click the Open button. You should see something similar to this:
Stop the atlas-demo1 container
Now we are going to stop our container. Before stopping it, use Ambari to
Stop All services. You can find that link on the Ambari Dashboard:
You stop your container by running the
stop-container.sh script on the local host machine.
[root@sandbox ~]# exit
logout
Connection to localhost closed.
$ ./stop-container.sh
atlas-demo1
When you stop or start a container, Docker will always print the name of the container when the command completes.
Create the atlas-demo2 container
Now let's create a new project directory for comparison. This will show that our two containers are not sharing configurations.
$ mkdir ~/Docker/atlas-demo2 && cd ~/Docker/atlas-demo2
Copy helper scripts
There is no reason to copy/paste those helper scripts again. The scripts we created will work anywhere. So let's copy them.
$ cp ~/Docker/atlas-demo1/* .
$ ls
create-container.sh ssh-container.sh start-container.sh stop-container.sh
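If you expect to create many demo projects, the mkdir and cp steps can be folded into one helper. This is a minimal sketch. The function name new_project is hypothetical, and it assumes the helper scripts all follow the *-container.sh naming used in this article.

```shell
#!/bin/sh
# new_project (hypothetical helper): create a project directory and copy
# the *-container.sh helper scripts from an existing project into it.
#   $1 = existing project directory, $2 = new project directory
new_project() {
    src="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    cp "$src"/*-container.sh "$dest"/ || return 1
    chmod +x "$dest"/*-container.sh
}

# Example:
#   new_project ~/Docker/atlas-demo1 ~/Docker/atlas-demo3
```

Copying only the *-container.sh files, rather than everything in the directory, keeps demo data from one project out of the next.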
Create the atlas-demo2 container
This is a new container, so we need to run the
create-container.sh script.
You should see something similar to the following:
$ ./create-container.sh
05e4710f3aaa1232b620a5d908003070a7b3d991c064ac09c04571a2fc1b2079
The output of the docker run command is the unique container id for our
atlas-demo2 container. You can verify the container is running with the docker ps command:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
05e4710f3aaa sandbox "/usr/sbin/sshd -D" About a minute ago Up 33 seconds 0.0.0.0:1000->1000/tcp, 0.0.0.0:1100->1100/tcp, 0.0.0.0:1220->1220/tcp, 0.0.0.0:1988->1988/tcp, 0.0.0.0:2100->2100/tcp, 0.0.0.0:2181->2181/tcp, 0.0.0.0:4040->4040/tcp, 0.0.0.0:4200->4200/tcp, 0.0.0.0:5007->5007/tcp, 0.0.0.0:5011->5011/tcp, 0.0.0.0:6001->6001/tcp, 0.0.0.0:6003->6003/tcp, 0.0.0.0:6008->6008/tcp, 0.0.0.0:6080->6080/tcp, 0.0.0.0:6188->6188/tcp, 0.0.0.0:8000->8000/tcp, 0.0.0.0:8005->8005/tcp, 0.0.0.0:8020->8020/tcp, 0.0.0.0:8040->8040/tcp, 0.0.0.0:8042->8042/tcp, 0.0.0.0:8050->8050/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:8082->8082/tcp, 0.0.0.0:8086->8086/tcp, 0.0.0.0:8088->8088/tcp, 0.0.0.0:8090-8091->8090-8091/tcp, 0.0.0.0:8188->8188/tcp, 0.0.0.0:8443->8443/tcp, 0.0.0.0:8744->8744/tcp, 0.0.0.0:8765->8765/tcp, 0.0.0.0:8886->8886/tcp, 0.0.0.0:8888-8889->8888-8889/tcp, 0.0.0.0:8983->8983/tcp, 0.0.0.0:8993->8993/tcp, 0.0.0.0:9000->9000/tcp, 0.0.0.0:9090->9090/tcp, 0.0.0.0:9995-9996->9995-9996/tcp, 0.0.0.0:10000-10001->10000-10001/tcp, 0.0.0.0:10500->10500/tcp, 0.0.0.0:11000->11000/tcp, 0.0.0.0:15000->15000/tcp, 0.0.0.0:16010->16010/tcp, 0.0.0.0:16030->16030/tcp, 0.0.0.0:18080->18080/tcp, 0.0.0.0:19888->19888/tcp, 0.0.0.0:21000->21000/tcp, 0.0.0.0:42111->42111/tcp, 0.0.0.0:50070->50070/tcp, 0.0.0.0:50075->50075/tcp, 0.0.0.0:50095->50095/tcp, 0.0.0.0:50111->50111/tcp, 0.0.0.0:60000->60000/tcp, 0.0.0.0:60080->60080/tcp, 0.0.0.0:61888->61888/tcp, 0.0.0.0:2222->22/tcp atlas-demo2
You should notice the shortened version of the container id is listed as 05e4710f3aaa. As before, it is the first 12 characters of the full id printed by our create-container.sh command. Your container id value will be different. You should also notice the name of the container is listed as atlas-demo2.
Connect to the atlas-demo2 container
Now that the container is started, we can connect to it. We can use our new helper script
ssh-container.sh to make it easy:
$ ./ssh-container.sh
Because this is a new container, you should be prompted for a password. Change the password as you did with atlas-demo1.
Verify container mounts
Let's verify our container mounts. You do this with the
df command:
[root@sandbox ~]# df -h
Filesystem Size Used Avail Use% Mounted on
none 60G 32G 25G 57% /
tmpfs 5.9G 0 5.9G 0% /dev
tmpfs 5.9G 0 5.9G 0% /sys/fs/cgroup
/dev/vda2 60G 32G 25G 57% /hadoop
/dev/vda2 60G 32G 25G 57% /etc/resolv.conf
/dev/vda2 60G 32G 25G 57% /etc/hostname
/dev/vda2 60G 32G 25G 57% /etc/hosts
shm 64M 8.0K 64M 1% /dev/shm
osxfs 233T 33T 201T 15% /Users/myoung/Documents/Docker/atlas-demo1
The first thing you should notice is the last entry. My local project directory is mounted as osxfs. Let's ls that directory to see what's there:
[root@sandbox ~]# ls -la /Users/myoung/Documents/Docker/atlas-demo2
total 300
drwxr-xr-x 12 root root 408 Oct 7 22:52 .
drwxr-xr-x 3 root root 4096 Oct 7 22:57 ..
-rwxrwxr-x 1 root root 1199 Oct 7 23:31 create-container.sh
-rwxrwxr-x 1 root root 40 Oct 7 22:52 ssh-container.sh
-rwxrwxr-x 1 root root 96 Oct 7 22:48 start-container.sh
-rwxrwxr-x 1 root root 95 Oct 7 22:48 stop-container.sh
As before, you should see the 4 helper scripts we created.
Start the sandbox processes
When the container starts up, it doesn't automatically start the sandbox processes. You can do that by running /etc/init.d/startup_script start. You should see something similar to this:
[root@sandbox ~]# /etc/init.d/startup_script start
Starting tutorials... [ Ok ]
Starting startup_script...
Starting HDP ...
Starting mysql [ OK ]
Starting Flume [ OK ]
Starting Postgre SQL [ OK ]
Starting Ranger-admin [WARNINGS]
find: failed to restore initial working directory: Permission denied
Starting data node [ OK ]
Starting name node [ OK ]
Safe mode is OFF
Starting Oozie [ OK ]
Starting Ranger-usersync [ OK ]
Starting Zookeeper nodes [ OK ]
Starting NFS portmap [ OK ]
Starting Hdfs nfs [ OK ]
Starting Hive server [ OK ]
Starting Hiveserver2 [ OK ]
Starting Ambari server [ OK ]
Starting Ambari agent [ OK ]
Starting Node manager [ OK ]
Starting Yarn history server [ OK ]
Starting Webhcat server [ OK ]
Starting Spark [ OK ]
Starting Mapred history server [ OK ]
Starting Zeppelin [ OK ]
Starting Resource manager [ OK ]
Safe mode is OFF
Starting sandbox...
/etc/init.d/startup_script: line 97: /proc/sys/kernel/hung_task_timeout_secs: No such file or directory
Starting shellinaboxd: [ OK ]
NOTE: You can ignore any warnings or errors that are displayed.
Check Ambari Services
We are going to look at the services in Ambari. In the atlas-demo1 container we started HBase and turned off maintenance mode. Log in with the raj_ops username and password.
You should see something similar to this:
You should notice that the HBase service has maintenance mode turned on.
Check HDFS home directory
Now navigate to the raj_ops HDFS home directory using the Ambari Files View. Follow the process above to get to the home directory. You should see something similar to this:
Notice the file we uploaded in the other container is not here.
Stop the atlas-demo2 container
Now we are going to stop our container. Before stopping it, use Ambari to Stop All services as you did before. Then stop your container by running the stop-container.sh script on the local host machine.
[root@sandbox ~]# exit
logout
Connection to localhost closed.
$ ./stop-container.sh
atlas-demo2
Starting created containers
As mentioned above, the create process will autostart the containers. After you stop them, you need to run the start-container.sh script, which simply runs docker start <container>.
$ ./start-container.sh
atlas-demo2
Again, the docker start command will print the name of the container when it completes.
Deleting containers
If you decide you no longer need a container, you can easily delete it. Before you can delete the container, you need to stop it. Once it is stopped, use the docker rm command:
$ docker rm atlas-demo1
atlas-demo1
As with the start and stop commands, the rm command will print the name of the container when the command completes.
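The stop-then-delete sequence described above can be wrapped in a small helper. This is only a sketch: remove_container and the DOCKER_CMD override are names I made up for illustration and are not part of the tutorial's scripts. It relies only on docker stop and docker rm.

```shell
# Hypothetical helper: stop a container, then delete it.
# DOCKER_CMD is an assumed override hook (e.g. DOCKER_CMD=echo for a dry run).
remove_container() {
    name="$1"
    if [ -z "$name" ]; then
        echo "usage: remove_container <container-name>" >&2
        return 2
    fi
    # If the stop fails (for example, the container does not exist),
    # skip the rm and report failure to the caller.
    "${DOCKER_CMD:-docker}" stop "$name" || return 1
    "${DOCKER_CMD:-docker}" rm "$name"
}
```

Running DOCKER_CMD=echo before calling remove_container atlas-demo1 prints the two docker subcommands instead of executing them, which is a cheap way to check what would happen.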
If the container does not exist, for example because it has already been deleted, the docker command will display the following:
$ docker stop atlas-demo1
Error response from daemon: No such container: atlas-demo1
That means there is nothing left to stop or delete.
Note on disk utilization
While the containers do not share configurations, they all run on the same Docker virtual machine. This means you should keep track of how many containers you create, because they all consume the storage space of the VM and it can fill up.
Here is a quick screenshot of my disk usage in Ambari: hdfs-1.png hdfs-2.png
Let's see what your disk usage looks like at the command line:
$ docker run --rm -it -v /:/vm-root alpine:edge df -h /
Filesystem Size Used Available Use% Mounted on
none 59.0G 33.8G 22.2G 60% /
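If you want to check this from a script rather than by eye, the Use% column can be pulled out of the df output. The sketch below assumes the column layout shown above (Use% is the second-to-last field and the mount point is /). The function names and the default threshold of 80 are my own choices for illustration.

```shell
# Print the Use% value (digits only) for the / mount from `df -h /` output on stdin.
usage_percent() {
    awk 'NR > 1 && $NF == "/" { sub(/%/, "", $(NF-1)); print $(NF-1); exit }'
}

# Warn when utilization is at or above a threshold (default 80, an arbitrary choice).
check_usage() {
    threshold="${1:-80}"
    pct="$(usage_percent)"
    if [ -z "$pct" ]; then
        echo "could not parse df output" >&2
        return 2
    fi
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARNING: VM disk is ${pct}% full (threshold ${threshold}%)"
        return 1
    fi
    echo "OK: VM disk is ${pct}% full"
}
```

A possible invocation is docker run --rm -v /:/vm-root alpine:edge df -h / | check_usage 80. I left out the -it flags here because a TTY adds carriage returns to the output, which would break the field match.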
I'm going to delete the two atlas demo containers to see if that changes my disk utilization.
$ docker rm atlas-demo1
atlas-demo1
$ docker rm atlas-demo2
atlas-demo2
Now let's look at my disk utilization:
$ docker run --rm -it -v /:/vm-root alpine:edge df -h /
Filesystem Size Used Available Use% Mounted on
none 59.0G 33.1G 22.9G 59% /
It looks like I freed up about 600MB of space. As you add and remove containers, just be sure to keep an eye on your overall disk utilization. The space reported by HDFS in Ambari for your sandbox containers should closely reflect the VM disk space as seen here:
Review
If you successfully followed along with this tutorial, you now have an easy way to create HDP Docker based sandboxes that don't share configuration. You have a few scripts to make the management process easier. You can read more about Docker container storage here: Docker Volumes
10-08-2016
04:17 PM
@Saptak Sen Thank you for the feedback. I didn't realize that you could load the .tar.gz file directly.
10-08-2016
04:17 PM
19 Kudos
Objective
This tutorial walks you through the process of installing the Docker version of the HDP 2.5 Hortonworks Sandbox on a Mac. This tutorial is part one of a two-part series. The second article can be found here: HCC Article
Prerequisites
You should already have installed Docker for Mac. (Read more here: Docker for Mac)
You should already have downloaded the Docker version of the Hortonworks Sandbox. (Read more here: Hortonworks Sandbox)
Scope
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6
HDP 2.5 on Hortonworks Sandbox (Docker Version)
Docker for Mac 1.12.1
NOTE: You should adjust your Docker configuration to provide at least 8GB of RAM. I personally find things are better with 10-12GB of RAM. You can follow this article for more information: https://hortonworks.com/tutorial/sandbox-deployment-and-install-guide/section/3/#for-mac
Steps
1. Ensure the Docker daemon is running. You can verify by typing:
$ docker images
You should see something similar to this:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
If your Docker daemon is not running, you may see the following:
$ docker images
Error response from daemon: Bad response from Docker engine
2. Load the Hortonworks sandbox image into Docker:
$ docker load < HDP_2.5_docker.tar.gz
You should see something similar to this:
$ docker load < HDP_2.5_docker.tar.gz
b1b065555b8a: Loading layer [==================================================>] 202.2 MB/202.2 MB
0b547722f59f: Loading layer [==================================================>] 13.84 GB/13.84 GB
99d7327952e0: Loading layer [==================================================>] 234.8 MB/234.8 MB
294b1c0e07bd: Loading layer [==================================================>] 207.5 MB/207.5 MB
fd5c10f2f1a1: Loading layer [==================================================>] 387.6 kB/387.6 kB
6852ef70321d: Loading layer [==================================================>] 163 MB/163 MB
517f170bbf7f: Loading layer [==================================================>] 20.98 MB/20.98 MB
665edb80fc91: Loading layer [==================================================>] 337.4 kB/337.4 kB
Loaded image: sandbox:latest
3. Verify the image was successfully imported:
$ docker images
You should see something similar to this:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
sandbox latest fc813bdc4bdd 3 days ago 14.57 GB
4. Start the container:
The first time you start the container, you need to create it via the run command. The run command both creates and starts the container.
$ docker run -v hadoop:/hadoop --name sandbox --hostname "sandbox.hortonworks.com" --privileged -d \
-p 6080:6080 \
-p 9090:9090 \
-p 9000:9000 \
-p 8000:8000 \
-p 8020:8020 \
-p 42111:42111 \
-p 10500:10500 \
-p 16030:16030 \
-p 8042:8042 \
-p 8040:8040 \
-p 2100:2100 \
-p 4200:4200 \
-p 4040:4040 \
-p 8050:8050 \
-p 9996:9996 \
-p 9995:9995 \
-p 8080:8080 \
-p 8088:8088 \
-p 8886:8886 \
-p 8889:8889 \
-p 8443:8443 \
-p 8744:8744 \
-p 8888:8888 \
-p 8188:8188 \
-p 8983:8983 \
-p 1000:1000 \
-p 1100:1100 \
-p 11000:11000 \
-p 10001:10001 \
-p 15000:15000 \
-p 10000:10000 \
-p 8993:8993 \
-p 1988:1988 \
-p 5007:5007 \
-p 50070:50070 \
-p 19888:19888 \
-p 16010:16010 \
-p 50111:50111 \
-p 50075:50075 \
-p 50095:50095 \
-p 18080:18080 \
-p 60000:60000 \
-p 8090:8090 \
-p 8091:8091 \
-p 8005:8005 \
-p 8086:8086 \
-p 8082:8082 \
-p 60080:60080 \
-p 8765:8765 \
-p 5011:5011 \
-p 6001:6001 \
-p 6003:6003 \
-p 6008:6008 \
-p 1220:1220 \
-p 21000:21000 \
-p 6188:6188 \
-p 61888:61888 \
-p 2181:2181 \
-p 2222:22 \
sandbox /usr/sbin/sshd -D
Note: Mounting local drives to the sandbox
If you would like to mount local drives on the host to your sandbox, you need to add another -v option to the command above. I typically recommend creating working directories for each of your Docker containers, such as /Users/<username>/Development/sandbox or /Users/<username>/Development/hdp25-demo-sandbox. In doing this, you can copy the docker run command above into a script called create_container.sh and simply change the --name option to be unique and correspond to the directory the script is in.
Let's look at an example. In this scenario I'm going to create a directory called /Users/<username>/Development/hdp25-demo-sandbox where I will create my create_container.sh script. Inside of that script I will have as the first line:
docker run -v `pwd`:`pwd` -v hadoop:/hadoop --name hdp25-demo-sandbox --hostname "sandbox.hortonworks.com" --privileged -d \
Once the container is running you will notice the container has /Users/<username>/Development/hdp25-demo-sandbox as a mount. This is similar in concept to the /vagrant mount when using Vagrant. This allows you to easily share data between the container and your host without having to copy the data around.
Once the container is created and running, Docker will display a CONTAINER ID for the container. You should see something similar to this:
fe57fe79f795905daa50191f92ad1f589c91043a30f7153899213a0cadaa5631
For all future container starts, you only need to run the docker start command:
$ docker start sandbox
Notice that sandbox is the name of the container in the run command used above. If you give the container the same name as the container project directory, like hdp25-demo-sandbox above, it will be easier to remember what the container name is. You can also create a start_container.sh script that includes the above start command. Similarly, you can create a stop_container.sh script that stops the container.
5. Verify the container is running:
$ docker ps
You should see something similar to this:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
85d7ec7201d8 sandbox "/usr/sbin/sshd -D" 31 seconds ago Up 27 seconds 0.0.0.0:1000->1000/tcp, 0.0.0.0:1100->1100/tcp, 0.0.0.0:1220->1220/tcp, 0.0.0.0:1988->1988/tcp, 0.0.0.0:2100->2100/tcp, 0.0.0.0:4040->4040/tcp, 0.0.0.0:4200->4200/tcp, 0.0.0.0:5007->5007/tcp, 0.0.0.0:5011->5011/tcp, 0.0.0.0:6001->6001/tcp, 0.0.0.0:6003->6003/tcp, 0.0.0.0:6008->6008/tcp, 0.0.0.0:6080->6080/tcp, 0.0.0.0:6188->6188/tcp, 0.0.0.0:8000->8000/tcp, 0.0.0.0:8005->8005/tcp, 0.0.0.0:8020->8020/tcp, 0.0.0.0:8040->8040/tcp, 0.0.0.0:8042->8042/tcp, 0.0.0.0:8050->8050/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:8082->8082/tcp, 0.0.0.0:8086->8086/tcp, 0.0.0.0:8088->8088/tcp, 0.0.0.0:8090-8091->8090-8091/tcp, 0.0.0.0:8188->8188/tcp, 0.0.0.0:8443->8443/tcp, 0.0.0.0:8744->8744/tcp, 0.0.0.0:8765->8765/tcp, 0.0.0.0:8886->8886/tcp, 0.0.0.0:8888-8889->8888-8889/tcp, 0.0.0.0:8983->8983/tcp, 0.0.0.0:8993->8993/tcp, 0.0.0.0:9000->9000/tcp, 0.0.0.0:9090->9090/tcp, 0.0.0.0:9995-9996->9995-9996/tcp, 0.0.0.0:10000-10001->10000-10001/tcp, 0.0.0.0:10500->10500/tcp, 0.0.0.0:11000->11000/tcp, 0.0.0.0:15000->15000/tcp, 0.0.0.0:16010->16010/tcp, 0.0.0.0:16030->16030/tcp, 0.0.0.0:18080->18080/tcp, 0.0.0.0:19888->19888/tcp, 0.0.0.0:21000->21000/tcp, 0.0.0.0:42111->42111/tcp, 0.0.0.0:50070->50070/tcp, 0.0.0.0:50075->50075/tcp, 0.0.0.0:50095->50095/tcp, 0.0.0.0:50111->50111/tcp, 0.0.0.0:60000->60000/tcp, 0.0.0.0:60080->60080/tcp, 0.0.0.0:61888->61888/tcp, 0.0.0.0:2222->22/tcp sandbox
Notice the CONTAINER ID is the shortened version of the ID displayed when you ran the run command.
6. To stop the container
Once the container is running, you stop it using the following command:
$ docker stop sandbox
7. To connect to the container
You connect to the container via ssh using the following command:
$ ssh -p 2222 root@localhost
The first time you log into the container, you will be prompted to change the root password. The root password for the container is hadoop.
8. Start sandbox services
The Ambari and HDP services do not start automatically when you start the Docker container. You need to start the processes with a script.
$ ssh -p 2222 root@localhost
$ /etc/init.d/startup_script start
You should see something similar to this (you can ignore any warnings):
# /etc/init.d/startup_script start
Starting tutorials... [ Ok ]
Starting startup_script...
Starting HDP ...
Starting mysql [ OK ]
Starting Flume [ OK ]
Starting Postgre SQL [ OK ]
Starting Ranger-admin [ OK ]
Starting name node [ OK ]
Starting Ranger-usersync [ OK ]
Starting data node [ OK ]
Starting Zookeeper nodes [ OK ]
Starting Oozie [ OK ]
Starting Ambari server [ OK ]
Starting NFS portmap [ OK ]
Starting Hdfs nfs [ OK ]
Starting Hive server [ OK ]
Starting Hiveserver2 [ OK ]
Starting Ambari agent [ OK ]
Starting Node manager [ OK ]
Starting Yarn history server [ OK ]
Starting Resource manager [ OK ]
Starting Webhcat server [ OK ]
Starting Spark [ OK ]
Starting Mapred history server [ OK ]
Starting Zeppelin [ OK ]
Safe mode is OFF
Starting sandbox...
./startup_script: line 97: /proc/sys/kernel/hung_task_timeout_secs: No such file or directory
Starting shellinaboxd: [ OK ]
9. You can now connect to your HDP instance via a web browser at http://localhost:8888
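The create_container.sh idea from the note under step 4 could be sketched roughly as follows. The helper names are hypothetical and the port list in the usage line is abbreviated. The sketch prints the docker run command instead of executing it, so you can review it first.

```shell
# Derive the container name from the project directory path.
container_name_from_dir() {
    basename "$1"
}

# Turn "host:container" port pairs into repeated -p flags.
build_port_args() {
    for mapping in "$@"; do
        printf '%s %s ' -p "$mapping"
    done
}

# Print the docker run command for a project directory and a list of port pairs.
print_run_command() {
    if [ "$#" -lt 1 ]; then
        echo "usage: print_run_command <project-dir> [host:container ...]" >&2
        return 2
    fi
    dir="$1"
    shift
    name="$(container_name_from_dir "$dir")"
    echo "docker run -v $dir:$dir -v hadoop:/hadoop --name $name" \
         "--hostname sandbox.hortonworks.com --privileged -d" \
         "$(build_port_args "$@")sandbox /usr/sbin/sshd -D"
}
```

For example, print_run_command "$PWD" 8080:8080 8888:8888 2222:22 run from the hdp25-demo-sandbox directory prints a command with --name hdp25-demo-sandbox and the current directory as a mount. Deriving the name from the directory keeps the two from drifting apart when you copy the script into a new project.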
10-04-2016
07:07 PM
7 Kudos
Objective
This tutorial is intended to walk you through the process of troubleshooting why events are missing from the Metron Kibana dashboard when using the quick-dev-platform. Because of the constrained resources and added complexity of the virtualized environment, sometimes the vagrant instance doesn't come up in a healthy state.
Prerequisites
You should have already cloned the Metron git repo. Metron
You should have already deployed the Metron quick-dev-platform. quick-dev-platform
The expectation is you have already started Metron once and shut it down via a vagrant halt. Now you are trying to start Metron again via vagrant up, but things don't seem to be working properly. These troubleshooting steps may also be helpful during the initial start of the VM via the run.sh script.
Scope
These troubleshooting steps were performed in the following environment:
Mac OS X 10.11.6
Vagrant 1.8.6
VirtualBox 5.1.6
Python 2.7.12 (Anaconda distribution)
Steps
Start Metron quick-dev-platform
You should be in the /incubator-metron/metron-deployment/vagrant/quick-dev-platform directory.
$ cd <basedir>/incubator-metron/metron-deployment/vagrant/quick-dev-platform
Now you can run vagrant up to start the Metron virtual machine. You should see something similar to this:
$ vagrant up
Bringing machine 'node1' up with 'virtualbox' provider...
==> node1: Checking if box 'metron/hdp-base' is up to date...
==> node1: Clearing any previously set forwarded ports...
==> node1: Clearing any previously set network interfaces...
==> node1: Preparing network interfaces based on configuration...
node1: Adapter 1: nat
node1: Adapter 2: hostonly
==> node1: Forwarding ports...
node1: 22 (guest) => 2222 (host) (adapter 1)
==> node1: Running 'pre-boot' VM customizations...
==> node1: Booting VM...
==> node1: Waiting for machine to boot. This may take a few minutes...
node1: SSH address: 127.0.0.1:2222
node1: SSH username: vagrant
node1: SSH auth method: private key
node1: Warning: Remote connection disconnect. Retrying...
==> node1: Machine booted and ready!
[node1] GuestAdditions 5.1.6 running --- OK.
==> node1: Checking for guest additions in VM...
==> node1: Setting hostname...
==> node1: Configuring and enabling network interfaces...
==> node1: Mounting shared folders...
node1: /vagrant => /Volumes/Samsung/Development/incubator-metron/metron-deployment/vagrant/quick-dev-platform
==> node1: Updating /etc/hosts file on active guest machines...
==> node1: Updating /etc/hosts file on host machine (password may be required)...
==> node1: Machine already provisioned. Run `vagrant provision` or use the `--provision`
==> node1: flag to force provisioning. Provisioners marked to run always will still run.
Connect to Vagrant virtual machine
Now that the virtual machine is running, we can connect to it with vagrant ssh .
You should see something similar to this:
$ vagrant ssh
Last login: Tue Oct 4 14:54:04 2016 from 10.0.2.2
[vagrant@node1 ~]$
Check Ambari
The Ambari UI should be available at http://node1:8080 . Ambari should present you with a login screen.
NOTE: The vagrant hostmanager plugin should automatically update the /etc/hosts file on your local machine to add a node1 entry
You should see something similar to this:
You should be able to log into the interface using admin as the username and password. You should see something similar to this:
You should notice that all of the services are red. They do not auto start. Click the Actions button and then the Start All link. You should see something similar to this:
You should see a confirmation dialog similar to this:
Click the green confirm button. You should see an operations dialog showing you the status of the start all process. You should see something similar to this:
Click the green OK button and wait for the processes to finish starting. Once the startup process is complete, you should see something similar to this:
Check Kibana dashboard
Now that the cluster is running, we want to check our Kibana dashboard to see if there are any events. The Kibana dashboard should be available at http://node1:5000 .
The dashboard should present you with both bro and snort events. It may take a few minutes before events show up. If there are no events, then you should see something similar to this:
If you see only one type of event, such as snort, then you should see something like this:
Seeing either version of these dashboards after 5-10 minutes indicates there is a problem with the data flow.
Check monit dashboard
The first thing you can do is check the monit dashboard. It should be available at 'http://node1:2812'. You should be presented with a login dialog. The username is admin and the password is monit . You should see something similar to this:
If you are unable to access the monit dashboard ui, try using the command-line. The monit command requires sudo.
You should see something similar to this:
$ sudo monit status
The Monit daemon 5.14 uptime: 25m
Process 'snort'
status Not monitored
monitoring status Not monitored
data collected Tue, 04 Oct 2016 15:22:12
Process 'snort-logs'
status Not monitored
monitoring status Not monitored
data collected Tue, 04 Oct 2016 15:22:12
Process 'pcap-service'
status Running
monitoring status Monitored
pid 3974
parent pid 1
uid 0
effective uid 0
gid 0
uptime 25m
children 0
memory 34.3 MB
memory total 34.3 MB
memory percent 0.4%
memory percent total 0.4%
cpu percent 0.0%
cpu percent total 0.0%
data collected Tue, 04 Oct 2016 15:47:25
Process 'pcap-replay'
status Running
monitoring status Monitored
pid 4024
parent pid 1
uid 0
effective uid 0
gid 0
uptime 25m
children 0
memory 856.0 kB
memory total 856.0 kB
memory percent 0.0%
memory percent total 0.0%
cpu percent 14.5%
cpu percent total 14.5%
data collected Tue, 04 Oct 2016 15:47:25
Program 'pcap-parser'
status Not monitored
monitoring status Not monitored
data collected Tue, 04 Oct 2016 15:22:12
Program 'yaf-parser'
status Not monitored
monitoring status Not monitored
data collected Tue, 04 Oct 2016 15:22:12
Program 'bro-parser'
status Status ok
monitoring status Monitored
last started Tue, 04 Oct 2016 15:47:25
last exit value 0
data collected Tue, 04 Oct 2016 15:47:25
Program 'snort-parser'
status Status ok
monitoring status Monitored
last started Tue, 04 Oct 2016 15:47:25
last exit value 0
data collected Tue, 04 Oct 2016 15:47:25
Process 'mysql'
status Not monitored
monitoring status Not monitored
data collected Tue, 04 Oct 2016 15:22:12
Process 'kibana'
status Running
monitoring status Monitored
pid 4052
parent pid 1
uid 496
effective uid 496
gid 0
uptime 25m
children 0
memory 99.0 MB
memory total 99.0 MB
memory percent 1.2%
memory percent total 1.2%
cpu percent 0.0%
cpu percent total 0.0%
data collected Tue, 04 Oct 2016 15:47:25
Program 'indexing'
status Status ok
monitoring status Monitored
last started Tue, 04 Oct 2016 15:47:25
last exit value 0
data collected Tue, 04 Oct 2016 15:47:25
Program 'enrichment'
status Status ok
monitoring status Monitored
last started Tue, 04 Oct 2016 15:47:25
last exit value 0
data collected Tue, 04 Oct 2016 15:47:25
Process 'elasticsearch'
status Running
monitoring status Monitored
pid 4180
parent pid 1
uid 497
effective uid 497
gid 491
uptime 25m
children 0
memory 210.7 MB
memory total 210.7 MB
memory percent 2.6%
memory percent total 2.6%
cpu percent 0.1%
cpu percent total 0.1%
data collected Tue, 04 Oct 2016 15:47:25
Process 'bro'
status Not monitored
monitoring status Not monitored
data collected Tue, 04 Oct 2016 15:22:12
System 'node1'
status Not monitored
monitoring status Not monitored
data collected Tue, 04 Oct 2016 15:22:12
If you see something like this:
$ sudo monit status
Cannot create socket to [node1]:2812 -- Connection refused
Then check to make sure that monit is running:
$ sudo service monit status
monit (pid 3981) is running...
If the service is running, but you can't access the web user interface or the command line, then you should restart the service.
$ sudo service monit restart
Shutting down monit: [ OK ]
Starting monit: [ OK ]
Now verify you can access the monit web UI. If you see any of the items under Process listed as Not monitored, then we should start those processes.
Note: The vagrant image is using vagrant hostmanager to automatically update the /etc/hosts file on both the host and the guest. If monit still does not restart properly, check the /etc/hosts file on node1. If you see:
127.0.0.1 node1 node1
on the first line of your /etc/hosts file, comment out or delete that line.
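If you prefer the command line, the not-monitored entries can be picked out of the sudo monit status output with a short filter. This is a sketch that assumes the output layout shown above and only recognizes the Process, Program, and System entry types that appear in it. The function name not_monitored is my own.

```shell
# Print the name of every monit entry whose "monitoring status" is "Not monitored".
# Reads `monit status` output on stdin.
not_monitored() {
    awk '
        /^(Process|Program|System) / {
            name = $2
            # Keep only name characters; this drops the surrounding quotes.
            gsub(/[^A-Za-z0-9_.-]/, "", name)
        }
        /^[ \t]*monitoring status[ \t]+Not monitored/ { print name }
    '
}
```

You would run it as sudo monit status | not_monitored. Monit also has a monitor subcommand (sudo monit monitor snort), which I would expect to do the same thing as the Enable Monitoring button. Not every entry in the list needs to be enabled, so review the names before acting on them.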
In my example, I can see that the snort, snort-logs, and bro processes are not running. I'm going to start them by clicking on the name of the process, which will bring up a new view similar to this:
First click the Enable Monitoring button. The status should change to Not monitored - monitor pending . Click the Home link in the upper left of the monit dashboard. This takes you back to the main monit dashboard. You should see something similar to this:
Once the process has initialized, you should see something similar to this:
Repeat this process for the snort-logs and bro processes. Once that is complete, the monit dashboard should look similar to this:
Recheck the Kibana dashboard
Now recheck the Kibana dashboard to see if there are new events. You should see something similar to this:
If you do not see this, then we need to continue troubleshooting.
Check Kafka topics
First we need to make sure that all of our Kafka topics are up. You can verify by doing the following:
$ cd /usr/hdp/current/kafka-broker
$ ./bin/kafka-topics.sh --list --zookeeper localhost:2181
You should see something similar to this:
$ ./bin/kafka-topics.sh --list --zookeeper localhost:2181
bro
enrichments
indexing
indexing_error
parser_error
parser_invalid
pcap
snort
yaf
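To avoid comparing the list by eye, you can check it against the topics you expect. The sketch below is a hypothetical helper. The list of expected topics is whatever you pass in, not something Metron defines for you.

```shell
# Print every expected topic (given as arguments) that is absent
# from the topic list read on stdin.
missing_topics() {
    actual="$(cat)"
    for topic in "$@"; do
        # -x matches whole lines only, -F treats the topic as a fixed string.
        printf '%s\n' "$actual" | grep -qxF -- "$topic" || echo "$topic"
    done
}
```

For example, ./bin/kafka-topics.sh --list --zookeeper localhost:2181 | missing_topics bro snort enrichments indexing prints nothing when all four topics exist.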
We have verified the topics exist. So let's see if any data is coming through the topics. First we'll check the bro topic.
$ ./bin/kafka-console-consumer.sh --topic bro --zookeeper localhost:2181
You should see something similar to this:
$ ./bin/kafka-console-consumer.sh --topic bro --zookeeper localhost:2181
{metadata.broker.list=node1:6667, request.timeout.ms=30000, client.id=console-consumer-21240, security.protocol=PLAINTEXT}
{"dns": {"ts":1475597283.245551,"uid":"CffpQZ36hz1gmVzj0a","id.orig_h":"192.168.66.1","id.orig_p":5353,"id.resp_h":"224.#.#.#","id.resp_p":5353,"proto":"udp","trans_id":0,"query":"hp envy 7640 series [c62abe]._uscan._tcp.local","qclass":32769,"qclass_name":"qclass-32769","qtype":33,"qtype_name":"SRV","AA":false,"TC":false,"RD":false,"RA":false,"Z":0,"rejected":false}}
It may take a few seconds before a message is shown. If you are seeing messages, then the bro topic is working ok. Press ctrl-c to exit.
Now let's check the snort topic.
$ ./bin/kafka-console-consumer.sh --topic snort --zookeeper localhost:2181
You should see something similar to this:
$ ./bin/kafka-console-consumer.sh --topic snort --zookeeper localhost:2181
{metadata.broker.list=node1:6667, request.timeout.ms=30000, client.id=console-consumer-73857, security.protocol=PLAINTEXT}
10/04-16:09:18.045368 ,1,999158,0,"'snort test alert'",TCP,192.168.138.158,49206,95.163.121.204,80,00:00:00:00:00:00,00:00:00:00:00:00,0x3C,***A****,0xA80DAF97,0xB93A1E6C,,0xFAF0,128,0,2556,40,40960,,,,
10/04-16:09:18.114314 ,1,999158,0,"'snort test alert'",TCP,95.#.#.#,80,192.168.138.158,49205,00:00:00:00:00:00,00:00:00:00:00:00,0x221,***AP***,0x628EE92,0xCA8D8698,,0xFAF0,128,0,2031,531,19464,,,,
10/04-16:09:18.185913 ,1,999158,0,"'snort test alert'",TCP,95.#.#.#,80,192.168.138.158,49210,00:00:00:00:00:00,00:00:00:00:00:00,0x21E,***AP***,0x9B7A5871,0x63626DD7,,0xFAF0,128,0,2032,528,16392,,,,
10/04-16:09:18.216988 ,1,999158,0,"'snort test alert'",TCP,192.168.138.158,49205,95.#.#.#,80,00:00:00:00:00:00,00:00:00:00:00:00,0x3C,***A****,0xCA8D8698,0x628F07D,,0xFAF0,128,0,2557,40,40960,,,,
10/04-16:09:18.292182 ,1,999158,0,"'snort test alert'",TCP,192.168.138.158,49210,95.#.#.#,80,00:00:00:00:00:00,00:00:00:00:00:00,0x3C,***A****,0x63626DD7,0x9B7A5A59,,0xF71F,128,0,2558,40,40960,,,,
10/04-16:09:18.310822 ,1,999158,0,"'snort test alert'",TCP,95.#.#.#,80,192.168.138.158,49208,00:00:00:00:00:00,00:00:00:00:00:00,0x21C,***AP***,0x8EF414C5,0xBE149917,,0xFAF0,128,0,2035,526,14344,,,,
It may take a few seconds before a message is shown. If you are seeing messages, then the snort topic is working ok. Press ctrl-c to exit.
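The same check can be done without watching the console. The sketch below filters out the consumer's configuration banner (the {metadata.broker.list=...} line) and succeeds only if at least one other line came through. The function name is hypothetical. The usage line assumes your consumer supports --max-messages and that the coreutils timeout command is available on node1.

```shell
# Succeed if stdin holds at least one non-empty line other than the
# consumer's "{metadata.broker.list=...}" configuration banner.
has_messages() {
    awk '
        index($0, "{metadata.broker.list=") != 1 && NF { found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

A possible invocation is timeout 30 ./bin/kafka-console-consumer.sh --topic bro --zookeeper localhost:2181 --max-messages 1 | has_messages && echo "bro topic has data". The timeout keeps the command from hanging forever when the topic is empty.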
Check MySQL
If MySQL is not running, then you will have issues seeing events in your Kibana dashboard. The first thing to do is see if it's running:
$ sudo service mysqld status
You should see something similar to this:
$ sudo service mysqld status
mysqld (pid 1916) is running...
If it is not running, then you need to start it with:
$ sudo service mysqld start
Even if it is running, you may want to restart it.
$ sudo service mysqld restart
Stopping mysqld: [ OK ]
Starting mysqld: [ OK ]
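The status check and conditional start can be combined into one step. This sketch assumes that service mysqld status exits with a non-zero code when the server is stopped, which is the usual init script behavior but worth confirming on your VM. SERVICE_CMD is a made-up override hook for dry runs.

```shell
# Start mysqld only if the status check reports it is not running.
# SERVICE_CMD is a hypothetical override; the default is "sudo service".
ensure_mysqld_running() {
    if ${SERVICE_CMD:-sudo service} mysqld status >/dev/null 2>&1; then
        echo "mysqld is running"
    else
        echo "mysqld is not running, starting it"
        ${SERVICE_CMD:-sudo service} mysqld start
    fi
}
```

Even when this reports that mysqld is running, a manual restart as shown above may still be worthwhile if events are missing.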
Check Storm topologies
If there were issues with MySQL or you restarted it, you will need to reset the enrichment Storm topology. You may also want to reset the bro and snort topologies. From the Ambari user interface, click on the Storm service. You should see something similar to this:
In the quick links section, click on the Storm UI link. You should see something similar to this:
Under the Topology Summary section, there should be at least 4 topology entries: bro, enrichment, indexing, and snort.
Click on the bro link. You should see something similar to this:
You should see events in the Topology stats section under the Acked column. Under the Spouts and Bolts sections, you should see events under the Acked column for the kafkaSpout and parserBolt respectively.
Under Topology actions, click on the kill button. This will kill the topology. The monit process should automatically restart the topology.
You should see a dialog asking you for a delay value in seconds, something similar to this:
The delay defaults to 30 seconds. You can set it to a smaller number, like 5. Then click the ok button.
Go back to the main Storm UI page. You should see something similar to this:
You should notice the bro topology is now missing. Now take a look at the monit dashboard. You should see something similar to this:
You may notice several programs listed as Status:failed . After a few seconds, monit will restart the topologies. You should see something similar to this:
Now go back to the Storm UI. You should see something similar to this:
You should notice the bro topology is back and it has an uptime less than the other topologies.
You can repeat this process for each of the topologies.
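If you would rather not click through the Storm UI for each topology, the Storm command-line client has a kill subcommand that takes a wait time with -w. The sketch below assumes the storm client is on the PATH on node1 and that your user is allowed to kill topologies. kill_topologies and STORM_CMD are hypothetical names.

```shell
# Kill each named topology with the given wait time in seconds.
# STORM_CMD is an assumed override hook (e.g. STORM_CMD=echo for a dry run).
kill_topologies() {
    if [ "$#" -lt 2 ]; then
        echo "usage: kill_topologies <wait-secs> <topology> [topology ...]" >&2
        return 2
    fi
    wait_secs="$1"
    shift
    for topology in "$@"; do
        "${STORM_CMD:-storm}" kill "$topology" -w "$wait_secs"
    done
}
```

For example, kill_topologies 5 bro snort mirrors setting the delay to 5 in the dialog. As with the UI, monit should bring the topologies back after a few seconds.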
Recheck Kibana dashboard
Now we can recheck our Kibana dashboard. You should see something similar to this:
Now you should see events running through the Metron stack.
Review
We walked through some common troubleshooting steps in Metron when replay events are not being displayed in the Kibana dashboard. More often than not, the MySQL server is the culprit because it doesn't always come up cleanly when you do a vagrant up . That will impact the Storm topologies, particularly the enrichment topology.
Labels: