11-08-2016
10:48 AM
1 Kudo
Objective
Many people work exclusively from a laptop where storage is typically limited to 500GB or less. Over time, you may find that available storage space becomes a regular concern. It's not uncommon to use an external hard drive to augment the available space.
The current version of Docker for Mac (1.12.x) does not provide a configuration setting which allows users to change the location where the Docker virtual machine image is located. This means the image, which can grow up to 64GB in size by default, is located on your laptop's primary hard drive.
With the HDP 2.5 version of the Hortonworks sandbox available as a native Docker image, you may want more room available to Docker. This tutorial will guide you through the process of moving your Docker virtual machine image to a different location, an external drive in this case. This frees up as much as 64GB of space on your primary laptop hard drive and lets you expand the size of the Docker image file later. This tutorial is the first in a two-part series.
Prerequisites
You should have already completed the following tutorial: Installing Docker Version of Sandbox on Mac
You should have an external or secondary hard drive available.
Scope
Mac OS X 10.11.6 (El Capitan)
Docker for Mac 1.12.1
HDP 2.5 Docker Sandbox
Steps
Stop Docker for Mac
Before we can make any changes to the Docker virtual machine image, we need to stop Docker for Mac. There should be a Docker for Mac icon in the menu bar. You should see something similar to this:
You can also check from the command line using ps -ef | grep -i com.docker. You should see something similar to this:
ps -ef | grep -i com.docker
0 123 1 0 8:45AM ?? 0:00.01 /Library/PrivilegedHelperTools/com.docker.vmnetd
502 967 876 0 8:46AM ?? 0:00.08 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 969 967 0 8:46AM ?? 0:00.04 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 971 967 0 8:46AM ?? 0:07.96 com.docker.db --url fd:3 --git /Users/myoung/Library/Containers/com.docker.docker/Data/database
502 975 967 0 8:46AM ?? 0:03.40 com.docker.osx.hyperkit.linux
502 977 975 0 8:46AM ?? 0:00.03 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux
502 12807 967 0 9:17PM ?? 0:00.08 com.docker.osxfs --address fd:3 --connect /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --control fd:4 --volume-control fd:5 --database /Users/myoung/Library/Containers/com.docker.docker/Data/s40
502 12810 967 0 9:17PM ?? 0:00.12 com.docker.slirp --db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 --ethernet fd:3 --port fd:4 --vsock-path /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --max-connections 900
502 12811 967 0 9:17PM ?? 0:00.19 com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 12812 12811 0 9:17PM ?? 0:00.02 /Applications/Docker.app/Contents/MacOS/com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 12814 12811 0 9:17PM ?? 0:16.48 /Applications/Docker.app/Contents/MacOS/com.docker.hyperkit -A -m 12G -c 6 -u -s 0:0,hostbridge -s 31,lpc -s 2:0,virtio-vpnkit,uuid=1f629fed-1ef6-4f34-8fce-753347e3b941,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s50,macfile=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/mac.0 -s 3,virtio-blk,file:///Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2,format=qcow -s 4,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s40,tag=db -s 5,virtio-rnd -s 6,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s51,tag=port -s 7,virtio-sock,guest_cid=3,path=/Users/myoung/Library/Containers/com.docker.docker/Data,guest_forwards=2376;1525 -l com1,autopty=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty,log=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/console-ring -f kexec,/Applications/Docker.app/Contents/Resources/moby/vmlinuz64,/Applications/Docker.app/Contents/Resources/moby/initrd.img,earlyprintk=serial console=ttyS0 com.docker.driver="com.docker.driver.amd64-linux", com.docker.database="com.docker.driver.amd64-linux" ntp=gateway mobyplatform=mac -F /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/hypervisor.pid
502 13790 876 0 9:52PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 13791 13790 0 9:52PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 13793 13146 0 9:52PM ttys000 0:00.00 grep -i com.docker
Now we are going to stop Docker for Mac. Before shutting down Docker, make sure all of your containers have been stopped. Using the menu shown above, click on the Quit Docker menu option. This will stop Docker for Mac. You should notice the Docker for Mac icon is no longer visible.
Now let's confirm the Docker processes we saw before are no longer running:
ps -ef | grep -i com.docker
0 123 1 0 8:45AM ?? 0:00.01 /Library/PrivilegedHelperTools/com.docker.vmnetd
502 13815 13146 0 9:54PM ttys000 0:00.00 grep -i com.docker
NOTE: It may take a minute or two before Docker completely shuts down.
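If you would rather script this wait than watch the process list, a small polling loop works. This is a minimal sketch that assumes the hyperkit process name shown in the output above; the com.docker.vmnetd helper is expected to keep running.
while pgrep -f com.docker.hyperkit > /dev/null; do
  echo "Docker is still shutting down..."
  sleep 5
done
echo "Docker has stopped."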
Backup Docker virtual machine image
Before we make any changes to the Docker virtual machine image, we should back it up. This will temporarily use more space on your laptop hard drive. Make sure you have enough room to hold two copies of the data. As mentioned before, the Docker image can be up to 64GB by default. Let's check the current size of our image using du -sh . The Docker image file is located at ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/ by default.
du -sh ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/
64G /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/
In my case, my image size is 64GB. You need to be sure you have room for 2 copies of the com.docker.driver.amd64-linux directory. Now we'll make a copy of our image:
cd ~/Library/Containers/com.docker.docker/Data/
cp -r com.docker.driver.amd64-linux com.docker.driver.amd64-linux.backup
This copy serves as our backup of the image.
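A quick sanity check is to compare the sizes of the original directory and the backup; they should be roughly the same. This is only a size comparison, not a byte-for-byte verification.
du -sh com.docker.driver.amd64-linux com.docker.driver.amd64-linux.backup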
Copy Docker virtual machine image to external drive
Now we can make a copy of our image on our external hard drive. I have a 1TB SSD mounted at /Volumes/Samsung . I am going to store my Docker virtual machine image in /Volumes/Samsung/Docker/image . You should store the image in a location that makes sense for you.
cp -r com.docker.driver.amd64-linux /Volumes/Samsung/Docker/image/
This process will take a few minutes. It will take longer if you are not using an SSD. Let's confirm the directory now exists on the external hard drive.
ls -la /Volumes/Samsung/Docker/image/
total 0
drwxr-xr-x 3 myoung staff 102 Nov 3 17:08 .
drwxr-xr-x 11 myoung staff 374 Nov 3 17:03 ..
drwxr-xr-x@ 11 myoung staff 374 Nov 7 21:53 com.docker.driver.amd64-linux
You can also check the size:
du -sh /Volumes/Samsung/Docker/image/
64G /Volumes/Samsung/Docker/image/
Create symbolic link for Docker virtual machine image
Now that we have a copy of the Docker image on the external hard drive, we will use a symbolic link from the image directory on the laptop hard drive to the image directory on the external hard drive. Before creating the link, we need to remove the current image directory on our laptop hard drive.
rm -rf ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux
Now let's create the symbolic link. We will use the ln -s command. The syntax is ln -s <target> <link>. In this case, the target is the image directory on the external drive and the link is the original image path on the internal drive.
ln -s /Volumes/Samsung/Docker/image/com.docker.driver.amd64-linux ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux
We can confirm the link was created:
ls -la ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux
lrwxr-xr-x 1 myoung staff 59 Nov 3 17:05 /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux -> /Volumes/Samsung/Docker/image/com.docker.driver.amd64-linux
Restart Docker for Mac
Now we can restart Docker for Mac. This is done by running the application from the Applications folder in the Finder. You should see something similar to this:
Double-click on the Docker application to start it. You should notice the Docker for Mac icon is now back in the main menu bar. You can also check via ps -ef | grep -i com.docker . You should see something similar to this:
ps -ef | grep -i com.docker
0 123 1 0 8:45AM ?? 0:00.01 /Library/PrivilegedHelperTools/com.docker.vmnetd
502 14476 14465 0 10:42PM ?? 0:00.03 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 14479 14476 0 10:42PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 14480 14476 0 10:42PM ?? 0:00.29 com.docker.db --url fd:3 --git /Users/myoung/Library/Containers/com.docker.docker/Data/database
502 14481 14476 0 10:42PM ?? 0:00.08 com.docker.osxfs --address fd:3 --connect /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --control fd:4 --volume-control fd:5 --database /Users/myoung/Library/Containers/com.docker.docker/Data/s40
502 14482 14476 0 10:42PM ?? 0:00.04 com.docker.slirp --db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 --ethernet fd:3 --port fd:4 --vsock-path /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --max-connections 900
502 14483 14476 0 10:42PM ?? 0:00.05 com.docker.osx.hyperkit.linux
502 14484 14476 0 10:42PM ?? 0:00.08 com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 14485 14483 0 10:42PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux
502 14486 14484 0 10:42PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 14488 14484 0 10:42PM ?? 0:07.90 /Applications/Docker.app/Contents/MacOS/com.docker.hyperkit -A -m 12G -c 6 -u -s 0:0,hostbridge -s 31,lpc -s 2:0,virtio-vpnkit,uuid=1f629fed-1ef6-4f34-8fce-753347e3b941,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s50,macfile=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/mac.0 -s 3,virtio-blk,file:///Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2,format=qcow -s 4,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s40,tag=db -s 5,virtio-rnd -s 6,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s51,tag=port -s 7,virtio-sock,guest_cid=3,path=/Users/myoung/Library/Containers/com.docker.docker/Data,guest_forwards=2376;1525 -l com1,autopty=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty,log=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/console-ring -f kexec,/Applications/Docker.app/Contents/Resources/moby/vmlinuz64,/Applications/Docker.app/Contents/Resources/moby/initrd.img,earlyprintk=serial console=ttyS0 com.docker.driver="com.docker.driver.amd64-linux", com.docker.database="com.docker.driver.amd64-linux" ntp=gateway mobyplatform=mac -F /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/hypervisor.pid
502 14559 14465 0 10:46PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 14560 14559 0 10:46PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 14562 13146 0 10:46PM ttys000 0:00.00 grep -i com.docker
You should notice the Docker processes are running again. You can also check the timestamp of files in the Docker image directory on the external hard drive:
ls -la /Volumes/Samsung/Docker/image/com.docker.driver.amd64-linux
total 134133536
drwxr-xr-x@ 12 myoung staff 408 Nov 7 22:42 .
drwxr-xr-x 3 myoung staff 102 Nov 3 17:08 ..
-rw-r--r-- 1 myoung staff 68676222976 Nov 7 22:45 Docker.qcow2
-rw-r--r-- 1 myoung staff 65536 Nov 7 22:42 console-ring
-rw-r--r-- 1 myoung staff 5 Nov 7 22:42 hypervisor.pid
-rw-r--r-- 1 myoung staff 0 Aug 24 16:06 lock
drwxr-xr-x 67 myoung staff 2278 Nov 5 22:00 log
-rw-r--r-- 1 myoung staff 17 Nov 7 22:42 mac.0
-rw-r--r-- 1 myoung staff 36 Aug 24 16:06 nic1.uuid
-rw-r--r-- 1 myoung staff 5 Nov 7 22:42 pid
-rw-r--r-- 1 myoung staff 59619 Nov 7 22:42 syslog
lrwxr-xr-x 1 myoung staff 12 Nov 7 22:42 tty -> /dev/ttys001
You should notice the timestamp of the Docker.qcow2 file has been updated which means Docker is now using this location for its image file.
Start a Docker container
You should attempt to start a Docker container to make sure everything is working fine. You can start the HDP sandbox via docker start sandbox if you've already installed it as listed in the prerequisites. If everything is working fine, you can delete the backup.
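For example, assuming you named the container sandbox as in the prerequisite tutorial, a quick check might look like this:
docker start sandbox
docker ps --filter name=sandbox
If the container shows a status of Up, Docker is reading the image from the external drive successfully.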
Delete Docker backup image
Now that everything is working using the new location, we can remove our backup.
rm -rf ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux.backup
Review
If you successfully followed along with this tutorial, you have moved your Docker for Mac virtual machine image to an external hard drive. This should free up as much as 64GB of space on your laptop hard drive. Look for part two in the series to learn how to increase the size of your Docker image.
11-04-2016
07:24 PM
Good catch. Tutorial has been updated to provide more links.
11-01-2016
08:55 PM
6 Kudos
Objective
Cross Data Center Replication, commonly abbreviated as CDCR, is a new feature found in SolrCloud 6.x. This feature enables Solr to replicate data from one source collection to one or more target collections distributed between data centers. The current version provides an active-passive disaster recovery solution for Solr. Data updates, which include adds, updates, and deletes, are copied from the source collection to the target collection. This means the target collection should not be sent data updates outside of the CDCR functionality. Prior to SolrCloud 6.x you had to manually design a strategy for replication across data centers. This tutorial will guide you through the process of enabling CDCR between two SolrCloud clusters, each with 1 server, in a Vagrant + VirtualBox environment.
NOTE: Solr 6 is being deployed as a standalone application. HDP 2.5 provides support for Solr 5.5.2 via HDPSearch, which does not include CDCR functionality.
Prerequisites
You should have already installed the following:
VirtualBox 5.1.6 (VirtualBox)
Vagrant 1.8.6 (Vagrant)
Vagrant plugin vagrant-vbguest 0.13.x (vagrant-vbguest)
Vagrant plugin vagrant-hostmanager 1.8.5 (vagrant-hostmanager)
You should have already downloaded the Apache Solr 6.2.1 release (Apache Solr 6.2.1)
Scope
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6 (El Capitan)
VirtualBox 5.1.6 (tutorial should work with any newer version)
Vagrant 1.8.6
vagrant-vbguest plugin 0.13.0
vagrant-hostmanager plugin 1.8.5
Apache Solr 6.2.1
Steps
Create Vagrant project directory
I like to create project directories. My Vagrant work goes under ~/Vagrant/<project> and my Docker work goes under ~/Docker/<project>. This allows me to clearly identify which technology is associated with the projects and allows me to use various helper scripts to automate processes. So let's create a project directory for this tutorial.
mkdir -p ~/Vagrant/solrcloud-cdcr-tutorial && cd ~/Vagrant/solrcloud-cdcr-tutorial
Create Vagrantfile
The Vagrantfile tells Vagrant how to configure your virtual machines. You can copy/paste my Vagrantfile below or use the version in the attachments area of this tutorial. Here is the content from my file:
# -*- mode: ruby -*-
# vi: set ft=ruby :

# Using yaml to load external configuration files
require 'yaml'

Vagrant.configure(2) do |config|
  # Using the hostmanager vagrant plugin to update the host files
  config.hostmanager.enabled = true
  config.hostmanager.manage_host = true
  config.hostmanager.manage_guest = true
  config.hostmanager.ignore_private_ip = false

  # Loading in the list of commands that should be run when the VM is provisioned.
  commands = YAML.load_file('commands.yaml')
  commands.each do |command|
    config.vm.provision :shell, inline: command
  end

  # Loading in the VM configuration information
  servers = YAML.load_file('servers.yaml')
  servers.each do |server|
    config.vm.define server["name"] do |srv|
      srv.vm.box = server["box"]            # Specify the name of the Vagrant box file to use
      srv.vm.hostname = server["name"]      # Set the hostname of the VM
      srv.vm.network "private_network", ip: server["ip"], adapter: 2    # Add a second adapter with a specified IP
      srv.vm.network :forwarded_port, guest: 22, host: server["port"]   # Add a port forwarding rule
      srv.vm.provision :shell, inline: "sed -i'' '/^127.0.0.1\t#{srv.vm.hostname}\t#{srv.vm.hostname}$/d' /etc/hosts"  # Remove the extraneous first entry in /etc/hosts
      srv.vm.provider :virtualbox do |vb|
        vb.name = server["name"]            # Name of the VM in VirtualBox
        vb.cpus = server["cpus"]            # How many CPUs to allocate to the VM
        vb.memory = server["ram"]           # How much memory to allocate to the VM
        vb.customize ["modifyvm", :id, "--cpuexecutioncap", "25"]  # Limit the VM to 25% of available CPU
      end
    end
  end
end
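If you want a quick syntax check of the Vagrantfile before moving on, and you have Ruby available on your PATH (macOS ships with it), a syntax-only parse is enough; it does not execute the file:
ruby -c Vagrantfile
You should see Syntax OK.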
Create a servers.yaml file
The servers.yaml file contains the configuration information for our VMs. You can copy/paste my servers.yaml below or use the version in the attachments area of this tutorial. Here is the content from my file:
---
- name: solr-dc01
  box: bento/centos-7.2
  cpus: 2
  ram: 2048
  ip: 192.168.56.101
  port: 10122
- name: solr-dc02
  box: bento/centos-7.2
  cpus: 2
  ram: 2048
  ip: 192.168.56.202
  port: 20222
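YAML is indentation sensitive, so if vagrant up later complains about this file, a quick parse check with Ruby's built-in YAML library (the same library the Vagrantfile uses) will confirm it loads cleanly:
ruby -ryaml -e 'p YAML.load_file("servers.yaml")'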
Create commands.yaml file
The commands.yaml file contains the list of commands that should be run on each VM when they are first provisioned. This allows us to automate configuration tasks that would otherwise be tedious and/or repetitive. You can copy/paste my commands.yaml below or use the version in the attachments area of this tutorial. Here is the content from my file:
- sudo yum -y install net-tools ntp wget java-1.8.0-openjdk java-1.8.0-openjdk-devel lsof
- sudo systemctl enable ntpd && sudo systemctl start ntpd
- sudo systemctl disable firewalld && sudo systemctl stop firewalld
- sudo sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/g' /etc/sysconfig/selinux
Copy Solr release file to our Vagrant project directory
Our project directory is accessible to each of our Vagrant VMs via the /vagrant mount point. This allows us to easily access files and data located in our project directory. Instead of using scp to copy the Apache Solr release file to each of the VMs and creating duplicate files, we'll use a single copy located in our project directory.
cp ~/Downloads/solr-6.2.1.tgz .
NOTE: This assumes you are on a Mac and your downloads are in the ~/Downloads directory.
Start virtual machines
Now we are ready to start our 2 virtual machines for the first time. Creating the VMs for the first time and starting them every time after that uses the same command:
vagrant up
Once the process is complete you should have 2 servers running. You can verify by looking at VirtualBox. Notice I have 2 VMs running called solr-dc01 and solr-dc02:
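You can also check from the command line. The vagrant status command reports the state of every VM defined in the Vagrantfile; both solr-dc01 and solr-dc02 should be reported as running.
vagrant status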
Connect to each virtual machine
You are able to log in to each of the VMs via ssh using the vagrant ssh command. You must specify the name of the VM you want to connect to.
vagrant ssh solr-dc01
Using another terminal window, repeat this process for solr-dc02.
Extract Solr install scripts
The Solr release archive file contains an installation script. This installation script will do the following by default:
NOTE: This assumes that you downloaded Solr 6.2.1
Install Solr under /opt/solr-6.2.1
Create a symbolic link between /opt/solr and /opt/solr-6.2.1
Create a solr user. Live data such as indexes, logs, etc. are stored in /var/solr.
On solr-dc01, run the following command:
tar xvfz /vagrant/solr-6.2.1.tgz solr-6.2.1/bin/install_solr_service.sh --strip-components=2
This will create a file called install_solr_service.sh in your current directory, which should be /home/vagrant. Repeat this process for solr-dc02.
Install Apache Solr
Now we can install Solr using the script defaults:
sudo bash ./install_solr_service.sh /vagrant/solr-6.2.1.tgz
The command above is the same as if you had specified the default settings:
sudo bash ./install_solr_service.sh /vagrant/solr-6.2.1.tgz -i /opt -d /var/solr -u solr -s solr -p 8983
After running the command, you should see something similar to this:
id: solr: no such user
Creating new user: solr
Extracting /vagrant/solr-6.2.1.tgz to /opt
Installing symlink /opt/solr -> /opt/solr-6.2.1 ...
Installing /etc/init.d/solr script ...
Installing /etc/default/solr.in.sh ...
Waiting up to 30 seconds to see Solr running on port 8983 [/]
Started Solr server on port 8983 (pid=29168). Happy searching!
Found 1 Solr nodes:
Solr process 29168 running on port 8983
{
  "solr_home":"/var/solr/data",
  "version":"6.2.1 43ab70147eb494324a1410f7a9f16a896a59bc6f - shalin - 2016-09-15 05:20:53",
  "startTime":"2016-10-31T19:46:27.997Z",
  "uptime":"0 days, 0 hours, 0 minutes, 12 seconds",
  "memory":"13.4 MB (%2.7) of 490.7 MB"}
Service solr installed.
If you run the following command, you can see the Solr process is running:
ps -ef | grep solr
solr 28980 1 0 19:49 ? 00:00:11 java -server -Xms512m -Xmx512m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/solr/logs/solr_gc.log -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/opt/solr/server -Dsolr.solr.home=/var/solr/data -Dsolr.install.dir=/opt/solr -Dlog4j.configuration=file:/var/solr/log4j.properties -Xss256k -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs -jar start.jar --module=http
Repeat this process for solr-dc02.
Modify Solr service
It's more convenient to use the OS services infrastructure to manage running Solr processes than to manage them manually with scripts. The installation process creates a service script that starts Solr in single-instance mode. To take advantage of CDCR, you must use SolrCloud mode, so we need to make some changes to the service script. We'll be using the embedded Zookeeper instance for our tutorial. To do this, we need a Zookeeper configuration file in our /var/solr/data directory. We'll copy the default configuration file from /opt/solr/server/solr/zoo.cfg.
sudo -u solr cp /opt/solr/server/solr/zoo.cfg /var/solr/data/zoo.cfg
Now we need the /etc/init.d/solr service script to run Solr in SolrCloud mode. This is done by adding the -c parameter to the start process. When no other parameters are specified, Solr will start an embedded Zookeeper instance on the Solr port + 1000. In our case, that should be 9983 because our default Solr port is 8983. Because this file is owned by root, we'll need to use sudo.
exit
sudo vi /etc/init.d/solr
Look near the end of the file for the line: ...
case $1 in
start|stop|restart|status)
SOLR_CMD=$1
...
This is the section that defines the Solr command. We want to change the SOLR_CMD=$1 line to look like this: SOLR_CMD="$1 -c". This will tell Solr that it should start in cloud mode.
NOTE: In production, you would not use the embedded Zookeeper. You would update /etc/default/solr.in.sh to set the ZK_HOST variable to the production Zookeeper instances. When this variable is set, Solr will not start the embedded Zookeeper.
So the section of your file should now look like this:
...
case $1 in
start|stop|restart|status)
SOLR_CMD="$1 -c"
...
Now save the file: press the esc key, then type :wq and press enter.
Let's stop Solr:
sudo service solr stop
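Before restarting, you can confirm the edit took effect with a quick grep; you should see your -c addition on the start|stop|restart|status branch:
grep -n 'SOLR_CMD' /etc/init.d/solr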
Now we can start Solr using the new script:
sudo service solr start
Once the process is started, we can check the status:
sudo service solr status
Found 1 Solr nodes:
Solr process 29426 running on port 8983
{
  "solr_home":"/var/solr/data",
  "version":"6.2.1 43ab70147eb494324a1410f7a9f16a896a59bc6f - shalin - 2016-09-15 05:20:53",
  "startTime":"2016-10-31T22:16:22.116Z",
  "uptime":"0 days, 0 hours, 0 minutes, 14 seconds",
  "memory":"30.2 MB (%6.1) of 490.7 MB",
  "cloud":{
    "ZooKeeper":"localhost:9983",
    "liveNodes":"1",
    "collections":"0"}}
As you can see, the process started successfully and there is a single cloud node running using Zookeeper on port 9983. Repeat this process for solr-dc02.
Create Solr dc01 configuration
The solr-dc01 Solr instance will be our source collection for replication. To enable CDCR we need to make a few changes to the solrconfig.xml configuration file. We'll use the data_driven_schema_configs as a base for our configuration. We need to create two different configurations because the source collection has a slightly different configuration than the target collection. On the solr-dc01 VM, copy the data_driven_schema_configs directory to the vagrant home directory. If you are following along, you should still be the vagrant user.
cd /home/vagrant
cp -r /opt/solr/server/solr/configsets/data_driven_schema_configs .
Edit the solrconfig.xml file:
vi data_driven_schema_configs/conf/solrconfig.xml
The first thing we are going to do is update the updateHandler definition; there is only one in the file. Find the section in the configuration file that looks like this:
<updateHandler class="solr.DirectUpdateHandler2">
We are going to change the updateLog portion of the configuration. Remember that we are using vi as the text editor, so edit using the appropriate vi commands. Change this:
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
to this:
<updateLog class="solr.CdcrUpdateLog">
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
Now we need to create a new requestHandler definition. Find the section in the configuration file that looks like this:
<!-- A request handler that returns indented JSON by default -->
<requestHandler name="/query" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
  </lst>
</requestHandler>
We are going to add our new definition just after the closing requestHandler. Add the following new definition:
<!-- A request handler for cross data center replication -->
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">192.168.56.202:9983</str>
    <str name="source">collection1</str>
    <str name="target">collection1</str>
  </lst>
  <lst name="replicator">
    <str name="threadPoolSize">8</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>
  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>
</requestHandler>
Your updated file should now look like this:
...
<!-- A request handler that returns indented JSON by default -->
<requestHandler name="/query" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
  </lst>
</requestHandler>
<!-- A request handler for cross data center replication -->
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">192.168.56.202:9983</str>
    <str name="source">collection1</str>
    <str name="target">collection1</str>
  </lst>
  <lst name="replicator">
    <str name="threadPoolSize">8</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>
  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>
</requestHandler>
...
NOTE: The zkHost line should have the IP address and port of the Zookeeper instance of the target collection. Our target collection is on solr-dc02, so this IP and port point to solr-dc02. When we create our collections in Solr, we'll use the name collection1.
Now save the file: press the esc key, then type :wq and press enter.
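Optionally, you can confirm the file is still well-formed XML before using it. This assumes xmllint (part of libxml2) is available on the VM; install it with yum if it is not:
xmllint --noout data_driven_schema_configs/conf/solrconfig.xml
No output means the file parsed cleanly.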
Create Solr dc02 configuration
The solr-dc02 Solr instance will be our target collection for replication. To enable CDCR we need to make a few changes to the solrconfig.xml configuration file. As above, we'll use the data_driven_schema_configs as a base for our configuration. On solr-dc02, copy the data_driven_schema_configs directory to the vagrant home directory. If you are following along, you should still be the vagrant user.
cd /home/vagrant
cp -r /opt/solr/server/solr/configsets/data_driven_schema_configs .
Edit the solrconfig.xml file:
vi data_driven_schema_configs/conf/solrconfig.xml
The first thing we are going to do is update the updateHandler definition; there is only one in the file. Find the section in the configuration file that looks like this:
<updateHandler class="solr.DirectUpdateHandler2">
We are going to change the updateLog portion of the configuration. Remember that we are using vi as the text editor. Change this:
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
to this:
<updateLog class="solr.CdcrUpdateLog">
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
Now we need to create a new requestHandler definition. Find the section in the configuration file that looks like this:
<!-- A request handler that returns indented JSON by default -->
<requestHandler name="/query" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
  </lst>
</requestHandler>
We are going to add our new definition just after the closing requestHandler. Add the following new definition:
<!-- A request handler for cross data center replication -->
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="buffer">
    <str name="defaultState">disabled</str>
  </lst>
</requestHandler>
<!-- A request handler for cross data center replication -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">cdcr-processor-chain</str>
  </lst>
</requestHandler>
<updateRequestProcessorChain name="cdcr-processor-chain">
  <processor class="solr.CdcrUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
Your updated file should now look like this:
...
<!-- A request handler that returns indented JSON by default -->
<requestHandler name="/query" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
  </lst>
</requestHandler>
<!-- A request handler for cross data center replication -->
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="buffer">
    <str name="defaultState">disabled</str>
  </lst>
</requestHandler>
<!-- A request handler for cross data center replication -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">cdcr-processor-chain</str>
  </lst>
</requestHandler>
<updateRequestProcessorChain name="cdcr-processor-chain">
  <processor class="solr.CdcrUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
...
Now save the file: press the esc key, then type :wq and press enter.
You should see how the two configurations differ between the source and target collections.
Create Solr collection on solr-dc01 and solr-dc02
Now we should be able to create a collection using our updated configuration. Because the two configurations are different, make sure you run this command on both the solr-dc01 and solr-dc02 VMs. This creates the collections in our respective data centers.
/opt/solr/bin/solr create -c collection1 -d ./data_driven_schema_configs
NOTE: We are using the same collection name that has CDCR enabled in the configuration. You should see something similar to this:
/opt/solr/bin/solr create -c collection1 -d ./data_driven_schema_configs
Connecting to ZooKeeper at localhost:9983 ...
Uploading /home/vagrant/data_driven_schema_configs/conf for config collection1 to ZooKeeper at localhost:9983
Creating new collection 'collection1' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=collection1
{
  "responseHeader":{
    "status":0,
    "QTime":3684},
  "success":{"192.168.56.101:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":2546},
      "core":"collection1_shard1_replica1"}}}
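One quick way to confirm the collection was created from the command line is the Collections API LIST action:
curl 'http://localhost:8983/solr/admin/collections?action=LIST&wt=json'
The response should include collection1 in the list of collections.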
Now we can verify the collection exists in the Solr admin UI via: http://192.168.56.101:8983/solr/#/~cloud
You should see something similar to this:
As you can see, there is a single collection named collection1 which has 1 shard. You can repeat this process on solr-dc02 and see something similar.
NOTE: Remember that solr-dc01 is 192.168.56.101 and solr-dc02 is 192.168.56.202.
Turn on replication
Let's first check the status of replication. Each of these curl commands interacts with the CDCR request handler we configured. You can check the status of replication using the following command:
curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=STATUS'
You should see something similar to this: curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=STATUS'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">5</int></lst><lst name="status"><str name="process">stopped</str><str name="buffer">enabled</str></lst>
</response>
You should notice the process is displayed as stopped. We want to start the replication process.
curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=START'
You should see something similar to this: curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=START'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">41</int></lst><lst name="status"><str name="process">started</str><str name="buffer">enabled</str></lst>
</response>
You should notice the process is now started. Now we need to disable the buffer on the target collection, which buffers updates by default.
curl -XPOST 'http://192.168.56.202:8983/solr/collection1/cdcr?action=DISABLEBUFFER'
You should see something similar to this: curl -XPOST 'http://192.168.56.202:8983/solr/collection1/cdcr?action=DISABLEBUFFER'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">7</int></lst><lst name="status"><str name="process">started</str><str name="buffer">disabled</str></lst>
</response>
You should notice the buffer is now disabled.
Add documents to source Solr collection in solr-dc01
Now we will add a couple of sample documents to collection1 in solr-dc01. Run the following command to add 2 sample documents:
curl -XPOST -H 'Content-Type: application/json' 'http://192.168.56.101:8983/solr/collection1/update' --data-binary '{
  "add" : {
    "doc" : {
      "id" : 1,
      "text_ws" : "This is document number one."
    }
  },
  "add" : {
    "doc" : {
      "id" : 2,
      "text_ws" : "This is document number two."
    }
  },
  "commit" : {}
}'
You should notice the commit command in the JSON above. That is because the default solrconfig.xml does not have automatic commits enabled. You should get a response back similar to this:
{"responseHeader":{"status":0,"QTime":362}}
Query solr-dc01 collection
Let's query collection1 on solr-dc01 to ensure the documents are present. Run the following command:
curl -XGET 'http://192.168.56.101:8983/solr/collection1/select?q=*:*&indent=true'
You should see something similar to this:
curl -XGET 'http://192.168.56.101:8983/solr/collection1/select?q=*:*&indent=true'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <bool name="zkConnected">true</bool>
  <int name="status">0</int>
  <int name="QTime">17</int>
  <lst name="params">
    <str name="q">*:*</str>
    <str name="indent">true</str>
  </lst>
</lst>
<result name="response" numFound="2" start="0">
  <doc>
    <str name="id">1</str>
    <str name="text_ws">This is document number one.</str>
    <long name="_version_">1549823582071160832</long></doc>
  <doc>
    <str name="id">2</str>
    <str name="text_ws">This is document number two.</str>
    <long name="_version_">1549823582135123968</long></doc>
</result>
</response>
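If you want to see whether the source has already forwarded its updates before checking the target, the CDCR request handler also exposes monitoring actions such as QUEUES (part of the Solr 6 CDCR API); the exact output will vary, but a small or empty queue means the updates have been shipped:
curl -XPOST 'http://192.168.56.101:8983/solr/collection1/cdcr?action=QUEUES'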
Query solr-dc02 collection
Before executing the query on solr-dc02, we need to commit the changes. As mentioned above, automatic commits are not enabled in the default solrconfig.xml. Run the following command:
curl -XPOST -H 'Content-Type: application/json' 'http://192.168.56.202:8983/solr/collection1/update' --data-binary '{
  "commit" : {}
}'
You should see a response similar to this:
{"responseHeader":{"status":0,"QTime":5}}
Now we can run our query:
curl -XGET 'http://192.168.56.202:8983/solr/collection1/select?q=*:*&indent=true'
You should see something similar to this:
curl -XGET 'http://192.168.56.202:8983/solr/collection1/select?q=*:*&indent=true'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <bool name="zkConnected">true</bool>
  <int name="status">0</int>
  <int name="QTime">17</int>
  <lst name="params">
    <str name="q">*:*</str>
    <str name="indent">true</str>
  </lst>
</lst>
<result name="response" numFound="2" start="0">
  <doc>
    <str name="id">1</str>
    <str name="text_ws">This is document number one.</str>
    <long name="_version_">1549823582071160832</long></doc>
  <doc>
    <str name="id">2</str>
    <str name="text_ws">This is document number two.</str>
    <long name="_version_">1549823582135123968</long></doc>
</result>
</response>
You should notice that you have 2 documents, which have the same id and text_ws content as the ones you pushed to solr-dc01.
Review
If you followed along with this tutorial, you have successfully set up cross data center replication between two SolrCloud configurations. Some important points to keep in mind:
Because this is an active-passive approach, there is only a single source system. If the source system goes down, your ingest will stop, as the other data center is read-only and should not have updates pushed to it outside of the replication process. Work is being done to make Solr CDCR active-active.
Cross data center communications can be a potential bottleneck. If the cross data center connection cannot sustain sufficient throughput, the target data center(s) can fall behind in replication.
CDCR is not intended nor optimized for bulk inserts. If you need to do bulk inserts, first synchronize the indexes between the data centers outside of the replication process, then enable replication for incremental updates.
For more information, read about Cross Data Center Replication: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462
01-18-2018
09:33 PM
With HDP-2.6, I'm facing an issue with the zookeeper-server and client install with the above config. I tried removing and re-installing, but that didn't work either.
mkdir: cannot create directory `/usr/hdp/current/zookeeper-client': File exists
10-08-2016
04:18 PM
4 Kudos
Objective
Given the limited resources available in a virtualized sandbox, you may choose to turn specific services on or off. You may choose to enable or disable security, such as Kerberos. Depending on your scenario, you may have a need to switch between these configurations frequently. For reproducible demos, you likely do not want to make these changes between one demo and the next. If you are like me, you may want to have different copies of HDP sandboxes to cover different demo scenarios.
With VirtualBox or VMWare sandboxes, you can easily import or clone a sandbox to have multiple, distinct copies. Each copy is unique with no sharing of configuration or data. However, this approach is not quite as intuitive when using the Docker sandbox. If you tried to create multiple containers on a Docker image thinking they would be separate copies, you likely have found they are not completely separate!
This tutorial will guide you through the process of using a single sandbox image, with multiple containers, without sharing the sandbox HDP configurations by mapping the container's /hadoop directory to distinct paths within the Docker VM.
This tutorial is a continuation of this one:
HCC Article
Prerequisites
You should have already completed this tutorial: HCC Article
Scope
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6
HDP 2.5 on Hortonworks Sandbox (Docker Version)
Docker for Mac 1.12.1
Steps
Identify where container storage is located
The create container command docker run, which was run in the previous tutorial, specifies a directory mount of -v hadoop:/hadoop. This tells Docker to create the container with a mount of /hadoop that points to the VM host location hadoop, which is a relative path. We are trying to figure out where this is.
To see what storage mounts our Docker container has, we can use the docker inspect command. If you followed my tutorial, we created the container and gave it the name sandbox.
$ docker inspect sandbox
In the output of this command you want to look for the Mounts section. You should see something similar to this:
...
"Mounts": [
{
"Name": "hadoop",
"Source": "/var/lib/docker/volumes/hadoop/_data",
"Destination": "/hadoop",
"Driver": "local",
"Mode": "z",
"RW": true,
"Propagation": "rprivate"
}
],
...
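As an aside, if you only want the mount information rather than the full inspect output, docker inspect accepts a Go-template --format flag:
$ docker inspect --format '{{ json .Mounts }}' sandbox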
From the Mounts output above we can see that /hadoop is pointing to /var/lib/docker/volumes/hadoop/_data. So let's see what's in that location.
$ ls /var/lib/docker/volumes/hadoop/_data
ls: /var/lib/docker/volumes/hadoop/_data: No such file or directory
The directory doesn't exist. Why is this? The latest version of Docker for Mac is using Hyperkit (Hyperkit) as the virtualization layer. Previous versions used VirtualBox as the virtualization layer. Both versions use a common VM to run all of the containers. So the Source path is not on the Mac itself; rather, it is on the VM.
So let's connect to the Docker VM to see if the directory exists there. The following command will start a temporary container based on an Alpine Linux image that mounts the Docker VM's root directory as /vm-root and then does an ls -latr on it.
$ docker run --rm -it -v /:/vm-root alpine:edge ls -latr /vm-root/var/lib/docker/volumes/
You should see something similar to this:
$ docker run --rm -it -v /:/vm-root alpine:edge ls -latr /vm-root/var/lib/docker/volumes/
total 88
drwx--x--x 10 root root 4096 Aug 24 20:07 ..
drwxr-xr-x 3 root root 4096 Sep 19 21:25 9ab350e3947fc409819cc0924401d863fe84f5c45ea4243bcecf3e91a0741068
drwxr-xr-x 3 root root 4096 Sep 20 15:51 330351a101d34c3f0ed4f4ee7c3ef4277754a2cadd68d711e8e871aa09280e39
drwxr-xr-x 3 root root 4096 Sep 25 18:03 hadoop
drwxr-xr-x 3 root root 4096 Sep 28 21:13 ae64ecf489ceac45866a35b3babdf4773f67ba555acc5d45b1d52f9f305a964f
drwxr-xr-x 3 root root 4096 Sep 28 23:03 088a11867381704183ac9116ad3da0513c03885665e9e03049432363d2884d1e
drwxr-xr-x 3 root root 4096 Sep 28 23:17 f6f28886b2f50f72c52081dc2e9339678b9ecf4910564e14531c3ca6c8791974
drwxr-xr-x 3 root root 4096 Oct 5 13:45 c6825d9c9c6933549a446bf45924db641b65a632c18da662b15a109dc46b5f15
drwxr-xr-x 3 root root 4096 Oct 5 13:48 6ea352c744531d4c53e699df5eafde40100e4935c7398917714ed33ee7fe5f73
drwxr-xr-x 3 root root 4096 Oct 5 13:49 151490435ffcd759c266049b24cf3a18759c5fd3e26f1a05357973e318a8b117
drwxr-xr-x 3 root root 4096 Oct 5 13:50 a0575116e211d35d94ee648822a1bf035c708f90bf7e9620061753a3f34be150
-rw------- 1 root root 65536 Oct 7 18:46 metadata.db
drwx------ 14 root root 4096 Oct 7 18:46 .
Your output will not look exactly the same. The container ids listed will be different and you may not have the same number of containers. However, you should see the hadoop directory in your output. Let's take a quick look inside it by modifying our previous Docker command:
$ docker run --rm -it -v /:/vm-root alpine:edge ls -latr /vm-root/var/lib/docker/volumes/hadoop/_data
You should see something similar to this:
$ docker run --rm -it -v /:/vm-root alpine:edge ls -latr /vm-root/var/lib/docker/volumes/hadoop/_data
total 36
drwxr-xr-x 3 516 501 4096 Sep 13 10:54 zookeeper
drwxr-xr-x 3 513 501 4096 Sep 13 10:56 mapreduce
drwxr-xr-x 5 506 501 4096 Sep 13 10:56 hdfs
drwxr-xr-x 5 520 501 4096 Sep 13 10:58 yarn
drwxr-xr-x 3 506 501 4096 Sep 13 10:59 oozie
drwxr-xr-x 5 518 501 4096 Sep 13 11:02 falcon
drwxr-xr-x 3 root root 4096 Sep 25 18:03 ..
drwxr-xr-x 9 506 501 4096 Sep 28 20:36 .
drwxr-xr-x 7 510 501 4096 Oct 5 21:37 storm
As you can see, this is where the container is storing the data for the /hadoop mount. The problem is that this mount is the same for every container created from that image using the run command we provided before. We are going to modify how we create our containers so they each have a separate /hadoop mount.
Create a new project directory
I like to create project directories. My Vagrant work goes under ~/Vagrant/<project> and my Docker work goes under ~/Docker/<project>. This allows me to clearly identify which technology or tool is associated with the projects and allows me to use various helper scripts to automate processes. So let's create a project directory for a notional Atlas demo.
$ mkdir -p ~/Docker/atlas-demo1 && cd ~/Docker/atlas-demo1
Create the project helper files
To make it easy to switch between containers and projects, I like to create 4 helper scripts. You can copy/paste the scripts as described below, or you can download them from the attachments section of this article.
create-container.sh
The first script is used to create the container: create-container.sh. In this script we'll be using a docker run command similar to the one used in the previous tutorial. However, we are going to modify the mounts so they are no longer shared. The key change is that we grab the basename of our current project directory and use that name as our mount point instead of the "hard coded" hadoop.
We are also using the basename of our project directory for the --name of the container. In this case, the basename is atlas-demo1. The last change you should notice is that we have added a second -v flag. This addition mounts our local project directory to /mount within the container. This makes it really easy to copy data back and forth between our local directory and the container.
Edit the create-container.sh file:
vi create-container.sh
Copy and paste the following into your file:
#!/bin/bash
export CUR_DIR=`pwd`
export PROJ_DIR=`basename $CUR_DIR`
docker run -v `pwd`:/mount -v ${PROJ_DIR}:/hadoop --name ${PROJ_DIR} --hostname "sandbox.hortonworks.com" --privileged -d -p 6080:6080 -p 9090:9090 -p 9000:9000 -p 8000:8000 -p 8020:8020 -p 42111:42111 -p 10500:10500 -p 16030:16030 -p 8042:8042 -p 8040:8040 -p 2100:2100 -p 4200:4200 -p 4040:4040 -p 8050:8050 -p 9996:9996 -p 9995:9995 -p 8080:8080 -p 8088:8088 -p 8886:8886 -p 8889:8889 -p 8443:8443 -p 8744:8744 -p 8888:8888 -p 8188:8188 -p 8983:8983 -p 1000:1000 -p 1100:1100 -p 11000:11000 -p 10001:10001 -p 15000:15000 -p 10000:10000 -p 8993:8993 -p 1988:1988 -p 5007:5007 -p 50070:50070 -p 19888:19888 -p 16010:16010 -p 50111:50111 -p 50075:50075 -p 50095:50095 -p 18080:18080 -p 60000:60000 -p 8090:8090 -p 8091:8091 -p 8005:8005 -p 8086:8086 -p 8082:8082 -p 60080:60080 -p 8765:8765 -p 5011:5011 -p 6001:6001 -p 6003:6003 -p 6008:6008 -p 1220:1220 -p 21000:21000 -p 6188:6188 -p 61888:61888 -p 2181:2181 -p 2222:22 sandbox /usr/sbin/sshd -D
Now save your file with
:wq!
start-container.sh
The second script is used to start the container after it has been created. You start a container by using the docker start <container> command, where container is either the name or id. Instead of having to remember what the container name is, we'll have the script figure that out for us.
Edit the start-container.sh file:
vi start-container.sh
Copy and paste the following into your file:
#!/bin/bash
export CUR_DIR=`pwd`
export PROJ_DIR=`basename $CUR_DIR`
docker start ${PROJ_DIR}
Now save your file with
:wq!
stop-container.sh
The third script is used to stop the container after it has been created. You stop a container by using the docker stop <container> command, where container is either the name or id. Instead of having to remember what the container name is, we'll have the script figure that out for us.
Edit the stop-container.sh file:
vi stop-container.sh
Copy and paste the following into your file:
#!/bin/bash
export CUR_DIR=`pwd`
export PROJ_DIR=`basename $CUR_DIR`
docker stop ${PROJ_DIR}
Now save your file with
:wq!
ssh-container.sh
The fourth script is used to ssh into the container. The container maps the local host port 2222 to the container port 22 via the -p 2222:22 line in the create-container.sh script. Admittedly, the ssh command to connect is simple. However, this script means I don't have to think about it very much.
Edit the ssh-container.sh file:
vi ssh-container.sh
Copy and paste the following into your file:
#!/bin/bash
ssh -p 2222 root@localhost
Now save your file with
:wq!
Create the atlas-demo1 container
Now that we have our helper scripts ready to go, let's create the container for our notional Atlas demo.
$ cd ~/Docker/atlas-demo1
$ ./create-container.sh
You should see something similar to the following:
$ ./create-container.sh
9366e0b23a72ea53581647e174b50e5d24ec08a217c1bf3591491ad74ab18028
The output of the docker run command is the unique container id for our atlas-demo1 container. You can verify the container is running with the docker ps command:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9366e0b23a72 sandbox "/usr/sbin/sshd -D" 55 seconds ago Up 53 seconds 0.0.0.0:1000->1000/tcp, 0.0.0.0:1100->1100/tcp, 0.0.0.0:1220->1220/tcp, 0.0.0.0:1988->1988/tcp, 0.0.0.0:2100->2100/tcp, 0.0.0.0:2181->2181/tcp, 0.0.0.0:4040->4040/tcp, 0.0.0.0:4200->4200/tcp, 0.0.0.0:5007->5007/tcp, 0.0.0.0:5011->5011/tcp, 0.0.0.0:6001->6001/tcp, 0.0.0.0:6003->6003/tcp, 0.0.0.0:6008->6008/tcp, 0.0.0.0:6080->6080/tcp, 0.0.0.0:6188->6188/tcp, 0.0.0.0:8000->8000/tcp, 0.0.0.0:8005->8005/tcp, 0.0.0.0:8020->8020/tcp, 0.0.0.0:8040->8040/tcp, 0.0.0.0:8042->8042/tcp, 0.0.0.0:8050->8050/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:8082->8082/tcp, 0.0.0.0:8086->8086/tcp, 0.0.0.0:8088->8088/tcp, 0.0.0.0:8090-8091->8090-8091/tcp, 0.0.0.0:8188->8188/tcp, 0.0.0.0:8443->8443/tcp, 0.0.0.0:8744->8744/tcp, 0.0.0.0:8765->8765/tcp, 0.0.0.0:8886->8886/tcp, 0.0.0.0:8888-8889->8888-8889/tcp, 0.0.0.0:8983->8983/tcp, 0.0.0.0:8993->8993/tcp, 0.0.0.0:9000->9000/tcp, 0.0.0.0:9090->9090/tcp, 0.0.0.0:9995-9996->9995-9996/tcp, 0.0.0.0:10000-10001->10000-10001/tcp, 0.0.0.0:10500->10500/tcp, 0.0.0.0:11000->11000/tcp, 0.0.0.0:15000->15000/tcp, 0.0.0.0:16010->16010/tcp, 0.0.0.0:16030->16030/tcp, 0.0.0.0:18080->18080/tcp, 0.0.0.0:19888->19888/tcp, 0.0.0.0:21000->21000/tcp, 0.0.0.0:42111->42111/tcp, 0.0.0.0:50070->50070/tcp, 0.0.0.0:50075->50075/tcp, 0.0.0.0:50095->50095/tcp, 0.0.0.0:50111->50111/tcp, 0.0.0.0:60000->60000/tcp, 0.0.0.0:60080->60080/tcp, 0.0.0.0:61888->61888/tcp, 0.0.0.0:2222->22/tcp atlas-demo1
You should notice the shortened version of the container id is listed as 9366e0b23a72. It is the first 12 characters, and it matches the output of our create-container.sh command. Your container id value will be different. You should also notice the name of the container is listed as atlas-demo1.
When you create a container with docker run, it starts it for you. That means you can connect to it without having to run the start-container.sh script. After the container has been stopped, you will need to run start-container.sh to bring it up, NOT create-container.sh. If you are ever unsure what state a container is in, see the quick check below.
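By default, docker ps only lists running containers; adding -a lists containers in every state along with their names:
$ docker ps -a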
Connect to the atlas-demo1 container
Now that the container is started, we can connect to it. We can use our new helper script ssh-container.sh to make it easy:
$ ./ssh-container.sh
You should be prompted for a password. The default password on the sandbox is hadoop. The first time you log into a new container you will be prompted to change the password. You should see something similar to this:
$ ./ssh-container.sh
root@localhost's password:
You are required to change your password immediately (root enforced)
Last login: Thu Sep 22 11:35:09 2016 from 172.17.0.1
Changing password for root.
(current) UNIX password:
New password:
Retype new password:
For demo purposes, I temporarily change it to something new like trymenow and then change it back to hadoop.
[root@sandbox ~]# passwd
Changing password for user root.
New password:
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.
Verify container mounts
Let's verify our container mounts. You do this with the df command:
[root@sandbox ~]# df -h
Filesystem Size Used Avail Use% Mounted on
none 60G 32G 25G 57% /
tmpfs 5.9G 0 5.9G 0% /dev
tmpfs 5.9G 0 5.9G 0% /sys/fs/cgroup
/dev/vda2 60G 32G 25G 57% /hadoop
/dev/vda2 60G 32G 25G 57% /etc/resolv.conf
/dev/vda2 60G 32G 25G 57% /etc/hostname
/dev/vda2 60G 32G 25G 57% /etc/hosts
shm 64M 8.0K 64M 1% /dev/shm
osxfs 233T 33T 201T 15% /Users/myoung/Documents/Docker/atlas-demo1
The first thing you should notice is the last entry. My local project directory is mounted as osxfs. Let's ls the /mount directory to see what's there:
[root@sandbox ~]# ls -la /Users/myoung/Documents/Docker/atlas-demo1
total 300
drwxr-xr-x 12 root root 408 Oct 7 22:52 .
drwxr-xr-x 3 root root 4096 Oct 7 22:57 ..
-rwxrwxr-x 1 root root 1199 Oct 7 23:31 create-container.sh
-rwxrwxr-x 1 root root 40 Oct 7 22:52 ssh-container.sh
-rwxrwxr-x 1 root root 96 Oct 7 22:48 start-container.sh
-rwxrwxr-x 1 root root 95 Oct 7 22:48 stop-container.sh
You should see the 4 helper scripts we created. If I want to easily make data available to the container, all I have to do is copy the data to my project directory, as in the example below.
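For example, assuming you have a data file at ~/Downloads/sample-data.csv (a hypothetical file used only for illustration), copying it into the project directory makes it visible inside the container under /mount, per the -v `pwd`:/mount flag in create-container.sh:
$ cp ~/Downloads/sample-data.csv ~/Docker/atlas-demo1/
$ ./ssh-container.sh
[root@sandbox ~]# ls /mount/sample-data.csv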
Start the sandbox processes
When the container starts up, it doesn't automatically start the sandbox processes. You can do that by running /etc/init.d/startup_script. You should see something similar to this:
[root@sandbox ~]# /etc/init.d/startup_script start
Starting tutorials... [ Ok ]
Starting startup_script...
Starting HDP ...
Starting mysql [ OK ]
Starting Flume [ OK ]
Starting Postgre SQL [ OK ]
Starting Ranger-admin [WARNINGS]
find: failed to restore initial working directory: Permission denied
Starting data node [ OK ]
Starting name node [ OK ]
Safe mode is OFF
Starting Oozie [ OK ]
Starting Ranger-usersync [ OK ]
Starting Zookeeper nodes [ OK ]
Starting NFS portmap [ OK ]
Starting Hdfs nfs [ OK ]
Starting Hive server [ OK ]
Starting Hiveserver2 [ OK ]
Starting Ambari server [ OK ]
Starting Ambari agent [ OK ]
Starting Node manager [ OK ]
Starting Yarn history server [ OK ]
Starting Webhcat server [ OK ]
Starting Spark [ OK ]
Starting Mapred history server [ OK ]
Starting Zeppelin [ OK ]
Starting Resource manager [ OK ]
Safe mode is OFF
Starting sandbox...
/etc/init.d/startup_script: line 97: /proc/sys/kernel/hung_task_timeout_secs: No such file or directory
Starting shellinaboxd: [ OK ]
NOTE: You can ignore any warnings or errors that are displayed.
Now the sandbox processes are running and you can access the Ambari interface via
http://localhost:8080 . Log in with the raj_ops username and password. You should see something similar to this:
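If you want to confirm Ambari is up before opening a browser, you can poke its REST API from the command line. A quick check, assuming the sandbox's default raj_ops credentials:
$ curl -s -u raj_ops:raj_ops http://localhost:8080/api/v1/clusters
A JSON response listing the cluster means Ambari is ready.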
Enable HBase
We are going to start the HBase service and turn off maintenance mode. We want to compare this sandbox with another one we will start later to show the services are different.
Click on the HBase service. The HBase summary page will be displayed. Click the Service Actions button and select the Start menu option. You should see something similar to this:
A confirmation dialog will be displayed. Check the Turn Off Maintenance Mode for HBase checkbox and then click the green Confirm Start button.
The Background Operation Running dialog will be displayed. You should see something similar to this:
You can click the green
OK button.
Once HBase is running, you should see something similar to this:
You should notice that HBase is running and is no longer in maintenance mode.
Upload file to HDFS home directory
We are going to upload a file to the user home directory on HDFS. As mentioned in the previous section, we want to compare this sandbox with another to show the directories are different.
Click on the Ambari Views menu in the upper right corner. A drop-down menu will be displayed. You should see something similar to this:
Click on the
Files View option. You should see something similar to this:
We are going to navigate to our user home directory. We are logged in as
raj_ops . So click on the user folder, then the raj_ops folder. You should see something similar to this:
Now we are going to upload a file. Click on the blue
Upload button. You should see something similar to this:
Click the cloud-arrow icon. You should see a file dialog box that looks similar to this:
You should be in your project directory. If you are not, navigate to that location until you see the project helper files we created. We are going to upload the start-container.sh script. Select the file and then click the Open button. You should see something similar to this:
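If you prefer the command line, the same upload can be done from inside the container with the HDFS CLI. A hedged equivalent, run as the hdfs superuser to sidestep permissions, using the project mount path from earlier:
[root@sandbox ~]# sudo -u hdfs hdfs dfs -put /Users/myoung/Documents/Docker/atlas-demo1/start-container.sh /user/raj_ops/
[root@sandbox ~]# sudo -u hdfs hdfs dfs -ls /user/raj_ops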
Stop the atlas-demo1 container
Now we are going to stop our container. Before stopping it, use Ambari to
Stop All services. You can find that link on the Ambari Dashboard:
You stop your container by running the
stop-container.sh script on the local host machine.
[root@sandbox ~]# exit
logout
Connection to localhost closed.
$ ./stop-container.sh
atlas-demo1
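For reference, a minimal sketch of what stop-container.sh can look like. It assumes the container is named after the current project directory, which is what lets the helper scripts work from any project folder:
#!/bin/bash
# stop-container.sh - stop the container named after the current project directory
docker stop "$(basename "$PWD")"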
When you stop or start a container, Docker will always print the name of the container when the command completes.
Create the atlas-demo2 container
Now let's create a new project directory for comparison. This will show that our two containers are not sharing configurations.
$ mkdir ~/Docker/atlas-demo2 && cd ~/Docker/atlas-demo2
Copy helper scripts
There is no reason to copy/paste those helper scripts again. The scripts we created will work anywhere. So let's copy them.
$ cp ~/Docker/atlas-demo1/* .
$ ls
create-container.sh ssh-container.sh start-container.sh stop-container.sh
Create the atlas-demo2 container
This is a new container, so we need to run the
create-container.sh script.
You should see something similar to the following:
$ ./create-container.sh
05e4710f3aaa1232b620a5d908003070a7b3d991c064ac09c04571a2fc1b2079
The output of the docker run command is the unique container id for our
atlas-demo2 container. You can verify the container is running with the docker ps command:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
05e4710f3aaa sandbox "/usr/sbin/sshd -D" About a minute ago Up 33 seconds 0.0.0.0:1000->1000/tcp, 0.0.0.0:1100->1100/tcp, 0.0.0.0:1220->1220/tcp, 0.0.0.0:1988->1988/tcp, 0.0.0.0:2100->2100/tcp, 0.0.0.0:2181->2181/tcp, 0.0.0.0:4040->4040/tcp, 0.0.0.0:4200->4200/tcp, 0.0.0.0:5007->5007/tcp, 0.0.0.0:5011->5011/tcp, 0.0.0.0:6001->6001/tcp, 0.0.0.0:6003->6003/tcp, 0.0.0.0:6008->6008/tcp, 0.0.0.0:6080->6080/tcp, 0.0.0.0:6188->6188/tcp, 0.0.0.0:8000->8000/tcp, 0.0.0.0:8005->8005/tcp, 0.0.0.0:8020->8020/tcp, 0.0.0.0:8040->8040/tcp, 0.0.0.0:8042->8042/tcp, 0.0.0.0:8050->8050/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:8082->8082/tcp, 0.0.0.0:8086->8086/tcp, 0.0.0.0:8088->8088/tcp, 0.0.0.0:8090-8091->8090-8091/tcp, 0.0.0.0:8188->8188/tcp, 0.0.0.0:8443->8443/tcp, 0.0.0.0:8744->8744/tcp, 0.0.0.0:8765->8765/tcp, 0.0.0.0:8886->8886/tcp, 0.0.0.0:8888-8889->8888-8889/tcp, 0.0.0.0:8983->8983/tcp, 0.0.0.0:8993->8993/tcp, 0.0.0.0:9000->9000/tcp, 0.0.0.0:9090->9090/tcp, 0.0.0.0:9995-9996->9995-9996/tcp, 0.0.0.0:10000-10001->10000-10001/tcp, 0.0.0.0:10500->10500/tcp, 0.0.0.0:11000->11000/tcp, 0.0.0.0:15000->15000/tcp, 0.0.0.0:16010->16010/tcp, 0.0.0.0:16030->16030/tcp, 0.0.0.0:18080->18080/tcp, 0.0.0.0:19888->19888/tcp, 0.0.0.0:21000->21000/tcp, 0.0.0.0:42111->42111/tcp, 0.0.0.0:50070->50070/tcp, 0.0.0.0:50075->50075/tcp, 0.0.0.0:50095->50095/tcp, 0.0.0.0:50111->50111/tcp, 0.0.0.0:60000->60000/tcp, 0.0.0.0:60080->60080/tcp, 0.0.0.0:61888->61888/tcp, 0.0.0.0:2222->22/tcp atlas-demo2
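All of those published ports make the docker ps output hard to scan. If you just want to confirm the container's state, a filter and a format string trim it down:
$ docker ps --filter "name=atlas-demo2" --format "{{.ID}}  {{.Names}}  {{.Status}}"
05e4710f3aaa  atlas-demo2  Up 33 seconds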
You should notice the shortened version of the container id is listed as 05e4710f3aaa. As before, it is the first 12 characters of the full id and matches the output of our create-container.sh command. Your container id value will be different. You should also notice the name of the container is listed as atlas-demo2.
Connect to the atlas-demo2 container
Now that the container is started, we can connect to it. We can use our new helper script
ssh-container.sh to make it easy:
$ ./ssh-container.sh
Because this is a new container, you should be prompted for a password. Change the password as you did with atlas-demo1.
Verify container mounts
Let's verify our container mounts. You do this with the
df command:
[root@sandbox ~]# df -h
Filesystem Size Used Avail Use% Mounted on
none 60G 32G 25G 57% /
tmpfs 5.9G 0 5.9G 0% /dev
tmpfs 5.9G 0 5.9G 0% /sys/fs/cgroup
/dev/vda2 60G 32G 25G 57% /hadoop
/dev/vda2 60G 32G 25G 57% /etc/resolv.conf
/dev/vda2 60G 32G 25G 57% /etc/hostname
/dev/vda2 60G 32G 25G 57% /etc/hosts
shm 64M 8.0K 64M 1% /dev/shm
osxfs 233T 33T 201T 15% /Users/myoung/Documents/Docker/atlas-demo2
The first thing you should notice is the last entry. My local project directory is mounted via osxfs. Let's ls the mounted directory to see what's there:
[root@sandbox ~]# ls -la /Users/myoung/Documents/Docker/atlas-demo2
total 300
drwxr-xr-x 12 root root 408 Oct 7 22:52 .
drwxr-xr-x 3 root root 4096 Oct 7 22:57 ..
-rwxrwxr-x 1 root root 1199 Oct 7 23:31 create-container.sh
-rwxrwxr-x 1 root root 40 Oct 7 22:52 ssh-container.sh
-rwxrwxr-x 1 root root 96 Oct 7 22:48 start-container.sh
-rwxrwxr-x 1 root root 95 Oct 7 22:48 stop-container.sh
As before, you should see the 4 helper scripts we created.
Start the sandbox processes
When the container starts up, it doesn't automatically start the sandbox processes. You can do that by running /etc/init.d/startup_script. You should see something similar to this:
[root@sandbox ~]# /etc/init.d/startup_script start
Starting tutorials... [ Ok ]
Starting startup_script...
Starting HDP ...
Starting mysql [ OK ]
Starting Flume [ OK ]
Starting Postgre SQL [ OK ]
Starting Ranger-admin [WARNINGS]
find: failed to restore initial working directory: Permission denied
Starting data node [ OK ]
Starting name node [ OK ]
Safe mode is OFF
Starting Oozie [ OK ]
Starting Ranger-usersync [ OK ]
Starting Zookeeper nodes [ OK ]
Starting NFS portmap [ OK ]
Starting Hdfs nfs [ OK ]
Starting Hive server [ OK ]
Starting Hiveserver2 [ OK ]
Starting Ambari server [ OK ]
Starting Ambari agent [ OK ]
Starting Node manager [ OK ]
Starting Yarn history server [ OK ]
Starting Webhcat server [ OK ]
Starting Spark [ OK ]
Starting Mapred history server [ OK ]
Starting Zeppelin [ OK ]
Starting Resource manager [ OK ]
Safe mode is OFF
Starting sandbox...
/etc/init.d/startup_script: line 97: /proc/sys/kernel/hung_task_timeout_secs: No such file or directory
Starting shellinaboxd: [ OK ]
NOTE: You can ignore any warnings or errors that are displayed.
Check Ambari Services
We are going to look at the services in Ambari. In the old container we turned off maintenance mode for HBase. Log in with the raj_ops username and password.
You should see something similar to this:
You should notice that the HBase service has maintenance mode turned on.
Check HDFS home directory
Now navigate to the raj_ops HDFS home directory using the Ambari Files View. Follow the same process as above to get to the home directory. You should see something similar to this:
Notice the file we uploaded in the other container is not here.
Stop the atlas-demo2 container
Now we are going to stop our container. Before stopping it, use Ambari to Stop All services as you did before. Then you stop the container by running the stop-container.sh script on the local host machine.
[root@sandbox ~]# exit
logout
Connection to localhost closed.
$ ./stop-container.sh
atlas-demo2
Starting created containers
As mentioned above, the create process automatically starts the container. After you stop it, you need to run the start-container.sh script, which simply runs docker start <container>.
$ ./start-container.sh
atlas-demo2
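A minimal sketch of start-container.sh, under the same assumption as the stop script that the container shares its name with the project directory:
#!/bin/bash
# start-container.sh - start the container named after the current project directory
docker start "$(basename "$PWD")"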
Again, the Docker start command will print the name of the container when it completes.
Deleting containers
If you decide you no longer need a container, you can easily delete it. Before you can delete the container, you need to stop it first. Once it is stopped, you use the docker rm command:
$ docker rm atlas-demo1
atlas-demo1
As with the start and stop commands, the rm command will print the name of the container when the command completes.
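If you are unsure whether a container still exists, or what state it is in, docker ps -a lists all containers, running or stopped (your Status column will differ):
$ docker ps -a --format "{{.Names}}  {{.Status}}"
atlas-demo1  Exited (0) 5 minutes ago
atlas-demo2  Exited (0) 2 minutes ago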
If you try to stop a container that has already been deleted, the docker command will display the following:
$ docker stop atlas-demo1
Error response from daemon: No such container: atlas-demo1
That means the container no longer exists and has already been deleted.
Note on disk utilization
While the containers do not share configurations, they all run on the same Docker virtual machine. This means you should manage the number of containers you keep around, as the VM's storage space can become a constraint.
Here are quick screenshots of my disk usage in Ambari (hdfs-1.png, hdfs-2.png):
Let's see what your disk usage looks like at the command line:
$ docker run --rm -it -v /:/vm-root alpine:edge df -h /
Filesystem Size Used Available Use% Mounted on
none 59.0G 33.8G 22.2G 60% /
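Since the -v /:/vm-root flag bind-mounts the VM's root filesystem into this throwaway Alpine container, you can also point df at that path directly and get the same backing disk:
$ docker run --rm -it -v /:/vm-root alpine:edge df -h /vm-root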
I'm going to delete the two atlas demo containers to see if that changes my disk utilization.
$ docker rm atlas-demo1
atlas-demo1
$ docker rm atlas-demo2
atlas-demo2
Now let's look at my disk utilization:
$ docker run --rm -it -v /:/vm-root alpine:edge df -h /
Filesystem Size Used Available Use% Mounted on
none 59.0G 33.1G 22.9G 59% /
It looks like I freed up about 700MB of space. As you add and remove containers, just be sure to keep an eye on your overall disk utilization. The space reported by HDFS in Ambari for your sandbox containers should closely reflect the VM disk space seen here:
Review
If you successfully followed along with this tutorial, you now have an easy way to create Docker-based HDP sandboxes that don't share configurations. You have a few scripts to make the management process easier. You can read more about Docker container storage here: Docker Volumes
01-12-2017
05:24 AM
However, my indexing is failing in monit; any suggestions for that? Thanks!
10-26-2016
01:39 AM
Did you install the vagrant plugin "vagrant-hostmanager"? It is listed as a requirement at the top of the tutorial.
12-01-2017
05:55 AM
What's the difference between entering (elasticsearch, solr) in Terms to Filter On in the GetTwitter processor, and this article's approach of creating a process group?
08-30-2018
09:00 PM
I have a similar case; however, Elasticsearch interpreted timestamp_ms as a string. Do you know how to fix it? Many thanks
09-19-2016
11:15 PM
1 Kudo
Objectives
This tutorial will walk you through the process of starting Atlas on the Hortonworks Sandbox for HDP 2.5. By default the service is disabled. However, manually starting the service will fail unless you start its dependencies first. Atlas depends on Ambari Infra (which provides Solr), Kafka, and HBase. While Atlas will start with just the Ambari Infra service running, you won't have full functionality without Kafka and HBase.
Scope
This has been tested on the following:
VirtualBox 5.1.6
Hortonworks Sandbox for HDP 2.5
Steps
Start Atlas Service
Get your sandbox up and running and log into Ambari. Click on the Atlas service link. You should see something similar to this:
Because Atlas is in maintenance mode, it will not automatically start. When you try to start it by going to Service Actions -> Start like this:
You will see the following error:
If you look at the error message provided, you will see the problem is related to Solr:
Client is connected to ZooKeeper
Using default ZkACLProvider
Updating cluster state from ZooKeeper...
No live SolrServers available to handle this request
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:350)
at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1100)
at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:870)
at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:806)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:166)
This is because Atlas is using the Solr instance in the Ambari Infra service, which is in maintenance mode and does not auto start. Let's start the service.
Start Ambari Infra Service
If you click on the Ambari Infra service, you should see something like this:
Click Service Actions -> Start. This should start Ambari Infra. (A scripted alternative using the Ambari REST API appears at the end of this article.)
Start Atlas Service
Now that Ambari Infra is running, you should be able to start the Atlas service.
Start Kafka and HBase Services
While Atlas did start with only the Ambari Infra service running, it also depends on Kafka and HBase for full functionality. You should start both of those services similar to how we started the Ambari Infra service.
Review
The Ambari Infra service provides a Solr instance for core HDP component access. By default this service is in maintenance mode and does not start, which causes the Atlas service to fail to start. By starting the Ambari Infra service before Atlas, you will be able to start Atlas. If you turn off maintenance mode for Ambari Infra, it will auto start.
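As a scripted alternative to clicking through the UI, Ambari services can also be started via its REST API. A hedged sketch, assuming the sandbox cluster is named Sandbox and the default admin credentials; adjust both for your environment:
$ curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Start Ambari Infra"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  http://localhost:8080/api/v1/clusters/Sandbox/services/AMBARI_INFRA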