05-12-2017
10:17 PM
17 Kudos
Note: A newer version of this article is available here: https://community.hortonworks.com/articles/194076/using-vagrant-and-virtualbox-to-create-a-local-ins.html
Objectives
This tutorial walks you through using Vagrant and VirtualBox to create a local instance of Cloudbreak. This lets you start your Cloudbreak deployer whenever you want to spin up an HDP cluster on the cloud, without incurring the costs of hosting the Cloudbreak instance itself on the cloud.
Prerequisites
You should already have installed VirtualBox 5.1.x. Read more here: VirtualBox
You should already have installed Vagrant 1.9.x. Read more here: Vagrant
You should already have installed the vagrant-vbguest plugin. This plugin will keep the VirtualBox Guest Additions software current as you upgrade your kernel and/or VirtualBox versions. Read more here: vagrant-vbguest
You should already have installed the vagrant-hostmanager plugin. This plugin will automatically manage the /etc/hosts file on your local computer and in your virtual machines. Read more here: vagrant-hostmanager
Scope
This tutorial was tested in the following environment:
macOS Sierra (version 10.12.4)
VirtualBox 5.1.22
Vagrant 1.9.4
vagrant-vbguest plugin 0.14.1
vagrant-hostmanager plugin 1.8.6
Cloudbreak 1.14.0 TP
Steps
Setup Vagrant
Create Vagrant project directory
Before we get started, determine where you want to keep your Vagrant project files. Each Vagrant project should have its own directory. I keep my Vagrant projects in my ~/Development/Vagrant directory. You should also use a helpful name for each Vagrant project directory you create.
$ cd ~/Development/Vagrant
$ mkdir centos7-cloudbreak
$ cd centos7-cloudbreak
We will be using a CentOS 7.3 Vagrant box, so I include centos7 in the Vagrant project name to differentiate it from a CentOS 6 project. The project is for cloudbreak, so I include that in the name.
Create Vagrantfile
The Vagrantfile tells Vagrant how to configure your virtual machines. You can copy/paste my Vagrantfile below:
# -*- mode: ruby -*-
# vi: set ft=ruby :
# Using yaml to load external configuration files
require 'yaml'
Vagrant.configure("2") do |config|
  # Using the hostmanager vagrant plugin to update the host files
  config.hostmanager.enabled = true
  config.hostmanager.manage_host = true
  config.hostmanager.manage_guest = true
  config.hostmanager.ignore_private_ip = false
  # Loading in the list of commands that should be run when the VM is provisioned.
  commands = YAML.load_file('commands.yaml')
  commands.each do |command|
    config.vm.provision :shell, inline: command
  end
  # Loading in the VM configuration information
  servers = YAML.load_file('servers.yaml')
  servers.each do |servers|
    config.vm.define servers["name"] do |srv|
      srv.vm.box = servers["box"] # Specify the name of the Vagrant box file to use
      srv.vm.hostname = servers["name"] # Set the hostname of the VM
      srv.vm.network "private_network", ip: servers["ip"], :adapter => 2 # Add a second adapter with a specified IP
      srv.vm.provision :shell, inline: "sed -i'' '/^127.0.0.1\t#{srv.vm.hostname}\t#{srv.vm.hostname}$/d' /etc/hosts" # Remove the extraneous first entry in /etc/hosts
      srv.vm.provider :virtualbox do |vb|
        vb.name = servers["name"] # Name of the VM in VirtualBox
        vb.cpus = servers["cpus"] # How many CPUs to allocate to the VM
        vb.memory = servers["ram"] # How much memory to allocate to the VM
      end
    end
  end
end
Create a servers.yaml file
The servers.yaml file contains the configuration information for our VMs. Here is the content from my file:
---
- name: cloudbreak
  box: bento/centos-7.3
  cpus: 2
  ram: 4096
  ip: 192.168.56.100
NOTE: You may need to modify the IP address to avoid conflicts with your local network.
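If you are not sure which addresses are already in use, you can check what VirtualBox and your local machine are currently using before picking an IP (an optional check; interface names and ranges will vary):
VBoxManage list hostonlyifs
ifconfig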
Create commands.yaml file
The commands.yaml file contains the list of commands that should be run on each VM when it is first provisioned. This allows us to automate configuration tasks that would otherwise be tedious and/or repetitive. Here is the content from my file:
- "sudo yum -y update"
- "sudo yum -y install net-tools ntp wget lsof unzip tar iptables-services"
- "sudo systemctl enable ntpd && sudo systemctl start ntpd"
- "sudo systemctl disable firewalld && sudo systemctl stop firewalld"
- "sudo iptables --flush INPUT && sudo iptables --flush FORWARD && sudo service iptables save"
- "sudo sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/g' /etc/sysconfig/selinux"
Start Virtual Machines
Once you have created the 3 files in your Vagrant project directory, you are ready to start your cluster. Creating the cluster for the first time and starting it every time after that uses the same command:
$ vagrant up
You should notice Vagrant automatically updating the packages on the VM.
Once the process is complete you should have 1 server running. You can verify by looking at the VirtualBox UI, where you should see the cloudbreak VM running. You should see something similar to this:
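If you prefer the command line, you can also confirm the VM state from the Vagrant project directory (an optional check):
$ vagrant status
The cloudbreak machine should be reported as running (virtualbox).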
Connect to each virtual machine
You can log in to the VM via SSH using the vagrant ssh command.
$ vagrant ssh
[vagrant@cloudbreak ~]$
Install Cloudbreak
Most of the Cloudbreak installation is covered well in the docs:
Cloudbreak Install Docs. However, the first couple of steps in the docs have you install a few packages, change iptables settings, etc. That part of the install is handled by the Vagrant provisioning step, so you can skip those steps. You should be able to start at the Docker Service section of the docs.
We need to be root for most of this, so we'll use sudo.
sudo -i
Create Docker Repo
We need to add a repo so we can install Docker.
cat > /etc/yum.repos.d/docker.repo <<"EOF"
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF
Install Docker Service
Now we need to install Docker and enable the service.
yum install -y docker-engine-1.9.1 docker-engine-selinux-1.9.1
systemctl start docker
systemctl enable docker
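Before moving on, you can optionally confirm that the Docker daemon is running and responding (output will vary by version):
systemctl status docker
docker info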
Install Cloudbreak Deployer
Now we can install Cloudbreak itself.
yum -y install unzip tar
curl -Ls s3.amazonaws.com/public-repo-1.hortonworks.com/HDP/cloudbreak/cloudbreak-deployer_1.14.0_$(uname)_x86_64.tgz | sudo tar -xz -C /bin cbd
Once the Cloudbreak Deployer is installed, you can check the version of the installed software.
cbd --version
You should see something similar to this:
[root@cloudbreak cloudbreak-deployment]# cbd --version
Cloudbreak Deployer: 1.14.0
NOTE: We are installing version 1.14.0. You may want to consider installing the latest version, which is 1.16.1 as of August 2017.
Create Cloudbreak Profile
You should make a Cloudbreak application directory. This is where the Cloudbreak configuration files and logs will be located.
cd /opt
mkdir cloudbreak-deployment
cd cloudbreak-deployment
Now you need to set up the Profile file. This file contains environment variables that determine how Cloudbreak runs. Edit Profile using your editor of choice.
I recommend the following settings for your profile:
export UAA_DEFAULT_SECRET='[SECRET]'
export UAA_DEFAULT_USER_EMAIL='<myemail>'
export UAA_DEFAULT_USER_PW='<mypassword>'
export PUBLIC_IP=192.168.56.100
export CLOUDBREAK_SMTP_SENDER_USERNAME='<myemail>'
export CLOUDBREAK_SMTP_SENDER_PASSWORD='<mypassword>'
export CLOUDBREAK_SMTP_SENDER_HOST='smtp.gmail.com'
export CLOUDBREAK_SMTP_SENDER_PORT=25
export CLOUDBREAK_SMTP_SENDER_FROM='<myemail>'
export CLOUDBREAK_SMTP_AUTH=true
export CLOUDBREAK_SMTP_STARTTLS_ENABLE=true
export CLOUDBREAK_SMTP_TYPE=smtp
You should set the UAA_DEFAULT_USER_EMAIL variable to the email address you want to use. This is the account you will use to login to Cloudbreak. You should set the UAA_DEFAULT_USER_PW variable to the password you want to use. This is the password you will use to login to Cloudbreak.
You should set the CLOUDBREAK_SMTP_SENDER_USERNAME variable to the username you use to authenticate to your SMTP server. You should set the CLOUDBREAK_SMTP_SENDER_PASSWORD variable to the password you use to authenticate to your SMTP server.
NOTE: The SMTP variables are how you enable Cloudbreak to send you an email when cluster operations are done. This is optional and is only required if you want to use the checkbox to get emails when you build a cluster. The example above assumes you are using Gmail. You should use the settings appropriate for your SMTP server.
Initialize Cloudbreak Configuration
Now that you have a profile, you can initialize your Cloudbreak configuration files.
cbd generate
You should see something similar to this:
[root@cloudbreak cloudbreak-deployment]# cbd generate
* Dependency required, installing sed latest ...
* Dependency required, installing jq latest ...
* Dependency required, installing docker-compose 1.9.0 ...
* Dependency required, installing aws latest ...
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
03310923a82b: Pulling fs layer
6fc6c6aca926: Pulling fs layer
6fc6c6aca926: Verifying Checksum
6fc6c6aca926: Download complete
03310923a82b: Verifying Checksum
03310923a82b: Download complete
03310923a82b: Pull complete
6fc6c6aca926: Pull complete
Digest: sha256:7875e46eb14555e893e7c23a7f90a0d2396f6b56c8c3dcf68f9ed14879b8966c
Status: Downloaded newer image for alpine:latest
Generating Cloudbreak client certificate and private key in /opt/cloudbreak-deployment/certs.
generating docker-compose.yml
generating uaa.yml
[root@cloudbreak cloudbreak-deployment]#
Start Cloudbreak Deployer
You should be able to start the Cloudbreak Deployer application. This process will first pull down the Docker images used by Cloudbreak.
cbd pull
cbd start
You should notice a bunch of images being pulled down:
[root@cloudbreak cloudbreak-deployment]# cbd start
generating docker-compose.yml
generating uaa.yml
Pulling haveged (hortonworks/haveged:1.1.0)...
1.1.0: Pulling from hortonworks/haveged
ca26f34d4b27: Pull complete
bf22b160fa79: Pull complete
d30591ea011f: Pull complete
22615e74c8e4: Pull complete
ceb5854e0233: Pull complete
Digest: sha256:09f8cf4f89b59fe2b391747181469965ad27cd751dad0efa0ad1c89450455626
Status: Downloaded newer image for hortonworks/haveged:1.1.0
Pulling uluwatu (hortonworks/cloudbreak-web:1.14.0)...
1.14.0: Pulling from hortonworks/cloudbreak-web
16e32a1a6529: Pull complete
8e153fce9343: Pull complete
6af1e6403bfe: Pull complete
075e3418c7e0: Pull complete
9d8191b4be57: Pull complete
38e38dfe826c: Pull complete
d5d08e4bc6be: Pull complete
955b472e3e42: Pull complete
02e1b573b380: Pull complete
Digest: sha256:06ceb74789aa8a78b9dfe92872c45e045d7638cdc274ed9b0cdf00b74d118fa2
...
Creating cbreak_periscope_1
Creating cbreak_logsink_1
Creating cbreak_identity_1
Creating cbreak_uluwatu_1
Creating cbreak_haveged_1
Creating cbreak_consul_1
Creating cbreak_mail_1
Creating cbreak_pcdb_1
Creating cbreak_uaadb_1
Creating cbreak_cbdb_1
Creating cbreak_sultans_1
Creating cbreak_registrator_1
Creating cbreak_logspout_1
Creating cbreak_cloudbreak_1
Creating cbreak_traefik_1
Uluwatu (Cloudbreak UI) url:
https://192.168.56.100
login email:
<myemail>
password:
****
creating config file for hdc cli: /root/.hdc/config
The start command outputs the IP address and the login username, which are based on what we set up in the Profile.
Check Cloudbreak Logs
You can always look at the Cloudbreak logs in /opt/cloudbreak-deployment/cbreak.log. You can also use the cbd logs cloudbreak command to view the logs in real time. Cloudbreak is ready to use when you see a message similar to Started CloudbreakApplication in 64.156 seconds (JVM running for 72.52).
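For example, either of the following commands, run as root on the Cloudbreak VM, lets you watch the startup progress (the log path matches the application directory created above):
tail -f /opt/cloudbreak-deployment/cbreak.log
cbd logs cloudbreak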
Login to Cloudbreak
Cloudbreak should now be running. We can log in to the UI using the IP address specified in the Profile. In our case that is https://192.168.56.100. Notice that Cloudbreak uses https.
You should see a login screen similar to this:
At this point you should see the Cloudbreak UI, where you can manage your credentials, blueprints, etc. This tutorial doesn't cover setting up credentials or deploying a cluster. Before you can deploy a cluster, you need to set up a platform and credentials. See these links for setting up your credentials:
AWS: Cloudbreak AWS Credentials
Azure: Cloudbreak Azure Credentials
Stopping Cloudbreak
When you are ready to shut down Cloudbreak, the process is simple. First you need to stop the Cloudbreak deployer:
$ cbd kill
You should see something similar to this:
[root@cloudbreak cloudbreak-deployment]# cbd kill
Stopping cbreak_traefik_1 ... done
Stopping cbreak_cloudbreak_1 ... done
Stopping cbreak_logspout_1 ... done
Stopping cbreak_registrator_1 ... done
Stopping cbreak_sultans_1 ... done
Stopping cbreak_uaadb_1 ... done
Stopping cbreak_cbdb_1 ... done
Stopping cbreak_pcdb_1 ... done
Stopping cbreak_mail_1 ... done
Stopping cbreak_haveged_1 ... done
Stopping cbreak_consul_1 ... done
Stopping cbreak_uluwatu_1 ... done
Stopping cbreak_identity_1 ... done
Stopping cbreak_logsink_1 ... done
Stopping cbreak_periscope_1 ... done
Going to remove cbreak_traefik_1, cbreak_cloudbreak_1, cbreak_logspout_1, cbreak_registrator_1, cbreak_sultans_1, cbreak_uaadb_1, cbreak_cbdb_1, cbreak_pcdb_1, cbreak_mail_1, cbreak_haveged_1, cbreak_consul_1, cbreak_uluwatu_1, cbreak_identity_1, cbreak_logsink_1, cbreak_periscope_1
Removing cbreak_traefik_1 ... done
Removing cbreak_cloudbreak_1 ... done
Removing cbreak_logspout_1 ... done
Removing cbreak_registrator_1 ... done
Removing cbreak_sultans_1 ... done
Removing cbreak_uaadb_1 ... done
Removing cbreak_cbdb_1 ... done
Removing cbreak_pcdb_1 ... done
Removing cbreak_mail_1 ... done
Removing cbreak_haveged_1 ... done
Removing cbreak_consul_1 ... done
Removing cbreak_uluwatu_1 ... done
Removing cbreak_identity_1 ... done
Removing cbreak_logsink_1 ... done
Removing cbreak_periscope_1 ... done
[root@cloudbreak cloudbreak-deployment]#
Now exit the Vagrant box:
[root@cloudbreak cloudbreak-deployment]# exit
logout
[vagrant@cloudbreak ~]$ exit
logout
Connection to 127.0.0.1 closed.
Now we can shut down the Vagrant box:
$ vagrant halt
==> cbtest: Attempting graceful shutdown of VM...
Starting Cloudbreak
To startup Cloudbreak, the process is the opposite of stopping it. First you need to start the Vagrant box:
$ vagrant up
Once the Vagrant box is up, you need to ssh in to the box:
$ vagrant ssh
You need to be root:
$ sudo -i
Now start Cloudbreak:
$ cd /opt/cloudbreak-deployment
$ cbd start
You should see something similar to this:
[root@cloudbreak cloudbreak-deployment]# cbd start
generating docker-compose.yml
generating uaa.yml
Creating cbreak_consul_1
Creating cbreak_periscope_1
Creating cbreak_sultans_1
Creating cbreak_uluwatu_1
Creating cbreak_identity_1
Creating cbreak_uaadb_1
Creating cbreak_pcdb_1
Creating cbreak_mail_1
Creating cbreak_haveged_1
Creating cbreak_logsink_1
Creating cbreak_cbdb_1
Creating cbreak_logspout_1
Creating cbreak_registrator_1
Creating cbreak_cloudbreak_1
Creating cbreak_traefik_1
Uluwatu (Cloudbreak UI) url:
https://192.168.56.100
login email:
<myemail>
password:
****
creating config file for hdc cli: /root/.hdc/config
[root@cloudbreak cloudbreak-deployment]#
It takes a minute or two for the Cloudbreak application to fully start up. Now you can log in to the Cloudbreak UI.
Review
If you have successfully followed along with this tutorial, you should now have a Vagrant box you can spin up via vagrant up, start Cloudbreak via cbd start, and then create your clusters on the cloud.
04-24-2017
07:47 PM
6 Kudos
Objective
This tutorial will walk you through the process of using the PyHive Python module from Dropbox to query HiveServer2. You can read more about PyHive here: PyHive
Prerequisites
You should already have Python 2.7 installed.
You should already have a version of the Hortonworks Sandbox 2.5 set up.
Scope
This tutorial was tested using the following environment and components:
Mac OS X 10.12.3
Anaconda 4.3.1 (Python 2.7.13)
Hortonworks HDP Sandbox 2.5
PyHive 0.1.5
Steps
Install PyHive and Dependencies
Before we can query Hive using Python, we have to install the PyHive module and its associated dependencies. Because I'm using Anaconda, I chose to use the conda command to install PyHive. Because the PyHive module is provided by a third party, Blaze, you must specify -c blaze on the command line. You can read more about Blaze PyHive for Anaconda here: Blaze PyHive
We need to install PyHive using the following command:
$ conda install -c blaze pyhive
You will be doing this installation on your local computer. You should see something similar to the following:
$ conda install -c blaze pyhive
Fetching package metadata ...........
Solving package specifications: .
Package plan for installation in environment /Users/myoung/anaconda:
The following NEW packages will be INSTALLED:
pyhive: 0.1.5-py27_0 blaze
sasl: 0.1.3-py27_0 blaze
thrift: 0.9.2-py27_0 blaze
Proceed ([y]/n)? y
thrift-0.9.2-p 100% |#####################################################################################################################################| Time: 0:00:00 3.07 MB/s
sasl-0.1.3-py2 100% |#####################################################################################################################################| Time: 0:00:00 15.18 MB/s
pyhive-0.1.5-p 100% |#####################################################################################################################################| Time: 0:00:00 10.92 MB/s
As you can see, PyHive is dependent on the SASL and Thrift modules. Both of these modules were installed.
Create Python Script
Now that our local computer has the PyHive module installed, we can create a very simple Python script which will query Hive. Edit a file called pyhive-test.py . You can do this anywhere you like, but I prefer to create a directory under ~/Development for this.
$ mkdir ~/Development/pyhive
$ cd ~/Development/pyhive
Now copy and paste the following text into your file. You can use any text editor you like. I usually use Microsoft Visual Studio Code or Atom.
from pyhive import hive
cursor = hive.connect('sandbox.hortonworks.com').cursor()
cursor.execute('SELECT * FROM sample_07 LIMIT 50')
print cursor.fetchall()
The sample_07 table is already on the Sandbox, so this query should work without any problems.
Start Hortonworks HDP Sandbox
Before we can run our Python script, we have to make sure the Sandbox is started. Go ahead and do that now.
Run Python Script
Now that the Sandbox is running, we can run our script to execute the query.
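If you want to first confirm that HiveServer2 is reachable from your local machine, here is a quick optional check (assuming the Sandbox's default HiveServer2 port of 10000):
$ nc -z sandbox.hortonworks.com 10000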
$ python pyhive-test.py
You should see something similar to the following:
$ python pyhive-test.py
[[u'00-0000', u'All Occupations', 134354250, 40690], [u'11-0000', u'Management occupations', 6003930, 96150], [u'11-1011', u'Chief executives', 299160, 151370], [u'11-1021', u'General and operations managers', 1655410, 103780], [u'11-1031', u'Legislators', 61110, 33880], [u'11-2011', u'Advertising and promotions managers', 36300, 91100], [u'11-2021', u'Marketing managers', 165240, 113400], [u'11-2022', u'Sales managers', 322170, 106790], [u'11-2031', u'Public relations managers', 47210, 97170], [u'11-3011', u'Administrative services managers', 239360, 76370], [u'11-3021', u'Computer and information systems managers', 264990, 113880], [u'11-3031', u'Financial managers', 484390, 106200], [u'11-3041', u'Compensation and benefits managers', 41780, 88400], [u'11-3042', u'Training and development managers', 28170, 90300], [u'11-3049', u'Human resources managers, all other', 58100, 99810], [u'11-3051', u'Industrial production managers', 152870, 87550], [u'11-3061', u'Purchasing managers', 65600, 90430], [u'11-3071', u'Transportation, storage, and distribution managers', 92790, 81980], [u'11-9011', u'Farm, ranch, and other agricultural managers', 3480, 61030], [u'11-9012', u'Farmers and ranchers', 340, 42480], [u'11-9021', u'Construction managers', 216120, 85830], [u'11-9031', u'Education administrators, preschool and child care center/program', 47980, 44430], [u'11-9032', u'Education administrators, elementary and secondary school', 218820, 82120], [u'11-9033', u'Education administrators, postsecondary', 101160, 85870], [u'11-9039', u'Education administrators, all other', 28640, 74230], [u'11-9041', u'Engineering managers', 184410, 115610], [u'11-9051', u'Food service managers', 191460, 48660], [u'11-9061', u'Funeral directors', 24020, 57660], [u'11-9071', u'Gaming managers', 3740, 69600], [u'11-9081', u'Lodging managers', 31890, 51140], [u'11-9111', u'Medical and health services managers', 242640, 84980], [u'11-9121', u'Natural sciences managers', 39370, 113170], [u'11-9131', u'Postmasters and mail superintendents', 26500, 57850], [u'11-9141', u'Property, real estate, and community association managers', 159660, 53530], [u'11-9151', u'Social and community service managers', 112330, 59070], [u'11-9199', u'Managers, all other', 356690, 91990], [u'13-0000', u'Business and financial operations occupations', 6015500, 62410], [u'13-1011', u'Agents and business managers of artists, performers, and athletes', 11680, 82730], [u'13-1021', u'Purchasing agents and buyers, farm products', 12930, 53980], [u'13-1022', u'Wholesale and retail buyers, except farm products', 132550, 53580], [u'13-1023', u'Purchasing agents, except wholesale, retail, and farm products', 281950, 56060], [u'13-1031', u'Claims adjusters, examiners, and investigators', 279400, 55470], [u'13-1032', u'Insurance appraisers, auto damage', 12150, 52020], [u'13-1041', u'Compliance officers, except agriculture, construction, health and safety, and transportation', 231910, 52740], [u'13-1051', u'Cost estimators', 219070, 58640], [u'13-1061', u'Emergency management specialists', 11610, 51470], [u'13-1071', u'Employment, recruitment, and placement specialists', 193620, 52710], [u'13-1072', u'Compensation, benefits, and job analysis specialists', 109870, 55740], [u'13-1073', u'Training and development specialists', 202820, 53040], [u'13-1079', u'Human resources, training, and labor relations specialists, all other', 211770, 56740]]
Review
As you can see, using Python to query Hive is fairly straightforward.
We were able to install the required Python modules in a single command, create a quick Python script, and run the script to get 50 records from the sample_07 table in Hive.
03-29-2017
11:51 PM
@Girish Mane The JournalNodes are for shared edits. They are responsible for keeping the Active and Standby NameNodes in sync in terms of filesystem edits. You do not need a JournalNode for each of your data nodes. The normal approach is to use 3 JournalNodes to give the greatest level of high availability. It's the same idea behind 3x replication of data.
03-05-2017
06:58 PM
4 Kudos
Objective
This tutorial will walk you through the process of using Ansible to deploy Hortonworks Data Platform (HDP) on Amazon Web Services (AWS). We will use the ansible-hadoop Ansible playbook from ObjectRocket to do this. You can find more information on that playbook here: ObjectRocket Ansible-Hadoop
This tutorial is part 2 of a 2 part series. Part 1 in the series will show you how to use Ansible to create instances on Amazon Web Services (AWS). Part 1 is available here: HCC Article Part 1
This tutorial was created as a companion to the Ansible + Hadoop talk I gave at the Ansible NOVA Meetup in February 2017. You can find the slides to that talk here: SlideShare
Prerequisites
You must have an existing AWS account.
You must have access to your AWS Access and Secret keys.
You are responsible for all AWS costs incurred.
You should have 3-6 instances created in AWS. If you completed Part 1 of this series, then you have an easy way to do that.
Scope
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6 and 10.12.3
Amazon Web Services
Anaconda 4.1.6 (Python 2.7.12)
Ansible 2.1.3.0
git 2.10.1
Steps
Create python virtual environment
We are going to create a Python virtual environment for installing the required Python modules. This will help eliminate module version conflicts between applications.
I prefer to use Continuum Anaconda for my Python distribution. Therefore the steps for setting up a python virtual environment will be based on that. However, you can use standard python and the virtualenv command to do something similar.
To create a virtual environment using Anaconda Python, you use the conda create command. We will name our virtual environment ansible-hadoop . The following command: conda create --name ansible-hadoop python will create our virtual environment with the name specified. You should see something similar to the following:
$ conda create --name ansible-hadoop python
Fetching package metadata .......
Solving package specifications: ..........
Package plan for installation in environment /Users/myoung/anaconda/envs/ansible-hadoop:
The following NEW packages will be INSTALLED:
openssl: 1.0.2k-1
pip: 9.0.1-py27_1
python: 2.7.13-0
readline: 6.2-2
setuptools: 27.2.0-py27_0
sqlite: 3.13.0-0
tk: 8.5.18-0
wheel: 0.29.0-py27_0
zlib: 1.2.8-3
Proceed ([y]/n)? y
Linking packages ...
cp: /Users/myoung/anaconda/envs/ansible-hadoop:/lib/libcrypto.1.0.0.dylib: No such file or directory
mv: /Users/myoung/anaconda/envs/ansible-hadoop/lib/libcrypto.1.0.0.dylib-tmp: No such file or directory
[ COMPLETE ]|################################################################################################| 100%
#
# To activate this environment, use:
# $ source activate ansible-hadoop
#
# To deactivate this environment, use:
# $ source deactivate
#
Switch python environments
Before installing python packages for a specific development environment, you should activate the environment. This is done with the command source activate <environment> . In our case the environment is the one we just created, ansible-hadoop . You should see something similar to the following:
$ source activate ansible-hadoop
As you can see there is no output to indicate if we were successful in changing our environment.
To verify, you can use the conda info --envs command to list the available environments. The active environment will have a * . You should see something similar to the following:
$ conda info --envs
# conda environments:
#
ansible-hadoop * /Users/myoung/anaconda/envs/ansible-hadoop
root /Users/myoung/anaconda
As you can see, the ansible-hadoop environment has the * which means it is the active environment.
If you want to remove your python virtual environment, you can use the following command: conda remove --name <environment> --all . If you want to remove the environment we just created you should see something similar to the following:
$ conda remove --name ansible-hadoop --all
Package plan for package removal in environment /Users/myoung/anaconda/envs/ansible-hadoop:
The following packages will be REMOVED:
openssl: 1.0.2k-1
pip: 9.0.1-py27_1
python: 2.7.13-0
readline: 6.2-2
setuptools: 27.2.0-py27_0
sqlite: 3.13.0-0
tk: 8.5.18-0
wheel: 0.29.0-py27_0
zlib: 1.2.8-3
Proceed ([y]/n)? y
Unlinking packages ...
[ COMPLETE ]|################################################################################################| 100%
HW11380:test myoung$ conda info --envs
# conda environments:
#
root * /Users/myoung/anaconda
Install Python modules in virtual environment
The ansible-hadoop playbook requires a specific version of Ansible. You need to install Ansible 2.1.3.0 before using the playbook. You can do that easily with the following command:
pip install ansible==2.1.3.0
Using a Python virtual environment allows us to easily use Ansible 2.1.3.0 for our playbook without impacting the default Python version.
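You can confirm that the correct version is active inside the virtual environment before continuing:
$ ansible --version
The first line of the output should report version 2.1.3.0.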
Clone ansible-hadoop github repo
You need to clone the ansible-hadoop github repo to a working directory on your computer. I typically do this in ~/Development.
$ cd ~/Development
$ git clone https://github.com/objectrocket/ansible-hadoop.git
You should see something similar to the following:
$ git clone https://github.com/objectrocket/ansible-hadoop.git
Cloning into 'ansible-hadoop'...
remote: Counting objects: 3879, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 3879 (delta 1), reused 0 (delta 0), pack-reused 3873
Receiving objects: 100% (3879/3879), 6.90 MiB | 0 bytes/s, done.
Resolving deltas: 100% (2416/2416), done.
Configure ansible-hadoop
You should make the ansible-hadoop repo directory your current working directory. There are a few configuration items we need to change.
$ cd ansible-hadoop
You should already have 3-6 instances available in AWS. You will need the public IP address of those instances.
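If you created the instances with the part 1 playbook, one way to collect the public IP addresses from the command line (a sketch, assuming the AWS CLI is installed and configured, and that the instances carry the aws-demo Name tag from part 1) is:
aws ec2 describe-instances --filters "Name=tag:Name,Values=aws-demo" "Name=instance-state-name,Values=running" --query "Reservations[].Instances[].PublicIpAddress" --output text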
Configure ansible-hadoop/inventory/static
We need to modify the inventory/static file to include the public IP addresses of our AWS instances. We need to assign master and slave nodes in the file. The instances all have the same configuration by default, so it doesn't matter which IP addresses you put for master and slave.
The default version of the inventory/static file should look similar to the following:
[master-nodes]
master01 ansible_host=192.168.0.2 bond_ip=172.16.0.2 ansible_user=rack ansible_ssh_pass=changeme
#master02 ansible_host=192.168.0.2 bond_ip=172.16.0.2 ansible_user=root ansible_ssh_pass=changeme
[slave-nodes]
slave01 ansible_host=192.168.0.3 bond_ip=172.16.0.3 ansible_user=rack ansible_ssh_pass=changeme
slave02 ansible_host=192.168.0.4 bond_ip=172.16.0.4 ansible_user=rack ansible_ssh_pass=changeme
[edge-nodes]
#edge01 ansible_host=192.168.0.5 bond_ip=172.16.0.5 ansible_user=rack ansible_ssh_pass=changeme
I'm going to be using 6 instances in AWS. I will put 3 instances as master servers and 3 instances as slave servers. There are a couple of extra options in the default file we don't need. The only values we need are:
hostname : which should be master, slave or edge with a 1-up number like master01 and slave01
ansible_host : should be the AWS public IP address of the instances
ansible_user : should be the username you use to SSH into the instance with the private key.
You can easily get the public IP address of your instances from the AWS console. Here is what mine looks like:
If you followed the part 1 tutorial, then the username for your instances should be centos . Edit your inventory/static . You should have something similar to the following:
[master-nodes]
master01 ansible_host=#.#.#.# ansible_user=centos
master02 ansible_host=#.#.#.# ansible_user=centos
master03 ansible_host=5#.#.#.# ansible_user=centos
[slave-nodes]
slave01 ansible_host=#.#.#.# ansible_user=centos
slave02 ansible_host=#.#.#.# ansible_user=centos
slave03 ansible_host=#.#.#.# ansible_user=centos
#[edge-nodes]
Your public IP addresses will be different. Also note the #[edge-nodes] value in the file. Because we are not using any edge nodes, we should comment out that host group line in the file.
Once you have all of your edits in place, save the file.
Configure ansible-hadoop/ansible.cfg
There are a couple of changes we need to make to the ansible.cfg file. This file provides overall configuration settings for Ansible. The default file in the playbook should look similar to the following:
[defaults]
host_key_checking = False
timeout = 60
ansible_keep_remote_files = True
library = playbooks/library/cloudera
We need to change the library line to be library = playbooks/library/site_facts . We will be deploying HDP which requires the site_facts module. We also need to tell Ansible where to find the private key file for connecting to the instances.
Edit the ansible.cfg file. You should modify the file to be similar to the following:
[defaults]
host_key_checking = False
timeout = 60
ansible_keep_remote_files = True
library = playbooks/library/site_facts
private_key_file=/Users/myoung/Development/ansible-hadoop/ansible.pem
Note the path of your private_key_file will be different. Once you have all of your edits in place, save the file.
Configure ansible-hadoop/group_vars/hortonworks
This step is optional. The group_vars/hortonworks file allows you to change how HDP is deployed. You can modify the version of HDP and Ambari. You can modify which components are installed. You can also specify custom repos and Ambari blueprints.
I will be using the default file, so there are no changes made.
Run bootstrap_static.sh
Before installing HDP, we need to ensure our OS configuration on the AWS instances meets the installation prerequisites. This includes things like ensuring DNS and NTP are working and all of the OS packages are updated. These are tasks that you often find people doing manually. This would obviously be tedious across 100s or 1000s of nodes. It would also introduce a far greater number of opportunities for human error. Ansible makes it incredibly easy to perform these kinds of tasks.
Running the bootstrap process is as easy as bash bootstrap_static.sh . This script essentially runs ansible-playbook -i inventory/static playbooks/boostrap.yml for you. This process will typically take 7-10 minutes depending on the size of the instances you selected.
When the script is finished, you should see something similar to the following:
PLAY RECAP *********************************************************************
localhost : ok=3 changed=2 unreachable=0 failed=0
master01 : ok=21 changed=15 unreachable=0 failed=0
master03 : ok=21 changed=15 unreachable=0 failed=0
slave01 : ok=21 changed=15 unreachable=0 failed=0
slave02 : ok=21 changed=15 unreachable=0 failed=0
slave03 : ok=21 changed=15 unreachable=0 failed=0
As you can see, all of the nodes had 21 total tasks performed. Of those tasks, 15 required modifications to be compliant with the desired configuration state.
Run hortonworks_static.sh
Now that the bootstrap process is complete, we can install HDP. The hortonworks_static.sh script is all you have to run to install HDP. This script essentially runs ansible-playbook -i inventory/static playbooks/hortonworks.yml for you. The script installs the Ambari Server on the last master node in our list. In my case, the last master node is master03. The script also installs the Ambari Agent on all of the nodes. The installation of HDP is performed by submitting a request to the Ambari Server API using an Ambari Blueprint.
This process will typically take 10-15 minutes depending on the size of the instances you selected, the number of master nodes and the list of HDP components you have enabled.
If you forgot to install the specific version of Ansible, you will likely see something similar to the following:
TASK [site facts processing] ***************************************************
fatal: [localhost]: FAILED! => {"failed": true, "msg": "ERROR! The module sitefacts.py dnmemory=\"31.0126953125\" mnmemory=\"31.0126953125\" cores=\"8\" was not found in configured module paths. Additionally, core modules are missing. If this is a checkout, run 'git submodule update --init --recursive' to correct this problem."}
PLAY RECAP *********************************************************************
localhost : ok=4 changed=2 unreachable=0 failed=1
master01 : ok=8 changed=0 unreachable=0 failed=0
master03 : ok=8 changed=0 unreachable=0 failed=0
slave01 : ok=8 changed=0 unreachable=0 failed=0
slave02 : ok=8 changed=0 unreachable=0 failed=0
slave03 : ok=8 changed=0 unreachable=0 failed=0
To resolve this, simply perform the pip install ansible==2.1.3.0 command within your Python virtual environment. Now you can rerun the bash hortonworks_static.sh script.
The last task of the playbook is to install HDP via an Ambari Blueprint. It is normal to see something similar to the following:
TASK [ambari-server : Create the cluster instance] *****************************
ok: [master03]
TASK [ambari-server : Wait for the cluster to be built] ************************
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (180 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (179 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (178 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (177 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (176 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (175 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (174 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (173 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (172 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (171 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (170 retries left).
Once you see 3-5 of the retry messages, you can access the Ambari interface via your web browser. The default login is admin and the default password is admin . You should see something similar to the following:
Click on the Operations icon that shows 10 operations in progress. You should see something similar to the following:
The installation tasks each take between 400-600 seconds. The start tasks each take between 20-300 seconds. The master servers typically take longer to install and start than the slave servers.
When everything is running properly, you should see something similar to this:
If you look back at your terminal window, you should see something similar to the following:
ok: [master03]
TASK [ambari-server : Fail if the cluster create task is in an error state] ****
skipping: [master03]
TASK [ambari-server : Change Ambari admin user password] ***********************
skipping: [master03]
TASK [Cleanup the temporary files] *********************************************
changed: [master03] => (item=/tmp/cluster_blueprint)
changed: [master03] => (item=/tmp/cluster_template)
changed: [master03] => (item=/tmp/alert_targets)
ok: [master03] => (item=/tmp/hdprepo)
PLAY RECAP *********************************************************************
localhost : ok=5 changed=3 unreachable=0 failed=0
master01 : ok=8 changed=0 unreachable=0 failed=0
master03 : ok=30 changed=8 unreachable=0 failed=0
slave01 : ok=8 changed=0 unreachable=0 failed=0
slave02 : ok=8 changed=0 unreachable=0 failed=0
slave03 : ok=8 changed=0 unreachable=0 failed=0
Destroy the cluster
You should remember that you will incur AWS costs while the cluster is running. You can either shut down or terminate the instances. If you want to use the cluster later, then use Ambari to stop all of the services before shutting down the instances.
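If you prefer the command line over the console, here is a sketch of both options (assuming the AWS CLI is configured; substitute your own instance IDs):
aws ec2 stop-instances --instance-ids <instance-id> <instance-id>
aws ec2 terminate-instances --instance-ids <instance-id> <instance-id>
Stopping keeps the EBS volumes (and their cost); terminating deletes the instances, and the root volumes are removed because delete_on_termination was set in part 1.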
Review
If you successfully followed along with this tutorial, you should have been able to easily deploy Hortonworks Data Platform 2.5 on AWS using the Ansible playbook. The process to deploy the cluster typically takes 10-20 minutes.
For more information on how the instance types and number of master nodes impact the installation time, review the Ansible + Hadoop slides I linked at the top of the article.
03-04-2017
06:05 PM
4 Kudos
Objective
This tutorial will walk you through the process of using Ansible, an agent-less automation tool, to create instances on AWS. The Ansible playbook we will use is relatively simple; you can use it as a base to experiment with more advanced features. You can read more about Ansible here: Ansible.
Ansible is written in Python and is installed as a Python module on the control host. The only requirement for the hosts managed by Ansible is the ability to login with SSH. There is no requirement to install any software on the host managed by Ansible.
If you have never used Ansible, you can become more familiar with it by going through some basic tutorials. The following two tutorials are a good starting point:
Automate All Things With Ansible: Part One
Automate All Things With Ansible: Part Two
This tutorial is part 1 of a 2 part series. Part 2 in the series will show you how to use Ansible to deploy Hortonworks Data Platform (HDP) on Amazon Web Services (AWS).
This tutorial was created as a companion to the Ansible + Hadoop talk I gave at the Ansible NOVA Meetup in February 2017. You can find the slides to that talk here: SlideShare
You can get a copy of the playbook from this tutorial here: Github
Prerequisites
You must have an existing AWS account.
You must have access to your AWS Access and Secret keys.
You are responsible for all AWS costs incurred.
Scope
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6 and 10.12.3
Amazon Web Services
Anaconda 4.1.6 (Python 2.7.12)
Ansible 2.0.0.2 and 2.1.3.0
Steps
Create a project directory
You need to create a directory for your Ansible playbook. I prefer to create my project directories in ~/Development.
mkdir ~/Development/ansible-aws
cd ~/Development/ansible-aws
Install Ansible module
If you use the Anaconda version of Python, you already have access to Ansible. If you are not using Anaconda, then you can usually install Ansible using the following command:
pip install ansible
To read more about how to install Ansible: Ansible Installation
Overview of our Ansible playbook
Our playbook is relatively simple. It consists of a single inventory file, single group_vars file and a single playbook file. Here is the layout of the file and directory structure:
+- ansible-aws/
|
+- group_vars/
| +- all
|
+- inventory/
| +- hosts
|
+- playbooks/
| +- ansible-aws.yml
group_vars/all
You can use variables in your playbooks using the {{variable name}} syntax. These variables are populated based on values stored in your variable files. You can explicitly load variable files in your playbooks.
However, all playbooks will automatically load the variables in the group_vars/all variable file. The all variable file is loaded for all hosts regardless of the groups the host may be in. In our playbook, we are placing our AWS configuration values in the all file.
Edit the group_vars/all file. Copy and paste the following text into the file:
aws_access_key: <enter AWS access key>
aws_secret_key: <enter AWS secret key>
key_name: <enter private key file alias name>
aws_region: <enter AWS region>
vpc_id: <enter VPC ID>
ami_id: ami-6d1c2007
instance_type: m4.2xlarge
my_local_cidr_ip: <enter cidr_ip>
aws_access_key : You need to enter your AWS Access key
aws_secret_key : You need to enter your AWS Secret key
key_name : The alias name you gave to the AWS private key which you will use to SSH into the instances. In my case I created a key called ansible .
aws_region : The AWS region where you want to deploy your instances. In my case I am using us-east-1 .
vpc_id : The specific VPC in which you want to place your instances.
ami_id : The specific AMI you want to deploy for your instances. The ami-6d1c2007 AMI is a CentOS 7 image.
instance_type : The type of AWS instance. For deploying Hadoop, I recommend at least m4.2xlarge . A faster alternative is c4.4xlarge .
my_local_cidr_ip : Your local computer's CIDR IP address. This is used for creating the security rules that allow your local computer to access the instances. An example CIDR format is 192.168.1.1/32 . Make sure this is set to your computer's public IP address.
After you have entered your appropriate settings, save the file.
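If you are unsure of the public IP address to use for my_local_cidr_ip, one quick way to look it up (an optional helper; any similar service works):
curl -s https://checkip.amazonaws.com
Append /32 to the address it prints to form the CIDR value, for example 203.0.113.25/32.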
inventory/hosts
Ansible requires a list of known hosts against which playbooks and tasks are run. We will tell Ansible to use a specific host file with the -i inventory/hosts parameter.
Edit the inventory/hosts file. Copy and paste the following text into the file:
[local]
localhost ansible_python_interpreter=/Users/myoung/anaconda/bin/python
[local] : Defines the group the host belongs to. You have the option for a playbook to run against all hosts, a specific group of hosts, or an individual host. This AWS playbook only runs on your local computer. That is because it uses the AWS APIs to communicate with AWS.
localhost : This is the hostname. You can list multiple hosts, 1 per line under each group heading. A host can belong to multiple groups.
ansible_python_interpreter : Optional entry that tells Ansible which specific version of Python to run. Because I am using Anaconda Python, I've included that setting here.
After you have entered your appropriate settings, save the file.
playbooks/ansible-aws.yml
The playbook is where we define the list of tasks we want to perform. Our playbook will consist of 2 tasks. The first task is to create a specific AWS Security Group. The second task is to create a specific configuration of 6 instances on AWS.
Edit the file playbooks/ansible-aws.yml . Copy and paste the following text into the file:
---
# Basic provisioning example
- name: Create AWS resources
  hosts: localhost
  connection: local
  gather_facts: False
  tasks:
    - name: Create a security group
      ec2_group:
        name: ansible
        description: "Ansible Security Group"
        region: "{{aws_region}}"
        vpc_id: "{{vpc_id}}"
        aws_access_key: "{{aws_access_key}}"
        aws_secret_key: "{{aws_secret_key}}"
        rules:
          - proto: all
            cidr_ip: "{{my_local_cidr_ip}}"
          - proto: all
            group_name: ansible
        rules_egress:
          - proto: all
            cidr_ip: 0.0.0.0/0
      register: firewall
    - name: Create an EC2 instance
      ec2:
        aws_access_key: "{{aws_access_key}}"
        aws_secret_key: "{{aws_secret_key}}"
        key_name: "{{key_name}}"
        region: "{{aws_region}}"
        group_id: "{{firewall.group_id}}"
        instance_type: "{{instance_type}}"
        image: "{{ami_id}}"
        wait: yes
        volumes:
          - device_name: /dev/sda1
            volume_type: gp2
            volume_size: 100
            delete_on_termination: true
        exact_count: 6
        count_tag:
          Name: aws-demo
        instance_tags:
          Name: aws-demo
      register: ec2
This playbook uses the Ansible ec2 and ec2_group modules. You can read more about the options available to those modules here:
ec2
ec2_group
The task to create the EC2 security group creates a group named ansible . It defines 2 ingress rules and 1 egress rule for that security group. The first ingress rule is to allow all inbound traffic from any host in the security group ansible . The second ingress rule is to allow all inbound traffic from your local computer IP address. The egress rule allows all traffic out from all of the hosts.
The task to create the EC2 instances creates 6 hosts because of the exact_count setting. It creates a Name tag of aws-demo on each of the instances and uses that tag to determine how many hosts exist. You can choose to use a smaller number of hosts.
You can specify volumes to mount on each of the instances. The default volume size is 8 GB and is too small for deploying Hadoop later. I recommend setting the size to at least 100 GB as above. I also recommend you set delete_on_termination to true . This will tell AWS to delete the storage after you have deleted the instances. If you do not do this, then storage will be kept and you will be charged for it.
After you have entered your appropriate settings, save the file.
Running the Ansible playbook
Now that our 3 files have been created and saved with the appropriate settings, we can run the playbook. To run the playbook, you use the ansible-playbook -i inventory/hosts playbooks/ansible-aws.yml command. You should see something similar to the following:
$ ansible-playbook -i inventory/hosts playbooks/ansible-aws.yml
PLAY [Create AWS resources] ****************************************************
TASK [Create a security group] *************************************************
changed: [localhost]
TASK [Create an EC2 instance] **************************************************
changed: [localhost]
PLAY RECAP *********************************************************************
localhost : ok=2 changed=2 unreachable=0 failed=0
The changed lines indicate that Ansible found a configuration that needed to be modified to be consistent with our requested state. For the security group task, you would see this if your security group didn't exist or if you had a different set of ingress or egress rules. For the instance task, you would see this if there were fewer than or more than 6 hosts tagged as aws-demo.
Check AWS console.
If you check your AWS console, you should be able to confirm the instances are created. You should see something similar to the following:
Review
If you successfully followed along with this tutorial, you have created a simple Ansible playbook with 2 tasks using the ec2 and ec2_group Ansible modules. The playbook creates an AWS security group and instances which can be used later for deploying HDP on AWS.
02-28-2017
04:40 AM
3 Kudos
Objective
This tutorial is designed to walk you through the process of creating a MiniFi flow to read data from a Sense HAT sensor on a Raspberry Pi 3. The MiniFi flow will push data to a remote NiFi instance running on your computer. The NiFi instance will push the data to Solr.
While there are other tutorials and examples of using NiFi/MiniFi with a Raspberry Pi, most of those tutorials tend to use a more complicated sensor implementation. The Sense HAT is very easy to install and use.
Prerequisites
You should have a Raspberry Pi 3 Model B: Raspberry Pi 3 Model B
I recommend a 16+GB SD card for your Raspberry Pi 3. Don't forget to expand the filesystem after the OS is installed: raspi-config
You should have a Sense HAT: Sense HAT
You should already have installed the Sense HAT on your Raspberry Pi 3.
You should already have installed Raspbian Jessie Lite on your Raspberry Pi 3 SD card: Raspbian Jessie Lite
The instructions for installing a Raspberry Pi OS can be found here: Raspberry Pi OS Install
You may be able to use the NOOBS operating system that typically ships with the Raspberry Pi. However, the Raspbian Lite OS will leave the most system resources available to MiniFi or NiFi.
You should have enabled SSH on your Raspberry Pi: Enable SSH
You should have enabled WiFi on your Raspberry Pi (or use wired networking): Setup WiFi
You should have NiFi 1.x installed and working on your computer: NiFi
You should have the Java MiniFi Toolkit 0.1.0 installed and working on your computer: MiniFi ToolKit
You should have downloaded Solr 6.x on your computer: Solr Download
Scope
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6 and 10.12.3
MiniFi 1.0.2.1.1.0-2.1
MiniFi Toolkit 0.1.0
NiFi 1.1.1
Solr 6.4.1
Java JDK 1.8
Steps
Connect to Raspberry Pi using SSH
If you have completed all of the prerequisites, then you should be able to easily SSH into your Raspberry Pi. On my Mac, I connect using:
ssh pi@raspberrypi
The default username is pi and the password is raspberry .
If you get an unknown host or DNS error, then you need to specify the IP address of the Raspberry Pi. You can get that by logging directly into the Raspberry Pi console.
Now run the ifconfig command.
You should see something similar to the following:
pi@raspberrypi:~ $ ifconfig
eth0 Link encap:Ethernet HWaddr b8:27:eb:60:ff:5b
inet6 addr: fe80::ec95:e79b:3679:5159/64 Scope:Link
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
wlan0 Link encap:Ethernet HWaddr b8:27:eb:35:aa:0e
inet addr:192.168.1.204 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::21f6:bf0f:5f9f:d60d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:17280 errors:0 dropped:11506 overruns:0 frame:0
TX packets:872 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3414755 (3.2 MiB) TX bytes:133472 (130.3 KiB)
If you are using WiFi, then look at the wlan0 device. If you are using wired ethernet, then look at the eth0 device. Now you can connect using the IP address you found:
ssh pi@192.168.1.204
Your IP address will vary.
Update Raspberry Pi packages
It's always a good idea to ensure your installed packages are up to date. Raspbian Lite is based on Debian. Therefore you need to use apt-get to update and install packages.
First, we need to run sudo apt-get update to update the list of available packages and versions. You should see something similar to the following:
pi@raspberrypi:~ $ sudo apt-get update
Get:1 http://mirrordirector.raspbian.org jessie InRelease [14.9 kB]
Get:2 http://archive.raspberrypi.org jessie InRelease [22.9 kB]
Get:3 http://mirrordirector.raspbian.org jessie/main armhf Packages [8,981 kB]
Get:4 http://archive.raspberrypi.org jessie/main armhf Packages [145 kB]
Get:5 http://archive.raspberrypi.org jessie/ui armhf Packages [57.6 kB]
Get:6 http://mirrordirector.raspbian.org jessie/contrib armhf Packages [37.5 kB]
Get:7 http://mirrordirector.raspbian.org jessie/non-free armhf Packages [70.3 kB]
Get:8 http://mirrordirector.raspbian.org jessie/rpi armhf Packages [1,356 B]
Ign http://archive.raspberrypi.org jessie/main Translation-en_US
Ign http://archive.raspberrypi.org jessie/main Translation-en
Ign http://archive.raspberrypi.org jessie/ui Translation-en_US
Ign http://archive.raspberrypi.org jessie/ui Translation-en
Ign http://mirrordirector.raspbian.org jessie/contrib Translation-en_US
Ign http://mirrordirector.raspbian.org jessie/contrib Translation-en
Ign http://mirrordirector.raspbian.org jessie/main Translation-en_US
Ign http://mirrordirector.raspbian.org jessie/main Translation-en
Ign http://mirrordirector.raspbian.org jessie/non-free Translation-en_US
Ign http://mirrordirector.raspbian.org jessie/non-free Translation-en
Ign http://mirrordirector.raspbian.org jessie/rpi Translation-en_US
Ign http://mirrordirector.raspbian.org jessie/rpi Translation-en
Fetched 9,330 kB in 17s (542 kB/s)
Reading package lists... Done
Now we can update our installed packages using sudo apt-get dist-upgrade . You should see something similar to the following:
pi@raspberrypi:~ $ sudo apt-get dist-upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
bind9-host libbind9-90 libdns-export100 libdns100 libevent-2.0-5 libirs-export91 libisc-export95 libisc95 libisccc90
libisccfg-export90 libisccfg90 libjasper1 liblwres90 libpam-modules libpam-modules-bin libpam-runtime libpam0g login
passwd pi-bluetooth raspberrypi-sys-mods raspi-config vim-common vim-tiny
24 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 4,767 kB of archives.
After this operation, 723 kB disk space will be freed.
Do you want to continue? [Y/n] y
The list of packages and versions that need to be updated will vary. Enter y to update the installed packages.
Install additional Raspberry Pi packages
We need to install additional packages to interact with the Sense HAT sensor and run MiniFi.
You access the Sense HAT libraries using Python. Therefore the first package we need to install is Python.
sudo apt-get install python
The second package we need to install is the libraries for the Sense HAT device.
sudo apt-get install sense-hat
We will be using the Java version of MiniFi. Therefore the third package we need to install is the Oracle JDK 8.
sudo apt-get install oracle-java8-jdk
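Once the install finishes, you can quickly confirm the JDK is available on the path (an optional check):
java -version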
Verify Sense HAT functionality
Before we use MiniFi to collect any data, we need to ensure we can interact with the Sense HAT sensor. We will create a simple Python script to display a message on our Sense HAT.
Edit the file display_message.py using vi display_message.py . Now copy and paste the following text into your text editor (remember to go into insert mode first):
from sense_hat import SenseHat
sense = SenseHat()
sense.show_message("Hello")
Save the script using :wq! . Run this script using python display_message.py . You should see the word Hello scroll across the display of the Sense HAT in white text.
Now let's test reading the temperature from the Sense HAT. Edit the file get_temp.py using vi get_temp.py . Now copy and paste the following text into your text editor (remember to go into insert mode first):
from sense_hat import SenseHat
sense = SenseHat()
t = sense.get_temperature()
print('Temperature = {0:0.2f} C'.format(t))
Save the script using :wq! . Run the script using python get_temp.py . You should see something similar to the following (your values will vary):
pi@raspberrypi:~ $ python get_temp.py
Temperature = 31.58 C
For our MiniFi use case, we will be looking at temperature, pressure, and humidity data. We will not use the Sense HAT display for MiniFi, so we'll only print the data to the console.
You can read more about the Sense HAT functions here:
Sense HAT API
Now let's create a script which prints all 3 sensor values. Edit the file get_environment.py using vi get_environment.py . Copy and paste the following text into your text editor (remember to go into insert mode first):
from sense_hat import SenseHat
import datetime
sense = SenseHat()
t = sense.get_temperature()
p = sense.get_pressure()
h = sense.get_humidity()
print('Hostname = raspberrypi')
print('DateTime = ' + datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ"))
print('Temperature = {0:0.2f} C'.format(t))
print('Pressure = {0:0.2f} Millibars'.format(p))
print('Humidity = {0:0.2f} %rH'.format(h))
Save the script using
:wq! . Run the script using python get_environment.py . You should see something similar to the following (your values will vary):
Hostname = raspberrypi
DateTime = 2017-02-27T21:20:55Z
Temperature = 32.90 C
Pressure = 1026.53 Millibars
Humidity = 25.36 %rH
As you can see from the script, we are printing our date output using UTC time via the
utcnow() function. We also need to ensure the date format is consumable by Solr. That is why we are using %Y-%m-%dT%H:%M:%SZ , an ISO 8601 style format that Solr can parse.
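If you want a quick sanity check that this format produces strings Solr can ingest, the following minimal sketch (assuming Python is available, as installed above) formats the current UTC time and parses it back:
# Quick check that %Y-%m-%dT%H:%M:%SZ produces an ISO 8601 style UTC timestamp.
import datetime

solr_date = datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
print(solr_date)  # e.g. 2017-02-27T21:20:55Z

# Parsing the string back confirms it is well-formed.
parsed = datetime.datetime.strptime(solr_date, "%Y-%m-%dT%H:%M:%SZ")
print(parsed)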
Our MiniFi flow will use the
ExecuteProcess processor to run the script, so we need to create a simple bash script that runs the get_environment.py file. Edit the file get_environment.sh using vi get_environment.sh . Copy and paste the following text into your text editor (remember to go into insert mode first):
python /home/pi/get_environment.py
Save the script using
:wq! . Make sure the script is executable by running chmod 755 get_environment.sh . Let's make sure the bash script works correctly. Run the script using ./get_environment.sh . You should see something similar to the following (your values will vary):
Hostname = raspberrypi
DateTime = 2017-02-27T21:20:55Z
Temperature = 32.90 C
Pressure = 1026.53 Millibars
Humidity = 25.36 %rH
Install MiniFi
We are going to install MiniFi on the Raspberry Pi. First, download the MiniFi release.
wget http://public-repo-1.hortonworks.com/HDF/2.1.1.0/minifi-1.0.2.1.1.0-2-bin.tar.gz Now you can extract it using tar xvfz minifi-1.0.2.1.1.0-2-bin.tar.gz .
Now we are ready to create our NiFi and MiniFi flows. Start NiFi
On your computer (not on the Raspberry Pi), start NiFi if you have not already done so. You do this by running
<nifi installation dir>/bin/nifi.sh start . It may take a few minutes before NiFi is fully started. You can monitor the logs by running tail -f <nifi installation dir>/logs/nifi-app.log .
You should see something similar to the following when the UI is ready:
2017-02-26 14:10:01,199 INFO [main] org.eclipse.jetty.server.Server Started @40057ms
2017-02-26 14:10:01,695 INFO [main] org.apache.nifi.web.server.JettyServer NiFi has started. The UI is available at the following URLs:
2017-02-26 14:10:01,695 INFO [main] org.apache.nifi.web.server.JettyServer http://127.0.0.1:9091/nifi
2017-02-26 14:10:01,695 INFO [main] org.apache.nifi.web.server.JettyServer http://192.168.1.186:9091/nifi
2017-02-26 14:10:01,697 INFO [main] org.apache.nifi.BootstrapListener Successfully initiated communication with Bootstrap
2017-02-26 14:10:01,697 INFO [main] org.apache.nifi.NiFi Controller initialization took 11161419754 nanoseconds.
Now you should be able to access NiFi in your browser by going to
<hostname>:8080/nifi . The default port is 8080 ; if you have a port conflict, you can change it by editing the nifi.web.http.port property in conf/nifi.properties (the startup log above shows this instance running on port 9091).
You should see a blank NiFi canvas similar to the following:
NiFi Blank Canvas
Setup Solr
Before we start on our NiFi flow, let's make sure Solr is running. We are going to use schemaless mode. You can easily start Solr using
bin/solr -e schemaless .
You should see something similar to the following:
$ bin/solr -e schemaless
Creating Solr home directory /Users/myoung/Downloads/solr-6.4.1/example/schemaless/solr
Starting up Solr on port 8983 using command:
bin/solr start -p 8983 -s "example/schemaless/solr"
Waiting up to 180 seconds to see Solr running on port 8983 [\]
Started Solr server on port 8983 (pid=49659). Happy searching!
Copying configuration to new core instance directory:
/Users/myoung/Downloads/solr-6.4.1/example/schemaless/solr/gettingstarted
Creating new core 'gettingstarted' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=gettingstarted&instanceDir=gettingstarted
{
"responseHeader":{
"status":0,
"QTime":1371},
"core":"gettingstarted"}
Solr schemaless example launched successfully. Direct your Web browser to http://localhost:8983/solr to visit the Solr Admin UI
As you can see, Solr created a collection called
gettingstarted . That is the name of the collection our NiFi PutSolrContentStream processor will use. Create NiFi flow
Now we need to create our NiFi flow that will receive data from MiniFi. Input Port
The MiniFi flow will send data to a
Remote Process Group . The Remote Process Group requires an Input Port . From the NiFi menu, drag the Input Port icon to the canvas.
In the
Add Port dialog that is displayed, type a name for your port. I used From Raspberry Pi . You should see something similar to the following:
Click the blue
ADD button. ExtractText
From the NiFi menu, drag the
Processor icon to the canvas. In the Filter box, enter extract . You should see something similar to the following:
Select the
ExtractText processor. Click on the blue ADD button to add the processor to the canvas.
Now we need to configure the
ExtractText processor. Right click on the processor and select the Configure menu option.
On the
SETTINGS tab of the ExtractText processor, you should check the unmatched box under Automatically Terminate Relationships . This will drop any records which we fail to extract text from. You should see something similar to the following:
On the
PROPERTIES tab of the ExtractText processor, there are a few changes we need to make.
First, we want to set
Enable Multiline Mode to true . This allows the Regular Expressions to match across multiple lines. This is important because our data is coming in as multiline data.
Second, we want to set
Include Capture Group 0 to false . Each Regular Expression we are using has only a single capture group. If we left this value set to true, each field we extract would also get a duplicate, unused attribute named <attribute name>.0 containing the full match.
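To see the difference concretely, here is a minimal sketch using Python's re module (used purely for illustration; the processor itself uses Java regular expressions, which behave the same way for this pattern):
import re

match = re.search(r'Temperature = (\d+\.\d+) C', 'Temperature = 32.90 C')
# Group 0 is the entire match; with Include Capture Group 0 set to true this
# would be stored as the extra, unused attribute temperature.0.
print(match.group(0))
# Group 1 is just the captured value, which is what we want in the temperature attribute.
print(match.group(1))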
Third, we need to add additional fields to the processor which allows us to define our Regular Expressions. If you click the
+ icon in the upper right corner of the dialog, you should see something similar to the following:
We are going to add a property called
hostname . This will hold the value from the line Hostname = in the data. Click the blue OK button. Now you should see another dialog where you enter the regular expression. You should see something similar to the following:
Enter the following Regular Expression:
Hostname = (\w+)
We need to repeat this process for each of the other data elements coming from the Raspberry Pi. You should have the following properties defined as separate fields (a quick way to test these expressions locally is shown after the list):
property: hostname
value: Hostname = (\w+)
property: datetime
value: DateTime = (\d{4}\-\d{2}\-\d{2}T\d{2}\:\d{2}:\d{2}Z)
property: temperature
value: Temperature = (\d+\.\d+) C
property: humidity
value: Humidity = (\d+\.\d+) %rH
property: pressure
value: Pressure = (\d+\.\d+) Millibars
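As mentioned above, you can sanity check all five expressions locally before configuring the processor. This is only a sketch, run with the same Python interpreter we installed on the Pi (ExtractText uses Java regular expressions, but these simple patterns behave identically):
import re

sample = """Hostname = raspberrypi
DateTime = 2017-02-27T21:20:55Z
Temperature = 32.90 C
Pressure = 1026.53 Millibars
Humidity = 25.36 %rH"""

patterns = {
    'hostname': r'Hostname = (\w+)',
    'datetime': r'DateTime = (\d{4}\-\d{2}\-\d{2}T\d{2}\:\d{2}:\d{2}Z)',
    'temperature': r'Temperature = (\d+\.\d+) C',
    'pressure': r'Pressure = (\d+\.\d+) Millibars',
    'humidity': r'Humidity = (\d+\.\d+) %rH',
}

# Each expression should match once, and group(1) should hold the value that
# ExtractText will store in the attribute of the same name.
for name, pattern in patterns.items():
    match = re.search(pattern, sample)
    print('{0} = {1}'.format(name, match.group(1) if match else 'NO MATCH'))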
When you have entered each of these properties, you should see something similar to the following:
Click the blue
APPLY button to save the changes. AttributesToJSON
From the NiFi menu, drag the
Processor icon to the canvas. In the Filter box, enter attributes . You should see something similar to the following:
Select the
AttributesToJSON processor. Click on the blue ADD button to add the processor to the canvas.
Now we need to configure the
AttributesToJSON processor. Right click on the processor and select the Configure menu option.
On the
PROPERTIES tab of the AttributesToJSON processor, there are a few changes we need to make.
For the
Attributes List property, we need to provide a comma-separated list of attributes we want the processor to pass on. Click inside the Value box next to Attributes List . Enter the following value:
hostname,datetime,temperature,pressure,humidity
For the
Destination property, set the value to flowfile-content . We need the values to be written into the flowfile content itself as JSON, which is what the PutSolrContentStream processor expects. Otherwise the flowfile content would still contain the raw (non-JSON) data coming from the Raspberry Pi, and Solr would throw errors because it cannot parse the request.
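To make that concrete, the flowfile content leaving AttributesToJSON should look roughly like the JSON below (a sketch only; attribute ordering may differ, and the processor emits every value as a string):
import json

# Illustration of the flowfile content produced by AttributesToJSON with
# Destination set to flowfile-content and our five attributes listed.
flowfile_content = {
    "hostname": "raspberrypi",
    "datetime": "2017-02-27T21:20:55Z",
    "temperature": "32.90",
    "pressure": "1026.53",
    "humidity": "25.36"
}
print(json.dumps(flowfile_content, indent=2))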
You should see something similar to the following:
Click the blue
APPLY button to save the changes. PutSolrContentStream
From the NiFi menu, drag the
Processor icon to the canvas. In the Filter box, enter solr . You should see something similar to the following:
Select the
PutSolrContentStream processor. Click on the blue ADD button to add the processor to the canvas.
Now we need to configure the
PutSolrContentStream processor. Right click on the processor and select the Configure menu option.
On the
SETTINGS tab of the PutSolrContentStream processor, you should check the connection_failure , failure , and success boxes under Automatically Terminate Relationships . Since this is the end of the flow, we can terminate everything. You could expand on this by retrying failures, or logging errors to a text file.
You should see something similar to the following:
On the
PROPERTIES tab of the PutSolrContentStream processor, we need to make a few changes.
Set the
Solr Type property to Standard . We don't need to run SolrCloud for our demo.
Set the
Solr Location to http://192.168.1.186:8983/solr/gettingstarted . You should use the IP address of your computer. Since we started Solr in schemaless mode with the gettingstarted collection, the collection name is part of the URL. If we were using SolrCloud, we would put the collection name in the Collection property instead.
The first set of properties should look similar to the following:
Now we need to add fields for indexing in Solr. Click the
+ icon in the upper right corner of the processor. The Add Property dialog will be displayed. For the first field, enter f.1 and click the ADD button. For the value enter hostname_s:/hostname . The hostname_s part of the value says to store the content in the Solr field called hostname_s , which uses the dynamic schema to treat this field as a string. The /hostname part of the value says to pull the value from the root of the JSON where the JSON node is called hostname .
We need to repeat this process for each of the other data elements coming from the Raspberry Pi. You should have the following fields defined as separate properties (a sketch of how these mappings work follows the list):
property: f.1
value: hostname_s:/hostname
property: f.2
value: timestamp_dts:/datetime
property: f.3
value: temperature_f:/temperature
property: f.4
value: pressure_f:/pressure
property: f.5
value: humidity_f:/humidity
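Conceptually, each mapping takes a JSON node and indexes it into a dynamically typed Solr field; the suffix determines the type ( _s string, _dts date, _f float). Here is a small sketch of that translation, for illustration only; the processor does this work for you:
# How the f.N mappings translate the incoming JSON into Solr fields.
incoming_json = {
    "hostname": "raspberrypi",
    "datetime": "2017-02-27T21:20:55Z",
    "temperature": "32.90",
    "pressure": "1026.53",
    "humidity": "25.36"
}

field_mappings = {
    "hostname_s": "hostname",        # f.1 -> string dynamic field
    "timestamp_dts": "datetime",     # f.2 -> date dynamic field
    "temperature_f": "temperature",  # f.3 -> float dynamic field
    "pressure_f": "pressure",        # f.4 -> float dynamic field
    "humidity_f": "humidity",        # f.5 -> float dynamic field
}

solr_document = {}
for solr_field, json_node in field_mappings.items():
    solr_document[solr_field] = incoming_json[json_node]
print(solr_document)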
Click the blue
APPLY button to save the changes. Connector Processors
Now that we have our processors on the canvas, we need to connect them. Drag the connection icon from the
Input Port processor to the ExtractText processor.
Drag the connection icon from the
ExtractText processor to the AttributesToJSON processor.
Drag the connection icon from the
AttributesToJSON processor to the PutSolrContentStream processor.
You should have something that looks similar to the following:
Create MiniFi flow
Now we can create our MiniFi flow. ExecuteProcess
The first thing we need to do is add a processor to execute the bash script we created on the Raspberry Pi.
Drag the
Processor icon to the canvas. Enter execute in the Filter box. You should see something similar to the following:
Select the
ExecuteProcess processor. Click on the blue ADD button to add the processor to the canvas.
Now we need to configure the
ExecuteProcess processor. Right click on the processor and select the Configure menu option.
On the
SETTINGS tab you should check the success box under Automatically Terminate Relationships . You should see something similar to the following:
On the
Scheduling tab we want to set the Run Schedule to 5 sec . This will run the processor every 5 seconds. You should see something similar to the following:
On the
Properties tab we want to set the Command to /home/pi/get_environment.sh . This assumes you created the scripts in the /home/pi directory on the Raspberry Pi.
Click the blue
APPLY button to save the changes. Remote Process Group
Now we need to add a
Remote Process Group to our canvas. This is how the MiniFi flow is able to send data to NiFi. Drag the Remote Process Group icon to the canvas.
For the
URL enter the URL you use to access your NiFi UI. In my case that is http://192.168.1.186:9090/nifi . Remember the default port for NiFi is 8080 . For the Transport Protocol select HTTP . You can leave the other settings as defaults. You should see something similar to the following:
Click the blue
ADD button to add the Remote Process Group to the canvas. Create Connection
Now we need to create a connection between our ExecuteProcess processor and our Remote Process Group on the canvas.
Hover your mouse over the
ExecuteProcess processor. Click on the circle arrow icon and drag from the processor to the Remote Process Group . Save Template
We need to save the MiniFi portion of the flow as a template. Select the
ExecuteProcess , Remote Process Group and the connection between them using the shift key to allow multi-select.
Click on the
Create Template icon (second icon from the right on the top row) in the Operate Box on the canvas. It looks like the following:
The
Create Template dialog will be displayed. Give your template a name; I used raspberrypi . Then click the blue CREATE button.
Now click on the main NiFi menu button in the upper right corner of the UI. You should see something like the following:
Now click the
Templates options. This will open the NiFi Templates dialog. You will see a list of templates you have created. You should see something similar to the following:
Now find the template you just created and click on the
Download button on the right-hand side. This will save a copy of the flow template in XML format on your local computer. Convert NiFi Flow to MiniFi Flow
We need to convert the XML template NiFi generated into the YAML file that MiniFi uses. We will be using the minifi-toolkit to do this.
We need to run the minifi-toolkit transform command. The first argument is the location of the NiFi template you downloaded. The second argument is the location to write the MiniFi configuration file. MiniFi expects the file to be named
config.yml
Run the transform command. You should see something similar to the following:
$ /Users/myoung/Downloads/minifi-toolkit-0.1.0/bin/config.sh transform ~/Downloads/raspberry.xml ~/Downloads/config.yml
Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home
MiNiFi Toolkit home: /Users/myoung/Downloads/minifi-toolkit-0.1.0
No validation errors found in converted configuration.
Copy MiniFi Flow to Raspberry Pi
Now we need to copy the configuration file to the Raspberry Pi. You can easily do that using the
scp command. The config.yml file we generated needs to go in the /home/pi/minifi-1.0.2.1.1.0-2/conf/ directory.
You should see something similar to the following:
$ scp ~/Downloads/config.yml pi@raspberrypi:/home/pi/minifi-1.0.2.1.1.0-2/conf/config.yml
pi@raspberrypi's password:
config.yml 100% 1962 721.1KB/s 00:00
Start MiniFi
Now that the config.yml file is in place, we can start MiniFi. You do that using the
minifi.sh script with the start option. Remember that MiniFi will be running on the Raspberry Pi, not on your computer.
You should see something similar to the following:
$ /home/pi/minifi-1.0.2.1.1.0-2/minifi.sh start
minifi.sh: JAVA_HOME not set; results may vary
Bootstrap Classpath: /home/pi/minifi-1.0.2.1.1.0-2/conf:/home/pi/minifi-1.0.2.1.1.0-2/lib/bootstrap/*:/home/pi/minifi-1.0.2.1.1.0-2/lib/*
Java home:
MiNiFi home: /home/pi/minifi-1.0.2.1.1.0-2
Bootstrap Config File: /home/pi/minifi-1.0.2.1.1.0-2/conf/bootstrap.conf
Now MiniFi should be running on your Raspberry Pi. If you run into any issues, look at the logs in <minifi directory>/logs/minifi-app.log. Start NiFi flow
Now that everything else is in place, we should be able to start our NiFi flow. Start the four NiFi components (the input port and the three processors), not the two MiniFi parts of the flow. If everything is working properly, you should start seeing records in Solr.
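If you want to confirm the documents are arriving, you can query Solr directly. Here is a minimal sketch using only the Python standard library; it assumes Solr is still running locally on port 8983 with the gettingstarted collection:
import json
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2

# Query the gettingstarted collection for a few of the indexed sensor readings.
url = 'http://localhost:8983/solr/gettingstarted/select?q=*:*&rows=5&wt=json'
response = json.loads(urlopen(url).read().decode('utf-8'))
print('Documents indexed: {0}'.format(response['response']['numFound']))
for doc in response['response']['docs']:
    print('{0} {1} C'.format(doc.get('timestamp_dts'), doc.get('temperature_f')))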
Dashboard
You can easily add Banana to Solr to create a dashboard. Here is an example:
Review
If you successfully followed along with this tutorial, you should have MiniFi collecting data from your Sense HAT sensor on your Raspberry Pi. The MiniFi flow should be sending that data to NiFi on your computer, which then sends the data to Solr.
12-07-2016
01:26 AM
3 Kudos
Objective
If you are managing multiple copies of the HDP sandbox for Docker (see my article here: How to manage multiple copies of the HDP Docker Sandbox), you may find yourself running out of storage within your Docker VM image on your laptop. There is a way to increase the available storage space of the Docker VM image. Increasing the storage space will allow you to have more copies of sandbox containers in addition to other images and containers. This tutorial will guide you through the process of increasing the size of the base Docker for Mac VM image. We will increase the size from 64GB to 120GB. This tutorial is the second in a two-part series. The first tutorial in the series is: How to move Docker for Mac vm image from internal to external hard drive. For a tutorial on increasing the base Docker VM image on CentOS 7, read my tutorial here: How to modify the default Docker configuration on CentOS 7 to import HDP sandbox
Prerequisites
You should have already completed the following tutorial: Installing Docker Version of Sandbox on Mac
You should have already completed the following tutorial: How to move Docker for Mac vm image from internal to external hard drive
You should have already installed Homebrew: Homebrew
You should have an external hard drive available.
Scope
Mac OS X 10.11.6 (El Capitan)
Docker for Mac 1.12.1
HDP 2.5 Docker Sandbox
Homebrew 1.1.0
Steps
Install qemu
The Docker virtual machine image is in qcow2 format, which requires qemu to manage. Before we can manipulate our image file, we need to install qemu.
brew install qemu
==> Installing dependencies for qemu: jpeg, libpng, libtasn1, gmp, nettle, gnutls, gettext, libffi, pcre, glib, pixman
==> Installing qemu dependency: jpeg
==> Downloading https://homebrew.bintray.com/bottles/jpeg-8d.el_capitan.bottle.2.tar.gz
######################################################################## 100.0%
==> Pouring jpeg-8d.el_capitan.bottle.2.tar.gz
/usr/local/Cellar/jpeg/8d: 19 files, 713.8K
==> Installing qemu dependency: libpng
==> Downloading https://homebrew.bintray.com/bottles/libpng-1.6.26.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring libpng-1.6.26.el_capitan.bottle.tar.gz
/usr/local/Cellar/libpng/1.6.26: 26 files, 1.2M
==> Installing qemu dependency: libtasn1
==> Downloading https://homebrew.bintray.com/bottles/libtasn1-4.9.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring libtasn1-4.9.el_capitan.bottle.tar.gz
/usr/local/Cellar/libtasn1/4.9: 58 files, 437K
==> Installing qemu dependency: gmp
==> Downloading https://homebrew.bintray.com/bottles/gmp-6.1.1.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring gmp-6.1.1.el_capitan.bottle.tar.gz
/usr/local/Cellar/gmp/6.1.1: 17 files, 3.2M
==> Installing qemu dependency: nettle
==> Downloading https://homebrew.bintray.com/bottles/nettle-3.3.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring nettle-3.3.el_capitan.bottle.tar.gz
/usr/local/Cellar/nettle/3.3: 81 files, 2.0M
==> Installing qemu dependency: gnutls
==> Downloading https://homebrew.bintray.com/bottles/gnutls-3.4.16.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring gnutls-3.4.16.el_capitan.bottle.tar.gz
==> Using the sandbox
/usr/local/Cellar/gnutls/3.4.16: 1,115 files, 6.9M
==> Installing qemu dependency: gettext
==> Downloading https://homebrew.bintray.com/bottles/gettext-0.19.8.1.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring gettext-0.19.8.1.el_capitan.bottle.tar.gz
==> Caveats
This formula is keg-only, which means it was not symlinked into /usr/local.
macOS provides the BSD gettext library and some software gets confused if both are in the library path.
Generally there are no consequences of this for you. If you build your
own software and it requires this formula, you'll need to add to your
build variables:
LDFLAGS: -L/usr/local/opt/gettext/lib
CPPFLAGS: -I/usr/local/opt/gettext/include
==> Summary
/usr/local/Cellar/gettext/0.19.8.1: 1,934 files, 16.9M
==> Installing qemu dependency: libffi
==> Downloading https://homebrew.bintray.com/bottles/libffi-3.0.13.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring libffi-3.0.13.el_capitan.bottle.tar.gz
==> Caveats
This formula is keg-only, which means it was not symlinked into /usr/local.
Some formulae require a newer version of libffi.
Generally there are no consequences of this for you. If you build your
own software and it requires this formula, you'll need to add to your
build variables:
LDFLAGS: -L/usr/local/opt/libffi/lib
==> Summary
/usr/local/Cellar/libffi/3.0.13: 15 files, 374.7K
==> Installing qemu dependency: pcre
==> Downloading https://homebrew.bintray.com/bottles/pcre-8.39.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring pcre-8.39.el_capitan.bottle.tar.gz
/usr/local/Cellar/pcre/8.39: 203 files, 5.4M
==> Installing qemu dependency: glib
==> Downloading https://homebrew.bintray.com/bottles/glib-2.50.1.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring glib-2.50.1.el_capitan.bottle.tar.gz
/usr/local/Cellar/glib/2.50.1: 427 files, 22.3M
==> Installing qemu dependency: pixman
==> Downloading https://homebrew.bintray.com/bottles/pixman-0.34.0.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring pixman-0.34.0.el_capitan.bottle.tar.gz
/usr/local/Cellar/pixman/0.34.0: 12 files, 1.2M
==> Installing qemu
==> Downloading https://homebrew.bintray.com/bottles/qemu-2.7.0.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring qemu-2.7.0.el_capitan.bottle.tar.gz
/usr/local/Cellar/qemu/2.7.0: 126 files, 139.8M
NOTE: This may take several minutes.
Export Docker containers
Before we make any changes to our Docker VM image, we should back up any containers we want to save. This is not a typical Docker use case, as most Docker containers are ephemeral. However, we want to save any configuration changes we've made to our sandbox containers. To get a list of containers, use the docker ps -a command. This will show all containers, running or not. You should see something similar to this:
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
15411dc968ad fc813bdc4bdd "/usr/sbin/sshd -D" 4 weeks ago Exited (0) 23 hours ago hdp25-atlas-demo
As you can see, I have a single container which I use for Atlas demos. If you followed my article above for managing multiple copies of the sandbox, you will note the name of the sandbox is based on my project directory. If you have not followed that tutorial, then the NAMES column will display sandbox which is the default container name. I want to save that container to avoid having to redo any configuration and setup tasks that I've already completed. Using the docker export command, we can create a saved image of our container. This image can be imported into Docker later. To read more about the docker export command look here: docker export. You should give the --output parameter a filename that makes sense. You should see something similar to this:
cd ~
docker export --output="hdp25-atlas-demo.tar" hdp25-atlas-demo
NOTE: This may take several minutes.
You can check the size of the container export. You should see something like this:
ls -lah hdp25-atlas-demo.tar
-rw------- 1 myoung staff 14G Nov 9 11:57 hdp25-atlas-demo.tar
You shouldn't need to save your Docker images as they can easily be imported again and contain no custom configurations.
Stop Docker for Mac
Let's check the storage size available with our current virtual machine image.
docker run --rm alpine df -h
Filesystem Size Used Available Use% Mounted on
none 59.0G 35.7G 20.3G 64% /
tmpfs 5.9G 0 5.9G 0% /dev
tmpfs 5.9G 0 5.9G 0% /sys/fs/cgroup
/dev/vda2 59.0G 35.7G 20.3G 64% /etc/resolv.conf
/dev/vda2 59.0G 35.7G 20.3G 64% /etc/hostname
/dev/vda2 59.0G 35.7G 20.3G 64% /etc/hosts
shm 64.0M 0 64.0M 0% /dev/shm
tmpfs 5.9G 0 5.9G 0% /proc/kcore
tmpfs 5.9G 0 5.9G 0% /proc/timer_list
tmpfs 5.9G 0 5.9G 0% /proc/sched_debug
Notice that our / partition is 59GB in size. Another 5.9GB is used for the other partitions. That brings us up to our 64GB file size. Before we can make any changes to the Docker virtual machine image, we need to stop Docker for Mac. There should be a Docker for Mac icon in the menu bar. You should see something similar to this:
You can also check from the command line using ps -ef | grep -i com.docker . You should see something similar to this:
ps -ef | grep -i com.docker
0 123 1 0 8:45AM ?? 0:00.01 /Library/PrivilegedHelperTools/com.docker.vmnetd
502 967 876 0 8:46AM ?? 0:00.08 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 969 967 0 8:46AM ?? 0:00.04 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 971 967 0 8:46AM ?? 0:07.96 com.docker.db --url fd:3 --git /Users/myoung/Library/Containers/com.docker.docker/Data/database
502 975 967 0 8:46AM ?? 0:03.40 com.docker.osx.hyperkit.linux
502 977 975 0 8:46AM ?? 0:00.03 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux
502 12807 967 0 9:17PM ?? 0:00.08 com.docker.osxfs --address fd:3 --connect /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --control fd:4 --volume-control fd:5 --database /Users/myoung/Library/Containers/com.docker.docker/Data/s40
502 12810 967 0 9:17PM ?? 0:00.12 com.docker.slirp --db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 --ethernet fd:3 --port fd:4 --vsock-path /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --max-connections 900
502 12811 967 0 9:17PM ?? 0:00.19 com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 12812 12811 0 9:17PM ?? 0:00.02 /Applications/Docker.app/Contents/MacOS/com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 12814 12811 0 9:17PM ?? 0:16.48 /Applications/Docker.app/Contents/MacOS/com.docker.hyperkit -A -m 12G -c 6 -u -s 0:0,hostbridge -s 31,lpc -s 2:0,virtio-vpnkit,uuid=1f629fed-1ef6-4f34-8fce-753347e3b941,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s50,macfile=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/mac.0 -s 3,virtio-blk,file:///Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2,format=qcow -s 4,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s40,tag=db -s 5,virtio-rnd -s 6,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s51,tag=port -s 7,virtio-sock,guest_cid=3,path=/Users/myoung/Library/Containers/com.docker.docker/Data,guest_forwards=2376;1525 -l com1,autopty=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty,log=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/console-ring -f kexec,/Applications/Docker.app/Contents/Resources/moby/vmlinuz64,/Applications/Docker.app/Contents/Resources/moby/initrd.img,earlyprintk=serial console=ttyS0 com.docker.driver="com.docker.driver.amd64-linux", com.docker.database="com.docker.driver.amd64-linux" ntp=gateway mobyplatform=mac -F /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/hypervisor.pid
502 13790 876 0 9:52PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 13791 13790 0 9:52PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 13793 13146 0 9:52PM ttys000 0:00.00 grep -i com.docker
Now we will stop Docker for Mac. Using the menu shown above, click on the Quit Docker menu option. This will stop Docker for Mac. You should notice the Docker for Mac icon is no longer visible. Now let's confirm the Docker processes we saw before are no longer running.
ps -ef | grep -i com.docker
0 123 1 0 8:45AM ?? 0:00.01 /Library/PrivilegedHelperTools/com.docker.vmnetd
502 13815 13146 0 9:54PM ttys000 0:00.00 grep -i com.docker
NOTE: It may take a few seconds before Docker for Mac is completely stopped. It is OK for the com.docker.vmnetd process to still be running.
Create new Docker template image
Docker uses a template image file. That file is copied from /Applications/Docker.app/Contents/Resources/moby/data.qcow2 to ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2 when Docker sees you are missing the Docker.qcow2 file. We are going to create a new template image file. In my case I want to create a file that is 120GB in size. You should see something similar to this:
cd ~
qemu-img create -f qcow2 ~/data.qcow2 120G
Formatting '/Users/myoung/data.qcow2', fmt=qcow2 size=128849018880 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
Now let's see how big the file is. You should see something similar to this:
ls -lah data.qcow2
-rw-r--r-- 1 myoung staff 194K Nov 9 19:30 data.qcow2
That is interesting; the file is only 194KB in size. Why? That is because the qcow2 image format is a sparse file. The image file grows as you add content to it, such as images and containers. You can read more here: Sparse Files. The file will continue to grow until it reaches 120GB in size.
NOTE: On Linux systems, this process is more involved than it is for the Mac. You have to change the Docker configuration to define a new setting --storage-opt=dm.basesize=30G, or whatever size is appropriate for your environment. There is a link to my Linux article on this topic at the top of the page.
Backup default template image file
We should back up our default template image file just to be safe.
mv /Applications/Docker.app/Contents/Resources/moby/data.qcow2 /Applications/Docker.app/Contents/Resources/moby/data.qcow2.backup
ls -lah /Applications/Docker.app/Contents/Resources/moby/data.qcow2.backup
-rw-r--r-- 1 myoung admin 320K Nov 8 12:36 /Applications/Docker.app/Contents/Resources/moby/data.qcow2.backup
Now we can cp our new template file into place.
cp data.qcow2 /Applications/Docker.app/Contents/Resources/moby/data.qcow2 Again, let's make sure the file exists.
ls -lah /Applications/Docker.app/Contents/Resources/moby/data.qcow2*
-rw-r--r-- 1 myoung admin 194K Nov 9 19:38 /Applications/Docker.app/Contents/Resources/moby/data.qcow2
-rw-r--r-- 1 myoung admin 320K Nov 8 12:36 /Applications/Docker.app/Contents/Resources/moby/data.qcow2.backup
Delete current Docker vm image file
Now we need to delete the current Docker VM image file located at ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2 . We need to do this because we need to create a new image file using the template baseline image we just created. We backed up our containers in previous steps, so we shouldn't have to worry about losing anything. You can decide to back up this file just to be safe.
mv ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2 ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2.backup NOTE: You must have sufficient storage space to hold two copies of the Docker.qcow2 file. Now let's make sure the file has been moved.
ls -lah ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2*
-rw-r--r-- 1 myoung staff 64G Nov 9 19:30 /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2.backup
Now restart Docker for Mac
Now we can restart Docker for Mac. This is done by running the application from the Applications folder in the Finder. You should see something similar to this:
Double-click on the Docker application to start it. You should notice the Docker for Mac icon is now back in the menu bar. You can also check via ps -ef | grep -i com.docker . You should see something similar to this:
ps -ef | grep -i com.docker
0 123 1 0 8:45AM ?? 0:00.01 /Library/PrivilegedHelperTools/com.docker.vmnetd
502 14476 14465 0 10:42PM ?? 0:00.03 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 14479 14476 0 10:42PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 14480 14476 0 10:42PM ?? 0:00.29 com.docker.db --url fd:3 --git /Users/myoung/Library/Containers/com.docker.docker/Data/database
502 14481 14476 0 10:42PM ?? 0:00.08 com.docker.osxfs --address fd:3 --connect /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --control fd:4 --volume-control fd:5 --database /Users/myoung/Library/Containers/com.docker.docker/Data/s40
502 14482 14476 0 10:42PM ?? 0:00.04 com.docker.slirp --db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 --ethernet fd:3 --port fd:4 --vsock-path /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --max-connections 900
502 14483 14476 0 10:42PM ?? 0:00.05 com.docker.osx.hyperkit.linux
502 14484 14476 0 10:42PM ?? 0:00.08 com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 14485 14483 0 10:42PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux
502 14486 14484 0 10:42PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 14488 14484 0 10:42PM ?? 0:07.90 /Applications/Docker.app/Contents/MacOS/com.docker.hyperkit -A -m 12G -c 6 -u -s 0:0,hostbridge -s 31,lpc -s 2:0,virtio-vpnkit,uuid=1f629fed-1ef6-4f34-8fce-753347e3b941,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s50,macfile=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/mac.0 -s 3,virtio-blk,file:///Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2,format=qcow -s 4,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s40,tag=db -s 5,virtio-rnd -s 6,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s51,tag=port -s 7,virtio-sock,guest_cid=3,path=/Users/myoung/Library/Containers/com.docker.docker/Data,guest_forwards=2376;1525 -l com1,autopty=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty,log=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/console-ring -f kexec,/Applications/Docker.app/Contents/Resources/moby/vmlinuz64,/Applications/Docker.app/Contents/Resources/moby/initrd.img,earlyprintk=serial console=ttyS0 com.docker.driver="com.docker.driver.amd64-linux", com.docker.database="com.docker.driver.amd64-linux" ntp=gateway mobyplatform=mac -F /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/hypervisor.pid
502 14559 14465 0 10:46PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 14560 14559 0 10:46PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 14562 13146 0 10:46PM ttys000 0:00.00 grep -i com.docker
You should notice the Docker processes are running again. You can also check the timestamp of files in the Docker image directory:
ls -lah ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2*
-rw-r--r-- 1 myoung staff 179M Nov 9 19:46 /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2
-rw-r--r-- 1 myoung staff 64G Nov 9 19:45 /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2.backup
You should notice our backup file exists and is 64GB. The new file was created and is 179MB. As mentioned above, the new file will continue to grow as you use Docker. Let's check our available disk space in our Docker VM.
docker run --rm alpine df -h
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
3690ec4760f9: Pull complete
Digest: sha256:1354db23ff5478120c980eca1611a51c9f2b88b61f24283ee8200bf9a54f2e5c
Status: Downloaded newer image for alpine:latest
Filesystem Size Used Available Use% Mounted on
none 114.1G 66.7M 108.2G 0% /
tmpfs 5.9G 0 5.9G 0% /dev
tmpfs 5.9G 0 5.9G 0% /sys/fs/cgroup
/dev/vda2 114.1G 66.7M 108.2G 0% /etc/resolv.conf
/dev/vda2 114.1G 66.7M 108.2G 0% /etc/hostname
/dev/vda2 114.1G 66.7M 108.2G 0% /etc/hosts
shm 64.0M 0 64.0M 0% /dev/shm
tmpfs 5.9G 0 5.9G 0% /proc/kcore
tmpfs 5.9G 0 5.9G 0% /proc/timer_list
tmpfs 5.9G 0 5.9G 0% /proc/sched_debug
Notice the image for alpine:latest didn't exist locally, so it was downloaded. That is because we are using a new VM image. Also notice our / partition is now 114.1GB, and we have another 5.9GB in use for the other partitions. That brings our total space up to 120GB.
Import Docker container
Now that Docker for Mac is running, we need to import the saved container image we created earlier. We will use the docker import command to do this. To read more about the docker import command look here: docker import.
cd ~
docker import hdp25-atlas-demo.tar atlas-demo:latest
sha256:c793528e25bd95564c8cbf4b1487c47107d0cc300b1929b8a0fdeff93ed84eae
NOTE: This may take up to an hour for this first import.
Let's look at our list of images now.
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
atlas-demo latest ad47bef4e526 2 minutes ago 13.93 GB
alpine latest baa5d63471ea 3 weeks ago 4.803 MB
You should notice the repository and tag line up with what we provided to the import command.
Create new sandbox container
Our saved container was imported as an image. If your container was created using project directories based on this tutorial How to manage multiple copies of the HDP Docker Sandbox, then you will need to make a change before you can start the container. You need to modify the create-container.sh script. That script is creating a container based on an image called sandbox . You need to modify that script to instead create a container based on an image called atlas-demo . This assumes you called your repository atlas-demo when you imported the image. Here is a copy of my modified create-container.sh script:
docker run -v `pwd`:/mount -v ${PROJ_DIR}:/hadoop --name ${PROJ_DIR} --hostname "sandbox.hortonworks.com" --privileged -d -p 6080:6080 -p 9090:9090 -p 9000:9000 -p 8000:8000 -p 8020:8020 -p 42111:42111 -p 10500:10500 -p 16030:16030 -p 8042:8042 -p 8040:8040 -p 2100:2100 -p 4200:4200 -p 4040:4040 -p 8050:8050 -p 9996:9996 -p 9995:9995 -p 8080:8080 -p 8088:8088 -p 8886:8886 -p 8889:8889 -p 8443:8443 -p 8744:8744 -p 8888:8888 -p 8188:8188 -p 8983:8983 -p 1000:1000 -p 1100:1100 -p 11000:11000 -p 10001:10001 -p 15000:15000 -p 10000:10000 -p 8993:8993 -p 1988:1988 -p 5007:5007 -p 50070:50070 -p 19888:19888 -p 16010:16010 -p 50111:50111 -p 50075:50075 -p 50095:50095 -p 18080:18080 -p 60000:60000 -p 8090:8090 -p 8091:8091 -p 8005:8005 -p 8086:8086 -p 8082:8082 -p 60080:60080 -p 8765:8765 -p 5011:5011 -p 6001:6001 -p 6003:6003 -p 6008:6008 -p 1220:1220 -p 21000:21000 -p 6188:6188 -p 61888:61888 -p 2181:2181 -p 2222:22 atlas-demo /usr/sbin/sshd -D Notice the last line uses atlas-demo instead of sandbox . Once you run this command, your container will be created and running. You do not need to modify the start-container.sh , ssh-container.sh or stop-container.sh script. They all use the ${PROJ_DIR} which is one of the reasons these scripts are so handy. Connect to the sandbox Now that the container is started, we can connect to it. We can use our helper script ssh-container.sh to make it easy:
./ssh-container.sh
When you created a new sandbox container before, you were prompted to reset the root password. You were not prompted this time, because our existing configuration is part of the image now.
Start the sandbox processes
When the container starts up, it doesn't automatically start the sandbox processes. You can do that by running the /etc/init.d/startup_script script. You should see something similar to this:
[root@sandbox ~]# /etc/init.d/startup_script start
Starting tutorials... [ Ok ]
Starting startup_script...
Starting HDP ...
Starting mysql [ OK ]
Starting Flume [ OK ]
Starting Postgre SQL [ OK ]
Starting Ranger-admin [WARNINGS]
find: failed to restore initial working directory: Permission denied
Starting data node [ OK ]
Starting name node [ OK ]
Safe mode is OFF
Starting Oozie [ OK ]
Starting Ranger-usersync [ OK ]
Starting Zookeeper nodes [ OK ]
Starting NFS portmap [ OK ]
Starting Hdfs nfs [ OK ]
Starting Hive server [ OK ]
Starting Hiveserver2 [ OK ]
Starting Ambari server [ OK ]
Starting Ambari agent [ OK ]
Starting Node manager [ OK ]
Starting Yarn history server [ OK ]
Starting Webhcat server [ OK ]
Starting Spark [ OK ]
Starting Mapred history server [ OK ]
Starting Zeppelin [ OK ]
Starting Resource manager [ OK ]
Safe mode is OFF
Starting sandbox...
/etc/init.d/startup_script: line 97: /proc/sys/kernel/hung_task_timeout_secs: No such file or directory
Starting shellinaboxd: [ OK ]
NOTE: You can ignore any warnings or errors that are displayed.
Now the sandbox processes are running and you can access the Ambari interface at http://localhost:8080 . Log in with the raj_ops username and password. You should notice all of the configurations you previously had are still there. In my case I had turned maintenance mode on for Zeppelin and off for some other services, and they were as I expected.
Create new sandbox containers
If you want to create new sandbox containers using the baseline image, make sure you have imported the default sandbox image. Then create those containers using sandbox instead of the atlas-demo label.
Review
If you successfully followed along with this tutorial, we were able to back up our existing containers to images. We created a new baseline Docker VM image, increasing our storage space from 64GB to 120GB. We deleted our existing Docker VM image and recreated it using the new baseline image. We imported our saved containers as new images. And finally we were able to create new containers using the saved container image, which contained all of our previous settings and configurations.
11-30-2016
10:18 PM
@Sunile Manjee Great article. I think you have a typo in your run command. You use sunileman/dockernifi in the pull, but sunileman/nifi in the run. The run command doesn't work as you have it shown.
11-12-2016
01:25 AM
@Jonas Straub
The solrconfig.xml portion you provide doesn't work when I try to create a collection using the data driven schema configs. The problem seems to be the
<processor> entries. I get null pointer exceptions.
The first processor entry should be <processor class="solr.DefaultValueUpdateProcessorFactory">. The second processor entry should be <processor class="solr.processor.DocExpirationUpdateProcessorFactory">.
11-11-2016
12:19 AM
12 Kudos
Objective
The Hortonworks Sandbox for HDP 2.5 now uses Docker containers, even the VirtualBox version. The process for exposing extra ports on older versions of the sandbox was as simple as setting up additional port forwarding rules in VirtualBox. The new container version of the sandbox requires additional steps; you have to do more than just set up port forwarding rules in VirtualBox.
This tutorial will guide you through the process of adding additional ports to the VirtualBox version of the HDP 2.5 sandbox. Prerequisites
You should have already downloaded and installed the VirtualBox version of the Hortonworks Sandbox for HDP 2.5 Hortonworks Sandbox Scope
Mac OS X 10.11.6 (El Capitan)
VirtualBox 5.1.6
HDP 2.5 VirtualBox Sandbox Steps Startup the Sandbox VM
Start up your HDP sandbox. It should be called Hortonworks Docker Sandbox within VirtualBox. You should see something similar to this:
You can start the sandbox by either double-clicking on the virtual machine or by selecting the virtual machine and clicking on the start icon in the menu. Once the virtual machine is started, you should see something similar to this:
NOTE: It may take several minutes for the virtual machine to start. Login to the Sandbox VM
We need to log in to the VM that runs the sandbox container. If you use the standard ssh -p 2222 root@localhost , you will actually log in to the sandbox container, not the sandbox VM. You can log in to the sandbox VM using ssh -p 2122 root@localhost . You can also log in to the sandbox VM directly using the VirtualBox VM window by following the instructions on the VM window. On the Mac you click in the VM window and press the Alt/Option + F5 keys. You should see something like this:
The username is root and the password is hadoop . Disable sandbox.service
The sandbox Docker container is set to autostart when the VM starts. I've run into issues trying to stop the Docker container, so we'll use a workaround to temporarily disable the sandbox.service. The sandbox VM is based on CentOS 7 and uses systemd.
systemctl disable sandbox.service
Reboot the VM
Now we need to reboot the VM so we can get it started without the sandbox container running. You can do this easily using the init command.
init 6
The VM should reboot. Modify sandbox start script
After the VM reboots, log in again using the steps provided above. The sandbox start script is located at /root/start_scripts/start_sandbox.sh . This script has the docker run command which creates or starts the Docker container. We need to edit this script to add our additional ports. For the purposes of this tutorial, we'll add port 8440 , which is used by Ambari agents to talk to the Ambari server.
vi /root/start_scripts/start_sandbox.sh
Scroll down until you see:
-p 2222:22 sandbox /usr/sbin/sshd -D
We need to add another port mapping after the last port entry. Modify the script so it now looks like this:
-p 2222:22 -p 8440:8440 sandbox /usr/sbin/sshd -D
Now save the file
Press ESC key
:wq!
Delete existing sandbox container
Since we booted up the VM at least once, the sandbox container will already exist. The startup script will not and cannot update an existing container to add another port, so we need to remove the existing sandbox container. The startup script will then create a new container with the added ports. Remember that with Docker, an image is like a blueprint for a building and a container is a building created from that blueprint. We will delete the container, not the image.
docker rm sandbox
NOTE: This will remove any changes you've made to your Docker container via Ambari, etc. If you want to save any changes you've made, you can update the base sandbox Docker image using docker commit . To update the base sandbox image with any configuration changes you've made, use docker commit sandbox sandbox . Enable sandbox.service
Now we need to enable the sandbox service so it auto starts when the VM boots up.
systemctl enable sandbox.service
Reboot the VM
Now it's time to reboot the VM. As before, we'll use the init command
init 6
Verify new ports
After the VM reboots, log in again using the steps provided above. We are going to show the running Docker container using the docker ps command.
docker ps
You should notice the standard Docker output from that command. Look for port 8440 ; you should see it in the list. You should see something similar to this:
NOTE: We added the port to the end of the startup script. However, the port will not be displayed at the end of the list. The port will be somewhere in the middle of the list of ports. Update VirtualBox VM ports
Now we need to update the forwarded ports configuration for our VirtualBox VM. Using the VirtualBox user interface, right-click on the Hortonworks Docker Sandbox to display the menu. You should see something similar to this:
Now click the Settings menu option. You should see something similar to this:
Now click on the Network menu icon. You should see something similar to this:
Now click the Advanced dropdown menu near the bottom. You should see something similar to this:
Now click the Port Forwarding button. You should see something similar to this:
Now click the + icon near the upper right to add another port forwarding rule. You should see something similar to this:
Now add an entry for the new rule: give it a name (for example ambari-8440 ), set the Protocol to TCP , the Host Port to 8440 , and the Guest Port to 8440 . You can leave the Host IP and Guest IP fields blank.
Click the blue OK button to close the port forwarding rules dialog, then click the OK button to close the settings dialog. The settings should be saved.
Now port 8440 on your computer should be forwarded to port 8440 on the VirtualBox VM, which in turn passes it through to the Docker container. Review
If you successfully followed along with this tutorial, we were able to modify the Docker startup script for our sandbox container within the VirtualBox VM to add another port. After removing the Docker container and restarting the VM, a new Docker container was created with our additional port.