05-18-2017
03:00 PM
6 Kudos
Prerequisites
You should already have a Cloudbreak v1.14.4 environment running. You can follow this article to create a Cloudbreak instance using Vagrant and Virtualbox: HCC Article
You should already have credentials created in Cloudbreak for deploying on AWS (or Azure).
Scope
This tutorial was tested in the following environment:
macOS Sierra (version 10.12.4)
Cloudbreak 1.14.4
AWS EC2
NOTE: Cloudbreak 1.14.0 (TP) had a bug which caused HDP 2.6 cluster installs to fail. You should upgrade your Cloudbreak deployer instance to 1.14.4.
Steps
Create application.yml file
UPDATE 05/24/2017: The creation of a custom application.yml file is not required with Cloudbreak 1.14.4. This version of Cloudbreak includes support for HDP 2.5 and HDP 2.6. This step remains for educational purposes, as the same approach applies to future HDP updates.
You need to create an application.yml file in the etc directory within your Cloudbreak deployment directory. This file will contain the repo information for HDP 2.6. If you followed my tutorial linked above, then your Cloudbreak deployment directory should be /opt/cloudbreak-deployment . If you are using a Cloudbreak instance on AWS or Azure, then your Cloudbreak deployment directory is likely /var/lib/cloudbreak-deployment/ .
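If you prefer to script this step, the file can be created with a heredoc. This is a minimal sketch that assumes your current directory is the Cloudbreak deployment directory, and it shows only the Ambari repo stanza; the complete contents to use are given next.

```shell
# Minimal sketch: write only the Ambari repo stanza of etc/application.yml.
# Assumes the current directory is your Cloudbreak deployment directory.
mkdir -p etc
cat > etc/application.yml <<'EOF'
cb:
  ambari:
    repo:
      version: 2.5.0.3-7
      baseurl: http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.5.0.3
EOF
```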
Edit your <cloudbreak-deployment>/etc/application.yml file using your favorite editor. Copy and paste the following in the file:
cb:
  ambari:
    repo:
      version: 2.5.0.3-7
      baseurl: http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.5.0.3
      gpgkey: http://public-repo-1.hortonworks.com/ambari/centos6/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
    database:
      vendor: embedded
      host: localhost
      port: 5432
      name: postgres
      username: ambari
      password: bigdata
  hdp:
    entries:
      2.5:
        version: 2.5.0.1-210
        repoid: HDP-2.5
        repo:
          stack:
            repoid: HDP-2.5
            redhat6: http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.5.5.0
            redhat7: http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.5.0
          util:
            repoid: HDP-UTILS-1.1.0.21
            redhat6: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos6
            redhat7: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos7
      2.6:
        version: 2.6.0.0-598
        repoid: HDP-2.6
        repo:
          stack:
            repoid: HDP-2.6
            redhat6: http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.6.0.3
            redhat7: http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.6.0.3
          util:
            repoid: HDP-UTILS-1.1.0.21
            redhat6: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos6
            redhat7: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos7
Start Cloudbreak
Once you have created your application.yml file, you can start Cloudbreak.
$ cbd start
NOTE: It may take a couple of minutes before Cloudbreak is fully running.
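To confirm the deployer is up before moving on, you can query the cbd subcommands (this sketch assumes the `cbd ps` subcommand from cloudbreak-deployer; the fallback branch just keeps it safe to run on a machine without cbd installed):

```shell
# Check Cloudbreak deployer status; fall back to a message when the
# cbd binary is not present on this host.
if command -v cbd >/dev/null 2>&1; then
  status="$(cbd ps)"   # lists the Cloudbreak containers and their state
else
  status="cbd not found on this host"
fi
echo "$status"
```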
Create HDP 2.6 Blueprint
To create an HDP 2.6 cluster, we need to update our blueprint to specify HDP 2.6. On the main Cloudbreak UI, click on manage blueprints . You should see something similar to this:
You should see 3 default blueprints. We are going to use the hdp-small-default blueprint as our base. Click on the hdp-small-default blueprint name. You should see something similar to this:
Now click on the blue copy & edit button. You should see something similar to this:
For the Name, you should enter something unique and descriptive. I suggest hdp26-small-default . For the Description, you can enter the same information. You should see something similar to this:
Now we need to edit the JSON portion of the blueprint. Scroll down to the bottom of the JSON. You should see something similar to this:
Now edit the blueprint_name value to be hdp26-small-default and edit the stack_version to be 2.6 . You should see something similar to this:
Now click on the green create blueprint button. You should see the new blueprint visible in the list of blueprints.
Create HDP 2.6 Small Default Cluster
Now that our blueprint has been created, we can create a cluster and select this blueprint to install HDP 2.6. Select the appropriate credential for your Cloud environment. Click on the create cluster button. You should see something similar to this:
Provide a unique, but descriptive Cluster Name . Ensure you select an appropriate Region . I chose hdp26test as my cluster name and I'm using the US East region:
Now advance to the next step by clicking on Setup Network and Security . You should see something similar to this:
We don't need to make any changes here, so click on the Choose Blueprint button. You should see something similar to this:
In the Blueprint dropdown, you should see the blueprint we created. Select the hdp26-small-default blueprint. You should see something similar to this:
You need to select which node Ambari will run on. I typically select the master1 node. You should see something similar to this:
Now you can click on the Review and Launch button. You should see something similar to this:
Verify the information presented. If everything looks good, click on the create and start cluster button . Once the cluster build process has started, you should see something similar to this:
Verify HDP Version
Once the cluster has finished building, you can click on the cluster in the Cloudbreak UI. You should see something similar to this:
Click on the Ambari link to load Ambari. Login using the default username and password of admin . Now click on the Admin link in the menu. You should see something similar to this:
Click on the Stack and Versions link. You should see something similar to this:
You should notice that HDP 2.6.0.3 has been deployed.
Review
If you have successfully followed along with this tutorial, you should know how to create/update <cloudbreak-deployment>/etc/application.yml to add specific Ambari and HDP repositories. You should have successfully created an updated blueprint and deployed HDP 2.6 on your cloud of choice.
05-13-2017
01:57 AM
5 Kudos
Objectives
This tutorial will walk you through the process of using Cloudbreak recipes to install Anaconda on your HDP cluster during cluster provisioning. This process can be used to automate many tasks on the cluster, both pre-install and post-install.
Prerequisites
You should already have a Cloudbreak v1.14.0 environment running. You can follow this article to create a Cloudbreak instance using Vagrant and Virtualbox: HCC Article
You should already have credentials created in Cloudbreak for deploying on AWS (or Azure).
Scope
This tutorial was tested in the following environment:
macOS Sierra (version 10.12.4)
Cloudbreak 1.14.0 TP
AWS EC2
Anaconda 2.7.13
Steps
Create Recipe
Before you can use a recipe during a cluster deployment, you have to create the recipe. In the Cloudbreak UI, look for the "manage recipes" section. It should look similar to this:
If this is your first time creating a recipe, you will have 0 recipes instead of the 2 recipes shown in my interface.
Now click on the arrow next to manage recipes to display available recipes. You should see something similar to this:
Now click on the green create recipe button. You should see something similar to this:
Now we can enter the information for our recipe. I'm calling this recipe anaconda . I'm giving it the description of Install Anaconda . You can choose to install Anaconda as either pre-install or post-install. I'm choosing to do the install post-install. This means the script will be run after the Ambari installation process has started. So choose the Execution Type of POST . Choose Script so we can copy and paste the shell script. You can also specify a file to upload or a URL (gist for example). Our script is very basic. We are going to download the Anaconda install script, then run it in silent mode. Here is the script:
#!/bin/bash
wget https://repo.continuum.io/archive/Anaconda2-4.3.1-Linux-x86_64.sh
bash ./Anaconda2-4.3.1-Linux-x86_64.sh -b -p /opt/anaconda
When you have finished entering all of the information, you should see something similar to this:
If everything looks good, click on the green create recipe button.
After the recipe has been created, you should see something similar to this:
Create a Cluster using a Recipe
Now that our recipe has been created, we can create a cluster that uses the recipe. Go through the process of creating a cluster up to the Choose Blueprint step. This step is when you select the recipe you want to use. The recipes are not selected by default; you have to select the recipes you wish to use. You specify recipes for 1 or more host groups. This allows you to run different recipes across different host groups (masters, slaves, etc). You can also select multiple recipes.
We want to use the hdp-small-default blueprint. This will create a basic HDP cluster.
If you select the anaconda recipe, you should see something similar to this:
In our case, we are going to run the recipe on every host group. If you intend to use something like Anaconda across the cluster, you should install it on at least the slave nodes and the client nodes.
After you have selected the recipe for the host groups, click the Review & Launch button, then launch the cluster. As the cluster is building, you should see a message in the Cloudbreak UI that indicates the recipe is running. When that happens, you will see something similar to this:
Cloudbreak will create logs for each recipe that runs on each host. These logs are located in /var/log/recipes and are named for the recipe and whether it is pre- or post-install. For example, our recipe log is called post-anaconda.log . You can tail this log file to follow the execution of the script.
NOTE: Post-install scripts won't be executed until the Ambari server is installed and the cluster is building. You can always monitor the /var/log/recipes directory on a node to see when the script is being executed. The time it takes to run the script will vary depending on the cloud environment and how long it takes to spin up the cluster.
On your cluster, you should be able to see the post-install log:
$ ls /var/log/recipes
post-anaconda.log post-hdfs-home.log
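To watch a recipe execute, you can tail its log on the node. This sketch guards against the window before Cloudbreak has created the file:

```shell
# Show the tail of the post-install recipe log, or a note if the
# recipe has not started writing it yet.
log=/var/log/recipes/post-anaconda.log
if [ -f "$log" ]; then
  tail -n 20 "$log"
else
  echo "recipe log not created yet: $log"
fi
```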
Once the install process is complete, you should be able to verify that Anaconda is installed. You need to ssh into one of the cloud instances. You can get the public IP address from the Cloudbreak UI. Log in as the cloudbreak user, using the private key corresponding to the public key you entered when you created the Cloudbreak credential. You should see something similar to this:
$ ssh -i ~/Downloads/keys/cloudbreak_id_rsa cloudbreak@#.#.#.#
The authenticity of host '#.#.#.# (#.#.#.#)' can't be established.
ECDSA key fingerprint is SHA256:By1MJ2sYGB/ymA8jKBIfam1eRkDS5+DX1THA+gs8sdU.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '#.#.#.#' (ECDSA) to the list of known hosts.
Last login: Sat May 13 00:47:41 2017 from 192.175.27.2
__| __|_ )
_| ( / Amazon Linux AMI
___|\___|___|
https://aws.amazon.com/amazon-linux-ami/2016.09-release-notes/
25 package(s) needed for security, out of 61 available
Run "sudo yum update" to apply all updates.
Amazon Linux version 2017.03 is available.
Once you are on the server, you can check the version of python:
$ /opt/anaconda/bin/python --version
Python 2.7.13 :: Anaconda 4.3.1 (64-bit)
Review
If you have successfully followed along with this tutorial, you should know how to create pre- and post-install scripts. You should have successfully deployed a cluster on either AWS or Azure with Anaconda installed under /opt/anaconda on the nodes you specified.
07-14-2017
02:44 PM
Answering my own question: it doesn't work with the latest version of Cloudbreak (1.16.1). After logging in to the GUI I get the error "Cannot retrieve csrf token" . But it does work with version 1.14.4 .
03-05-2017
06:58 PM
4 Kudos
Objective
This tutorial will walk you through the process of using Ansible to deploy Hortonworks Data Platform (HDP) on Amazon Web Services (AWS). We will use the ansible-hadoop Ansible playbook from ObjectRocket to do this. You can find more information on that playbook here: ObjectRocket Ansible-Hadoop
This tutorial is part 2 of a 2 part series. Part 1 in the series shows you how to use Ansible to create instances on Amazon Web Services (AWS). Part 1 is available here: HCC Article Part 1
This tutorial was created as a companion to the Ansible + Hadoop talk I gave at the Ansible NOVA Meetup in February 2017. You can find the slides to that talk here: SlideShare
Prerequisites
You must have an existing AWS account.
You must have access to your AWS Access and Secret keys.
You are responsible for all AWS costs incurred.
You should have 3-6 instances created in AWS. If you completed Part 1 of this series, then you have an easy way to do that.
Scope
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6 and 10.12.3
Amazon Web Services
Anaconda 4.1.6 (Python 2.7.12)
Ansible 2.1.3.0
git 2.10.1
Steps
Create python virtual environment
We are going to create a Python virtual environment for installing the required Python modules. This will help eliminate module version conflicts between applications.
I prefer to use Continuum Anaconda for my Python distribution. Therefore the steps for setting up a python virtual environment will be based on that. However, you can use standard python and the virtualenv command to do something similar.
To create a virtual environment using Anaconda Python, you use the conda create command. We will name our virtual environment ansible-hadoop . The following command: conda create --name ansible-hadoop python will create our virtual environment with the name specified. You should see something similar to the following:
$ conda create --name ansible-hadoop python
Fetching package metadata .......
Solving package specifications: ..........
Package plan for installation in environment /Users/myoung/anaconda/envs/ansible-hadoop:
The following NEW packages will be INSTALLED:
openssl: 1.0.2k-1
pip: 9.0.1-py27_1
python: 2.7.13-0
readline: 6.2-2
setuptools: 27.2.0-py27_0
sqlite: 3.13.0-0
tk: 8.5.18-0
wheel: 0.29.0-py27_0
zlib: 1.2.8-3
Proceed ([y]/n)? y
Linking packages ...
cp: /Users/myoung/anaconda/envs/ansible-hadoop:/lib/libcrypto.1.0.0.dylib: No such file or directory
mv: /Users/myoung/anaconda/envs/ansible-hadoop/lib/libcrypto.1.0.0.dylib-tmp: No such file or directory
[ COMPLETE ]|################################################################################################| 100%
#
# To activate this environment, use:
# $ source activate ansible-hadoop
#
# To deactivate this environment, use:
# $ source deactivate
#
Switch python environments
Before installing python packages for a specific development environment, you should activate the environment. This is done with the command source activate <environment> . In our case the environment is the one we just created, ansible-hadoop . You should see something similar to the following:
$ source activate ansible-hadoop
As you can see there is no output to indicate if we were successful in changing our environment.
To verify, you can use the conda info --envs command to list the available environments. The active environment will have a * . You should see something similar to the following:
$ conda info --envs
# conda environments:
#
ansible-hadoop * /Users/myoung/anaconda/envs/ansible-hadoop
root /Users/myoung/anaconda
As you can see, the ansible-hadoop environment has the * which means it is the active environment.
If you want to remove your python virtual environment, you can use the following command: conda remove --name <environment> --all . If you want to remove the environment we just created you should see something similar to the following:
$ conda remove --name ansible-hadoop --all
Package plan for package removal in environment /Users/myoung/anaconda/envs/ansible-hadoop:
The following packages will be REMOVED:
openssl: 1.0.2k-1
pip: 9.0.1-py27_1
python: 2.7.13-0
readline: 6.2-2
setuptools: 27.2.0-py27_0
sqlite: 3.13.0-0
tk: 8.5.18-0
wheel: 0.29.0-py27_0
zlib: 1.2.8-3
Proceed ([y]/n)? y
Unlinking packages ...
[ COMPLETE ]|################################################################################################| 100%
HW11380:test myoung$ conda info --envs
# conda environments:
#
root * /Users/myoung/anaconda
Install Python modules in virtual environment
The ansible-hadoop playbook requires a specific version of Ansible. You need to install Ansible 2.1.3.0 before using the playbook. You can do that easily with the following command:
pip install ansible==2.1.3.0
Using a Python virtual environment allows us to easily use Ansible 2.1.3.0 for our playbook without impacting the default Python versions.
Clone ansible-hadoop github repo
You need to clone the ansible-hadoop github repo to a working directory on your computer. I typically do this in ~/Development.
$ cd ~/Development
$ git clone https://github.com/objectrocket/ansible-hadoop.git
You should see something similar to the following:
$ git clone https://github.com/objectrocket/ansible-hadoop.git
Cloning into 'ansible-hadoop'...
remote: Counting objects: 3879, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 3879 (delta 1), reused 0 (delta 0), pack-reused 3873
Receiving objects: 100% (3879/3879), 6.90 MiB | 0 bytes/s, done.
Resolving deltas: 100% (2416/2416), done.
Configure ansible-hadoop
You should make the ansible-hadoop repo directory your current working directory. There are a few configuration items we need to change.
$ cd ansible-hadoop
You should already have 3-6 instances available in AWS. You will need the public IP address of those instances.
Configure ansible-hadoop/inventory/static
We need to modify the inventory/static file to include the public IP addresses of our AWS instances. We need to assign master and slave nodes in the file. The instances are all the same configuration by default, so it doesn't matter which IP addresses you put for master and slave.
The default version of the inventory/static file should look similar to the following:
[master-nodes]
master01 ansible_host=192.168.0.2 bond_ip=172.16.0.2 ansible_user=rack ansible_ssh_pass=changeme
#master02 ansible_host=192.168.0.2 bond_ip=172.16.0.2 ansible_user=root ansible_ssh_pass=changeme
[slave-nodes]
slave01 ansible_host=192.168.0.3 bond_ip=172.16.0.3 ansible_user=rack ansible_ssh_pass=changeme
slave02 ansible_host=192.168.0.4 bond_ip=172.16.0.4 ansible_user=rack ansible_ssh_pass=changeme
[edge-nodes]
#edge01 ansible_host=192.168.0.5 bond_ip=172.16.0.5 ansible_user=rack ansible_ssh_pass=changeme
I'm going to be using 6 instances in AWS. I will put 3 instances as master servers and 3 instances as slave servers. There are a couple of extra options in the default file we don't need. The only values we need are:
hostname : which should be master, slave or edge with a 1-up number like master01 and slave01
ansible_host : should be the AWS public IP address of the instances
ansible_user : should be the username you use to SSH into the instance with the private key.
You can easily get the public IP address of your instances from the AWS console. Here is what mine looks like:
If you followed the part 1 tutorial, then the username for your instances should be centos . Edit your inventory/static . You should have something similar to the following:
[master-nodes]
master01 ansible_host=#.#.#.# ansible_user=centos
master02 ansible_host=#.#.#.# ansible_user=centos
master03 ansible_host=#.#.#.# ansible_user=centos
[slave-nodes]
slave01 ansible_host=#.#.#.# ansible_user=centos
slave02 ansible_host=#.#.#.# ansible_user=centos
slave03 ansible_host=#.#.#.# ansible_user=centos
#[edge-nodes]
Your public IP addresses will be different. Also note the #[edge-nodes] value in the file. Because we are not using any edge nodes, we should comment that host group line in the file.
Once you have all of your edits in place, save the file.
Configure ansible-hadoop/ansible.cfg
There are a couple of changes we need to make to the ansible.cfg file. This file provides overall configuration settings for Ansible. The default file in the playbook should look similar to the following:
[defaults]
host_key_checking = False
timeout = 60
ansible_keep_remote_files = True
library = playbooks/library/cloudera
We need to change the library line to be library = playbooks/library/site_facts . We will be deploying HDP which requires the site_facts module. We also need to tell Ansible where to find the private key file for connecting to the instances.
Edit the ansible.cfg file. You should modify the file to be similar to the following:
[defaults]
host_key_checking = False
timeout = 60
ansible_keep_remote_files = True
library = playbooks/library/site_facts
private_key_file=/Users/myoung/Development/ansible-hadoop/ansible.pem
Note the path of your private_key_file will be different. Once you have all of your edits in place, save the file.
Configure ansible-hadoop/group_vars/hortonworks
This step is optional. The group_vars/hortonworks file allows you to change how HDP is deployed. You can modify the version of HDP and Ambari. You can modify which components are installed. You can also specify custom repos and Ambari blueprints.
I will be using the default file, so there are no changes made.
Run bootstrap_static.sh
Before installing HDP, we need to ensure the OS configuration on the AWS instances meets the installation prerequisites. This includes things like ensuring DNS and NTP are working and all of the OS packages are updated. These are tasks that you often find people doing manually. That would obviously be tedious across hundreds or thousands of nodes, and it would introduce far more opportunities for human error. Ansible makes it incredibly easy to perform these kinds of tasks.
Running the bootstrap process is as easy as bash bootstrap_static.sh . This script essentially runs ansible-playbook -i inventory/static playbooks/boostrap.yml for you. This process will typically take 7-10 minutes depending on the size of the instances you selected.
When the script is finished, you should see something similar to the following;
PLAY RECAP *********************************************************************
localhost : ok=3 changed=2 unreachable=0 failed=0
master01 : ok=21 changed=15 unreachable=0 failed=0
master03 : ok=21 changed=15 unreachable=0 failed=0
slave01 : ok=21 changed=15 unreachable=0 failed=0
slave02 : ok=21 changed=15 unreachable=0 failed=0
slave03 : ok=21 changed=15 unreachable=0 failed=0
As you can see, all of the nodes had 21 total tasks performed. Of those tasks, 15 required modifications to be compliant with the desired configuration state.
Run hortonworks_static.sh
Now that the bootstrap process is complete, we can install HDP. The hortonworks_static.sh script is all you have to run to install HDP. This script essentially runs ansible-playbook -i inventory/static playbooks/hortonworks.yml for you. The script installs the Ambari Server on the last master node in our list. In my case, the last master node is master03. The script also installs the Ambari Agent on all of the nodes. The installation of HDP is performed by submitting a request to the Ambari Server API using an Ambari Blueprint.
This process will typically take 10-15 minutes depending on the size of the instances you selected, the number of master nodes and the list of HDP components you have enabled.
If you forgot to install the specific version of Ansible, you will likely see something similar to the following:
TASK [site facts processing] ***************************************************
fatal: [localhost]: FAILED! => {"failed": true, "msg": "ERROR! The module sitefacts.py dnmemory=\"31.0126953125\" mnmemory=\"31.0126953125\" cores=\"8\" was not found in configured module paths. Additionally, core modules are missing. If this is a checkout, run 'git submodule update --init --recursive' to correct this problem."}
PLAY RECAP *********************************************************************
localhost : ok=4 changed=2 unreachable=0 failed=1
master01 : ok=8 changed=0 unreachable=0 failed=0
master03 : ok=8 changed=0 unreachable=0 failed=0
slave01 : ok=8 changed=0 unreachable=0 failed=0
slave02 : ok=8 changed=0 unreachable=0 failed=0
slave03 : ok=8 changed=0 unreachable=0 failed=0
To resolve this, simply perform the pip install ansible==2.1.3.0 command within your Python virtual environment. Now you can rerun the bash hortonworks_static.sh script.
The last task of the playbook is to install HDP via an Ambari Blueprint. It is normal to see something similar to the following:
TASK [ambari-server : Create the cluster instance] *****************************
ok: [master03]
TASK [ambari-server : Wait for the cluster to be built] ************************
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (180 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (179 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (178 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (177 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (176 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (175 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (174 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (173 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (172 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (171 retries left).
FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (170 retries left).
Once you see 3-5 of the retry messages, you can access the Ambari interface via your web browser. The default login is admin and the default password is admin . You should see something similar to the following:
Click on the Operations icon that shows 10 operations in progress. You should see something similar to the following:
The installation tasks each take between 400-600 seconds. The start tasks each take between 20-300 seconds. The master servers typically take longer to install and start than the slave servers.
When everything is running properly, you should see something similar to this:
If you look back at your terminal window, you should see something similar to the following:
ok: [master03]
TASK [ambari-server : Fail if the cluster create task is in an error state] ****
skipping: [master03]
TASK [ambari-server : Change Ambari admin user password] ***********************
skipping: [master03]
TASK [Cleanup the temporary files] *********************************************
changed: [master03] => (item=/tmp/cluster_blueprint)
changed: [master03] => (item=/tmp/cluster_template)
changed: [master03] => (item=/tmp/alert_targets)
ok: [master03] => (item=/tmp/hdprepo)
PLAY RECAP *********************************************************************
localhost : ok=5 changed=3 unreachable=0 failed=0
master01 : ok=8 changed=0 unreachable=0 failed=0
master03 : ok=30 changed=8 unreachable=0 failed=0
slave01 : ok=8 changed=0 unreachable=0 failed=0
slave02 : ok=8 changed=0 unreachable=0 failed=0
slave03 : ok=8 changed=0 unreachable=0 failed=0
Destroy the cluster
You should remember that you will incur AWS costs while the cluster is running. You can either shutdown or terminate the instances. If you want to use the cluster later, then use Ambari to stop all of the services before shutting down the instances.
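If you have the AWS CLI configured, the instances can also be stopped from the command line. The instance IDs below are placeholders, and the fallback branches only keep this sketch from failing on a machine without the CLI or credentials:

```shell
# Sketch: stop (not terminate) the cluster instances via the AWS CLI.
# The instance IDs are placeholders -- substitute your own.
ids="i-0123456789abcdef0 i-0123456789abcdef1"
if command -v aws >/dev/null 2>&1; then
  aws ec2 stop-instances --instance-ids $ids \
    || echo "aws call failed; check credentials and region"
else
  echo "aws CLI not installed; stop the instances from the EC2 console"
fi
```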
Review
If you successfully followed along with this tutorial, you should have been able to easily deploy Hortonworks Data Platform 2.5 on AWS using the Ansible playbook. The process to deploy the cluster typically takes 10-20 minutes.
For more information on how the instance types and number of master nodes impact the installation time, review the Ansible + Hadoop slides I linked at the top of the article.
03-04-2017
06:05 PM
4 Kudos
Objective
This tutorial will walk you through the process of using Ansible, an agent-less automation tool, to create instances on AWS. The Ansible playbook we will use is relatively simple; you can use it as a base to experiment with more advanced features. You can read more about Ansible here: Ansible.
Ansible is written in Python and is installed as a Python module on the control host. The only requirement for the hosts managed by Ansible is the ability to login with SSH. There is no requirement to install any software on the host managed by Ansible.
If you have never used Ansible, you can become more familiar with it by going through some basic tutorials. The following two tutorials are a good starting point:
Automate All Things With Ansible: Part One
Automate All Things With Ansible: Part Two
This tutorial is part 1 of a 2 part series. Part 2 in the series will show you how to use Ansible to deploy Hortonworks Data Platform (HDP) on Amazon Web Services (AWS).
This tutorial was created as a companion to the Ansible + Hadoop talk I gave at the Ansible NOVA Meetup in February 2017. You can find the slides to that talk here: SlideShare
You can get a copy of the playbook from this tutorial here: Github
Prerequisites
You must have an existing AWS account.
You must have access to your AWS Access and Secret keys.
You are responsible for all AWS costs incurred.
Scope
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6 and 10.12.3
Amazon Web Services
Anaconda 4.1.6 (Python 2.7.12)
Ansible 2.0.0.2 and 2.1.3.0
Steps
Create a project directory
You need to create a directory for your Ansible playbook. I prefer to create my project directories in ~/Development.
mkdir ~/Development/ansible-aws
cd ~/Development/ansible-aws
Install Ansible module
If you use the Anaconda version of Python, you already have access to Ansible. If you are not using Anaconda, then you can usually install Ansible using the following command:
pip install ansible
To read more about how to install Ansible: Ansible Installation
Overview of our Ansible playbook
Our playbook is relatively simple. It consists of a single inventory file, a single group_vars file, and a single playbook file. Here is the layout of the file and directory structure:
+- ansible-aws/
|
+- group_vars/
| +- all
|
+- inventory/
| +- hosts
|
+- playbooks/
| +- ansible-aws.yml
group_vars/all
You can use variables in your playbooks using the {{variable name}} syntax. These variables are populated based on values stored in your variable files. You can explicitly load variable files in your playbooks.
However, all playbooks will automatically load the variables in the group_vars/all variable file. The all variable file is loaded for all hosts regardless of the groups the host may be in. In our playbook, we are placing our AWS configuration values in the all file.
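As a concrete illustration (the aws_region value and the play below are hypothetical, not part of this playbook), a variable defined in group_vars/all can be referenced directly in any play without an explicit vars_files entry:

```yaml
# group_vars/all
aws_region: us-east-1

# playbooks/example.yml
- name: Show an automatically loaded variable
  hosts: localhost
  tasks:
    - debug:
        msg: "Deploying to {{ aws_region }}"
```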
Edit the group_vars/all file. Copy and paste the following text into the file:
aws_access_key: <enter AWS access key>
aws_secret_key: <enter AWS secret key>
key_name: <enter private key file alias name>
aws_region: <enter AWS region>
vpc_id: <enter VPC ID>
ami_id: ami-6d1c2007
instance_type: m4.2xlarge
my_local_cidr_ip: <enter cidr_ip>
aws_access_key : You need to enter your AWS Access key
aws_secret_key : You need to enter your AWS Secret key
key_name : The alias name you gave to the AWS private key which you will use to SSH into the instances. In my case I created a key called ansible .
aws_region : The AWS region where you want to deploy your instances. In my case I am using us-east-1 .
vpc_id : The specific VPC in which you want to place your instances.
ami_id : The specific AMI you want to deploy for your instances. The ami-6d1c2007 AMI is a CentOS 7 image.
instance_type : The type of AWS instance. For deploying Hadoop, I recommend at least m4.2xlarge . A faster alternative is c4.4xlarge .
my_local_cidr_ip : Your local computer's CIDR IP address. This is used for creating the security rules that allow your local computer to access the instances. An example CIDR format is 192.168.1.1/32 . Make sure this is set to your computer's public IP address.
After you have entered your appropriate settings, save the file.
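If you want to sanity-check the CIDR value before saving, Python's standard ipaddress module can verify it (the address below is a placeholder, not one taken from this tutorial):

```python
import ipaddress

# A /32 network contains exactly one address -- your own machine
net = ipaddress.ip_network("192.168.1.1/32")
print(net.num_addresses)                            # 1
print(ipaddress.ip_address("192.168.1.1") in net)   # True

# A wider mask would open the security group to more hosts
print(ipaddress.ip_network("192.168.1.0/24").num_addresses)  # 256
```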
inventory/hosts
Ansible requires a list of known hosts against which playbooks and tasks are run. We will tell Ansible to use a specific host file with the -i inventory/hosts parameter.
Edit the inventory/hosts file. Copy and paste the following text into the file:
[local]
localhost ansible_python_interpreter=/Users/myoung/anaconda/bin/python
[local] : Defines the group the host belongs to. You have the option for a playbook to run against all hosts, a specific group of hosts, or an individual host. This AWS playbook only runs on your local computer. That is because it uses the AWS APIs to communicate with AWS.
localhost : This is the hostname. You can list multiple hosts, 1 per line under each group heading. A host can belong to multiple groups.
ansible_python_interpreter : Optional entry that tells Ansible which specific version of Python to run. Because I am using Anaconda Python, I've included that setting here.
After you have entered your appropriate settings, save the file.
playbooks/ansible-aws.yml
The playbook is where we define the list of tasks we want to perform. Our playbook will consist of 2 tasks. The first task is to create a specific AWS Security Group. The second task is to create a specific configuration of 6 instances on AWS.
Edit the file playbooks/ansible-aws.yml . Copy and paste the following text into the file:
---
# Basic provisioning example
- name: Create AWS resources
hosts: localhost
connection: local
gather_facts: False
tasks:
- name: Create a security group
ec2_group:
name: ansible
description: "Ansible Security Group"
region: "{{aws_region}}"
vpc_id: "{{vpc_id}}"
aws_access_key: "{{aws_access_key}}"
aws_secret_key: "{{aws_secret_key}}"
rules:
- proto: all
cidr_ip: "{{my_local_cidr_ip}}"
- proto: all
group_name: ansible
rules_egress:
- proto: all
cidr_ip: 0.0.0.0/0
register: firewall
- name: Create an EC2 instance
ec2:
aws_access_key: "{{aws_access_key}}"
aws_secret_key: "{{aws_secret_key}}"
key_name: "{{key_name}}"
region: "{{aws_region}}"
group_id: "{{firewall.group_id}}"
instance_type: "{{instance_type}}"
image: "{{ami_id}}"
wait: yes
volumes:
- device_name: /dev/sda1
volume_type: gp2
volume_size: 100
delete_on_termination: true
exact_count: 6
count_tag:
Name: aws-demo
instance_tags:
Name: aws-demo
register: ec2
This playbook uses the Ansible ec2 and ec2_group modules. You can read more about the options available to those modules here:
ec2
ec2_group
The task to create the EC2 security group creates a group named ansible . It defines 2 ingress rules and 1 egress rule for that security group. The first ingress rule allows all inbound traffic from your local computer's IP address. The second ingress rule allows all inbound traffic from any host in the ansible security group itself. The egress rule allows all outbound traffic from the hosts.
The task to create the EC2 instances creates 6 hosts because of the exact_count setting. It creates a tag called aws-demo on each of the instances and uses that tag to determine how many hosts exist. You can choose to use a smaller number of hosts.
You can specify volumes to mount on each of the instances. The default volume size is 8 GB, which is too small for deploying Hadoop later. I recommend setting the size to at least 100 GB as above. I also recommend you set delete_on_termination to true . This tells AWS to delete the storage when the instances are terminated. If you do not do this, the storage will be kept and you will continue to be charged for it.
After you have entered your appropriate settings, save the file.
Running the Ansible playbook
Now that our 3 files have been created and saved with the appropriate settings, we can run the playbook. To run the playbook, you use the ansible-playbook -i inventory/hosts playbooks/ansible-aws.yml command. You should see something similar to the following:
$ ansible-playbook -i inventory/hosts playbooks/ansible-aws.yml
PLAY [Create AWS resources] ****************************************************
TASK [Create a security group] *************************************************
changed: [localhost]
TASK [Create an EC2 instance] **************************************************
changed: [localhost]
PLAY RECAP *********************************************************************
localhost : ok=2 changed=2 unreachable=0 failed=0
The changed lines indicate that Ansible found a configuration that needed to be modified to be consistent with our requested state. For the security group task, you would see this if your security group didn't exist or if it had a different set of ingress or egress rules. For the instance task, you would see this if there were fewer or more than 6 hosts tagged as aws-demo .
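The exact_count behavior can be modeled as a small reconciliation step: compare how many instances carry the count_tag with the desired count, then act on the difference. This is a simplified sketch of the idea, not the ec2 module's actual code:

```python
def reconcile(tagged_running, exact_count):
    """Return (to_create, to_terminate, changed) for an exact_count request."""
    diff = exact_count - tagged_running
    if diff > 0:
        return diff, 0, True      # launch the missing instances
    if diff < 0:
        return 0, -diff, True     # terminate the surplus instances
    return 0, 0, False            # state already matches; reported as "ok", not "changed"

print(reconcile(0, 6))  # (6, 0, True)  -- first run creates all 6
print(reconcile(6, 6))  # (0, 0, False) -- rerunning the play changes nothing
```

This is also why rerunning the playbook is safe: a second run finds 6 tagged hosts and makes no changes.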
Check AWS console.
If you check your AWS console, you should be able to confirm the instances are created. You should see something similar to the following:
Review
If you successfully followed along with this tutorial, you have created a simple Ansible playbook with 2 tasks using the ec2 and ec2_group Ansible modules. The playbook creates an AWS security group and instances which can be used later for deploying HDP on AWS.
02-28-2017
04:40 AM
3 Kudos
Objective
This tutorial is designed to walk you through the process of creating a MiniFi flow to read data from a Sense HAT sensor on a Raspberry Pi 3. The MiniFi flow will push data to a remote NiFi instance running on your computer. The NiFi instance will push the data to Solr.
While there are other tutorials and examples of using NiFi/MiniFi with a Raspberry Pi, most of those tutorials tend to use a more complicated sensor implementation. The Sense HAT is very easy to install and use.
Prerequisites
You should have a Raspberry Pi 3 Model B: Raspberry Pi 3 Model B I recommend a 16+GB SD card for your Raspberry Pi 3. Don't forget to expand the filesystem after the OS is installed: raspi-config
You should have a Sense HAT: Sense HAT You should already have installed the Sense HAT on your Raspberry Pi 3.
You should already have installed Raspbian Jessie Lite on your Raspberry Pi 3 SD card: Raspbian Jessie Lite The instructions for installing a Raspberry Pi OS can be found here: Raspberry PI OS Install You may be able to use the NOOBS operating system that typically ships with the Raspberry Pi. However, the Raspbian Lite OS will ensure the most system resources are available to MiniFi or NiFi.
You should have enabled SSH on your Raspberry Pi: Enable SSH
You should have enabled WiFi on your Raspberry Pi (or use wired networking): Setup WiFi
You should have NiFi 1.x installed and working on your computer: NiFi
You should have the Java MiniFi Toolkit 0.1.0 installed and working on your computer: MiniFi ToolKit
You should have downloaded Solr 6.x on your computer: Solr Download
Scope
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6 and 10.12.3
MiniFi 1.0.2.1.1.0-2.1
MiniFi Toolkit 0.1.0
NiFi 1.1.1
Solr 6.4.1
Java JDK 1.8
Steps
Connect to Raspberry Pi using SSH
If you have completed all of the prerequisites, then you should be able to easily SSH into your Raspberry Pi. On my Mac, I connect using:
ssh pi@raspberrypi
The default username is
pi and the password is raspberry .
If you get an unknown host or DNS error, then you need to specify the IP address of the Raspberry Pi. You can get that by logging directly into the Raspberry Pi console.
Now run the
ifconfig command.
You should see something similar to the following:
pi@raspberrypi:~ $ ifconfig
eth0 Link encap:Ethernet HWaddr b8:27:eb:60:ff:5b
inet6 addr: fe80::ec95:e79b:3679:5159/64 Scope:Link
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
wlan0 Link encap:Ethernet HWaddr b8:27:eb:35:aa:0e
inet addr:192.168.1.204 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::21f6:bf0f:5f9f:d60d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:17280 errors:0 dropped:11506 overruns:0 frame:0
TX packets:872 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3414755 (3.2 MiB) TX bytes:133472 (130.3 KiB)
If you are using WiFi, then look at the
wlan0 device. If you are using wired ethernet, then look at the eth0 device. Now you can connect using the ip address you found.
ssh pi@192.168.1.204 .
Your IP address will vary.
Update Raspberry Pi packages
It's always a good idea to ensure your installed packages are up to date. Raspbian Lite is based on Debian. Therefore you need to use
apt-get to update and install packages.
First, we need to run
sudo apt-get update to update the list of available packages and versions. You should see something similar to the following:
pi@raspberrypi:~ $ sudo apt-get update
Get:1 http://mirrordirector.raspbian.org jessie InRelease [14.9 kB]
Get:2 http://archive.raspberrypi.org jessie InRelease [22.9 kB]
Get:3 http://mirrordirector.raspbian.org jessie/main armhf Packages [8,981 kB]
Get:4 http://archive.raspberrypi.org jessie/main armhf Packages [145 kB]
Get:5 http://archive.raspberrypi.org jessie/ui armhf Packages [57.6 kB]
Get:6 http://mirrordirector.raspbian.org jessie/contrib armhf Packages [37.5 kB]
Get:7 http://mirrordirector.raspbian.org jessie/non-free armhf Packages [70.3 kB]
Get:8 http://mirrordirector.raspbian.org jessie/rpi armhf Packages [1,356 B]
Ign http://archive.raspberrypi.org jessie/main Translation-en_US
Ign http://archive.raspberrypi.org jessie/main Translation-en
Ign http://archive.raspberrypi.org jessie/ui Translation-en_US
Ign http://archive.raspberrypi.org jessie/ui Translation-en
Ign http://mirrordirector.raspbian.org jessie/contrib Translation-en_US
Ign http://mirrordirector.raspbian.org jessie/contrib Translation-en
Ign http://mirrordirector.raspbian.org jessie/main Translation-en_US
Ign http://mirrordirector.raspbian.org jessie/main Translation-en
Ign http://mirrordirector.raspbian.org jessie/non-free Translation-en_US
Ign http://mirrordirector.raspbian.org jessie/non-free Translation-en
Ign http://mirrordirector.raspbian.org jessie/rpi Translation-en_US
Ign http://mirrordirector.raspbian.org jessie/rpi Translation-en
Fetched 9,330 kB in 17s (542 kB/s)
Reading package lists... Done
Now we can update our installed packages using
sudo apt-get dist-upgrade . You should see something similar to the following:
pi@raspberrypi:~ $ sudo apt-get dist-upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
bind9-host libbind9-90 libdns-export100 libdns100 libevent-2.0-5 libirs-export91 libisc-export95 libisc95 libisccc90
libisccfg-export90 libisccfg90 libjasper1 liblwres90 libpam-modules libpam-modules-bin libpam-runtime libpam0g login
passwd pi-bluetooth raspberrypi-sys-mods raspi-config vim-common vim-tiny
24 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 4,767 kB of archives.
After this operation, 723 kB disk space will be freed.
Do you want to continue? [Y/n] y
The list of packages and versions that need to be updated will vary. Enter y to update the installed packages.
Install additional Raspberry Pi packages
We need to install additional packages to interact with the Sense HAT sensor and run MiniFi.
You access the Sense HAT libraries using Python. Therefore the first package we need to install is Python.
sudo apt-get install python
The second package we need to install is the libraries for the Sense HAT device.
sudo apt-get install sense-hat
We will be using the Java version of MiniFi. Therefore the third package we need to install is the Oracle JDK 8.
sudo apt-get install oracle-java8-jdk
Verify Sense HAT functionality
Before we use MiniFi to collect any data, we need to ensure we can interact with the Sense HAT sensor. We will create a simple Python script to display a message on our Sense HAT.
Edit the file display_message.py using
vi display_message.py . Now copy and paste the following text into your text editor (remember to go into insert mode first):
from sense_hat import SenseHat
sense = SenseHat()
sense.show_message("Hello")
Save the script using
:wq! . Run this script using python display_message.py . You should see the word Hello scroll across the display of the Sense HAT in white text.
Now let's test reading the temperature from the Sense Hat. Edit the file get_temp.py using
vi get_temp.py . Now copy and paste the following text into your text editor (remember to go into insert mode first):
from sense_hat import SenseHat
sense = SenseHat()
t = sense.get_temperature()
print('Temperature = {0:0.2f} C'.format(t))
Save the script using
:wq! . Run the script using python get_temp.py . You should see something similar to the following (your values will vary):
pi@raspberrypi:~ $ python get_temp.py
Temperature = 31.58 C
For our MiniFi use case, we will be looking at temperature, pressure, and humidity data. We will not use the Sense HAT display for MiniFi, so we'll only print the data to the console.
You can read more about the Sense HAT functions here:
Sense HAT API
Now let's create a script which prints all 3 sensor values. Edit the file get_environment.py using
vi get_environment.py . Copy and paste the following text into your text editor (remember to go into insert mode first):
from sense_hat import SenseHat
import datetime
sense = SenseHat()
t = sense.get_temperature()
p = sense.get_pressure()
h = sense.get_humidity()
print('Hostname = raspberrypi')
print('DateTime = ' + datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ"))
print('Temperature = {0:0.2f} C'.format(t))
print('Pressure = {0:0.2f} Millibars'.format(p))
print('Humidity = {0:0.2f} %rH'.format(h))
Save the script using
:wq! . Run the script using python get_environment.py . You should see something similar to the following (your values will vary):
Hostname = raspberrypi
DateTime = 2017-02-27T21:20:55Z
Temperature = 32.90 C
Pressure = 1026.53 Millibars
Humidity = 25.36 %rH
As you can see from the script, we are printing our date output using UTC time via the
utcnow() function. We also need to ensure the data format is consumable by Solr. That is why we are using %Y-%m-%dT%H:%M:%SZ which is a format Solr can parse.
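As a quick check that this format string produces what Solr expects, you can round-trip a timestamp with strftime and strptime:

```python
import datetime

fmt = "%Y-%m-%dT%H:%M:%SZ"

# Format a fixed timestamp the same way get_environment.py does
stamp = datetime.datetime(2017, 2, 27, 21, 20, 55).strftime(fmt)
print(stamp)  # 2017-02-27T21:20:55Z

# Solr's date fields expect exactly this ISO-8601 "Z" form; parsing the
# string back confirms it is well formed
parsed = datetime.datetime.strptime(stamp, fmt)
print(parsed.year, parsed.second)  # 2017 55
```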
Our MiniFi flow will use the
ExecuteProcess to run the script. So we need to create a simple bash script to run the get_environment.py file. Edit the file get_environment.sh using vi get_environment.sh . Copy and paste the following text into your text editor (remember to go into insert mode first):
python /home/pi/get_environment.py
Save the script using
:wq! . Make sure the script is executable by running chmod 755 get_environment.sh . Let's make sure the bash script works ok. Run the script using ./get_environment.sh . You should see something similar to the following (your values will vary):
Hostname = raspberrypi
DateTime = 2017-02-27T21:20:55Z
Temperature = 32.90 C
Pressure = 1026.53 Millibars
Humidity = 25.36 %rH
Install MiniFi
We are going to install MiniFi on the Raspberry Pi. First download the MiniFi release.
wget http://public-repo-1.hortonworks.com/HDF/2.1.1.0/minifi-1.0.2.1.1.0-2-bin.tar.gz
Now you can extract it using tar xvfz minifi-1.0.2.1.1.0-2-bin.tar.gz .
Now we are ready to create our NiFi and MiniFi flows.
Start NiFi
On your computer (not on the Raspberry Pi), start NiFi if you have not already done so. You do this by running
<nifi installation dir>/bin/nifi.sh start . It may take a few minutes before NiFi is fully started. You can monitor the logs by running tail -f <nifi installation dir>/log/nifi.app.log .
You should see something similar to the following when the UI is ready:
2017-02-26 14:10:01,199 INFO [main] org.eclipse.jetty.server.Server Started @40057ms
2017-02-26 14:10:01,695 INFO [main] org.apache.nifi.web.server.JettyServer NiFi has started. The UI is available at the following URLs:
2017-02-26 14:10:01,695 INFO [main] org.apache.nifi.web.server.JettyServer http://127.0.0.1:9091/nifi
2017-02-26 14:10:01,695 INFO [main] org.apache.nifi.web.server.JettyServer http://192.168.1.186:9091/nifi
2017-02-26 14:10:01,697 INFO [main] org.apache.nifi.BootstrapListener Successfully initiated communication with Bootstrap
2017-02-26 14:10:01,697 INFO [main] org.apache.nifi.NiFi Controller initialization took 11161419754 nanoseconds.
Now you should be able to access NiFi in your browser by going to
<hostname>:8080/nifi . The default port is 8080 . If you changed the port (my instance runs on 9091 , as the log above shows), use that port instead.
You should see a blank NiFi canvas similar to the following:
NiFi Blank Canvas
Setup Solr
Before we start on our NiFi flow, let's make sure Solr is running. We are going to use schemaless mode. You can easily start Solr using
solr -e schemaless .
You should see something similar to the following:
$ bin/solr -e schemaless
Creating Solr home directory /Users/myoung/Downloads/solr-6.4.1/example/schemaless/solr
Starting up Solr on port 8983 using command:
bin/solr start -p 8983 -s "example/schemaless/solr"
Waiting up to 180 seconds to see Solr running on port 8983 [\]
Started Solr server on port 8983 (pid=49659). Happy searching!
Copying configuration to new core instance directory:
/Users/myoung/Downloads/solr-6.4.1/example/schemaless/solr/gettingstarted
Creating new core 'gettingstarted' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=gettingstarted&instanceDir=gettingstarted
{
"responseHeader":{
"status":0,
"QTime":1371},
"core":"gettingstarted"}
Solr schemaless example launched successfully. Direct your Web browser to http://localhost:8983/solr to visit the Solr Admin UI
As you can see, Solr created a collection called
gettingstarted . That is the name of the collection our NiFi PutSolrContentStream will use.
Create NiFi flow
Now we need to create our NiFi flow that will receive data from MiniFi.
Input Port
The MiniFi flow will send data to a
Remote Process Group . The Remote Process Group requires an Input Port . From the NiFi menu, drag the Input Port icon to the canvas.
In the
Add Port dialog that is displayed, type a name for your port. I used From Raspberry Pi . You should see something similar to the following:
Click the blue
ADD button.
ExtractText
From the NiFi menu, drag the
Processor icon to the canvas. In the Filter box, enter extract . You should see something similar to the following:
Select the
ExtractText processor. Click on the blue ADD button to add the processor to the canvas.
Now we need to configure the
ExtractText processor. Right click on the processor and select the Configure menu option.
On the
SETTINGS tab of the ExtractText processor, you should check the unmatched box under Automatically Terminate Relationships . This will drop any records which we fail to extract text from. You should see something similar to the following:
On the
PROPERTIES tab of the ExtractText processor, there are a few changes we need to make.
First, we want want to set
Enable Multiline Mode to true . This allows the Regular Expressions to match across multiple lines. This is important because our data is coming in as multiline data.
Second, we want to set
Include Capture Group 0 to false . Each Regular Expression we are using has only a single capture group. If we left this value set to true, each field we extract would also get a duplicate, unused value stored as <attribute name>.0 .
Third, we need to add additional fields to the processor which allows us to define our Regular Expressions. If you click the
+ icon in the upper right corner of the dialog, you should see something similar to the following:
We are going to add a property called
hostname . This will hold the value from the line Hostname = in the data. Click the blue OK button. Now you should see another dialog where you enter the regular expression. You should see something similar to the following:
Enter the following Regular Expression:
Hostname = (\w+)
We need to repeat this process for each of the other data elements coming from the Raspberry Pi. You should have the following extra fields defined as separate fields:
property: hostname
value: Hostname = (\w+)
property: datetime
value: DateTime = (\d{4}\-\d{2}\-\d{2}T\d{2}\:\d{2}:\d{2}Z)
property: temperature
value: Temperature = (\d+\.\d+) C
property: humidity
value: Humidity = (\d+\.\d+) %rH
property: pressure
value: Pressure = (\d+\.\d+) Millibars
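Before configuring the processor, you can verify these regular expressions in Python against a sample of the Raspberry Pi output (the re.MULTILINE flag plays roughly the role of Enable Multiline Mode here):

```python
import re

# Sample output in the same shape get_environment.sh produces
sample = """Hostname = raspberrypi
DateTime = 2017-02-27T21:20:55Z
Temperature = 32.90 C
Pressure = 1026.53 Millibars
Humidity = 25.36 %rH"""

patterns = {
    "hostname": r"Hostname = (\w+)",
    "datetime": r"DateTime = (\d{4}\-\d{2}\-\d{2}T\d{2}\:\d{2}:\d{2}Z)",
    "temperature": r"Temperature = (\d+\.\d+) C",
    "pressure": r"Pressure = (\d+\.\d+) Millibars",
    "humidity": r"Humidity = (\d+\.\d+) %rH",
}

# Capture group 1 holds each value, mirroring Include Capture Group 0 = false
attributes = {name: re.search(p, sample, re.MULTILINE).group(1)
              for name, p in patterns.items()}
print(attributes["temperature"])  # 32.90
print(attributes["datetime"])     # 2017-02-27T21:20:55Z
```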
When you have entered each of these properties, you should see something similar to the following:
Click the blue
APPLY button to save the changes.
AttributesToJSON
From the NiFi menu, drag the
Processor icon to the canvas. In the Filter box, enter attributes . You should see something similar to the following:
Select the
AttributesToJSON processor. Click on the blue ADD button to add the processor to the canvas.
Now we need to configure the
AttributesToJSON processor. Right click on the processor and select the Configure menu option.
On the
PROPERTIES tab of the AttributesToJSON processor, there are a few changes we need to make.
For the
Attributes List property, we need to provide a comma-separated list of attributes we want the processor to pass on. Click inside the Value box next to Attributes List . Enter the following value:
hostname,datetime,temperature,pressure,humidity
For the
Destination property, set the value to flowfile-content . We need the values to be in the flowfile content itself as JSON, which is what the PutSolrContentStream processor expects. Otherwise the flowfile content will contain the raw data (not JSON) coming from the Raspberry Pi, and Solr will throw errors because it is not able to parse the request.
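The net effect of AttributesToJSON can be sketched in a few lines of Python: keep only the attributes named in Attributes List, then serialize them as the new flowfile content (sample values shown):

```python
import json

# Attributes as ExtractText would have produced them (sample values)
flowfile_attributes = {
    "hostname": "raspberrypi",
    "datetime": "2017-02-27T21:20:55Z",
    "temperature": "32.90",
    "pressure": "1026.53",
    "humidity": "25.36",
}

attributes_list = "hostname,datetime,temperature,pressure,humidity"

# Filter to the listed attributes and serialize; this JSON string is what
# replaces the flowfile content when Destination = flowfile-content
content = json.dumps({k: flowfile_attributes[k]
                      for k in attributes_list.split(",")})
print(content)
```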
You should see something similar to the following:
Click the blue
APPLY button to save the changes.
PutSolrContentStream
From the NiFi menu, drag the
Processor icon to the canvas. In the Filter box, enter solr . You should see something similar to the following:
Select the
PutSolrContentStream processor. Click on the blue ADD button to add the processor to the canvas.
Now we need to configure the
PutSolrContentStream processor. Right click on the processor and select the Configure menu option.
On the
SETTINGS tab of the PutSolrContentStream processor, you should check the connection_failure , failure , and success boxes under Automatically Terminate Relationships . Since this is the end of the flow, we can terminate everything. You could expand on this by retrying failures, or logging errors to a text file.
You should see something similar to the following:
On the
PROPERTIES tab of the PutSolrContentStream processor, we need to make a few changes.
Set the
Solr Type property to Standard . We don't need to run SolrCloud for our demo.
Set the
Solr Location to http://192.168.1.186:8983/solr/gettingstarted . You should use the IP address of your computer. When we start Solr up, we'll be using the gettingstarted collection, so it's part of the URL. If we were using SolrCloud, we would put the collection name in the Collection property instead.
The first set of properties should look similar to the following:
Now we need to add fields for indexing in Solr. Click the
+ icon in the upper right corner of the processor. The Add Property dialog will be displayed. For the first field, enter f.1 and click the ADD button. For the value enter hostname_s:/hostname . The hostname_s part of the value says to store the content in the Solr field called hostname_s , which uses the dynamic schema to treat this field as a string. The /hostname part of the value says to pull the value from the root of the JSON where the JSON node is called hostname .
We need to repeat this process for each of the other data elements coming from the Raspberry Pi. You should have the following fields defined as separate fields:
property: f.1
value: hostname_s:/hostname
property: f.2
value: timestamp_dts:/datetime
property: f.3
value: temperature_f:/temperature
property: f.4
value: pressure_f:/pressure
property: f.5
value: humidity_f:/humidity
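These f.N mappings essentially rename JSON fields into Solr dynamic fields, where the suffix selects the type ( _s string, _dts date, _f float). A rough sketch of the resulting Solr document (sample values shown):

```python
import json

# JSON flowfile content as produced by AttributesToJSON (sample values)
json_content = ('{"hostname": "raspberrypi", "datetime": "2017-02-27T21:20:55Z", '
                '"temperature": "32.90", "pressure": "1026.53", "humidity": "25.36"}')

# Solr field -> JSON path (a leading "/" means the root of the document)
field_map = {
    "hostname_s": "/hostname",
    "timestamp_dts": "/datetime",
    "temperature_f": "/temperature",
    "pressure_f": "/pressure",
    "humidity_f": "/humidity",
}

record = json.loads(json_content)
solr_doc = {field: record[path.lstrip("/")] for field, path in field_map.items()}
print(solr_doc["temperature_f"])  # 32.90
```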
Click the blue
APPLY button to save the changes.
Connect Processors
Now that we have our processors on the canvas, we need to connect them. Drag the connection icon from the
Input Port processor to the ExtractText processor.
Drag the connection icon from the
ExtractText processor to the AttributesToJSON processor.
Drag the connection icon from the
AttributesToJSON processor to the PutSolrContentStream processor.
You should have something that looks similar to the following:
Create MiniFi flow
Now we can create our MiniFi flow.
ExecuteProcess
The first thing we need to do is add a processor to execute the bash script we created on the Raspberry Pi.
Drag the
Processor icon to the canvas. Enter execute in the Filter box. You should see something similar to the following:
Select the
ExecuteProcess processor. Click on the blue ADD button to add the processor to the canvas.
Now we need to configure the
ExecuteProcess processor. Right click on the processor and select the Configure menu option.
On the
SETTINGS tab you should check the success box under Automatically Terminate Relationships . You should see something similar to the following:
On the
Scheduling tab we want to set the Run Schedule to 5 sec . This will run the processor every 5 seconds. You should see something similar to the following;
On the
Properties tab we want to set the Command to /home/pi/get_environment.sh . This assumes you created the scripts in the /home/pi directory on the Raspberry Pi.
Click the blue
APPLY button to save the changes.
Remote Process Group
Now we need to add a
Remote Process Group to our canvas. This is how the MiniFi flow is able to send data to NiFi. Drag the Remote Process Group icon to the canvas.
For the
URL enter the URL you use to access your NiFi UI. In my case that is http://192.168.1.186:9091/nifi . Remember the default port for NiFi is 8080 . For the Transport Protocol select HTTP . You can leave the other settings as defaults. You should see something similar to the following:
Click the blue
ADD button to add the Remote Process Group to the canvas.
Create Connection
Now we need to create a connection between our ExecuteProcess processor and our Remote Process Group on the canvas.
Hover your mouse over the
ExecuteProcess processor. Click on the circle arrow icon and drag from the processor to the Remote Process Group .
Save Template
We need to save the MiniFi portion of the flow as a template. Select the
ExecuteProcess , Remote Process Group and the connection between them using the shift key to allow multi-select.
Click on the
Create Template icon (second icon from the right on the top row) in the Operate Box on the canvas. It looks like the following:
The
Create Template dialog will be displayed. Give your template a name. I used raspberrypi . Click the blue CREATE button.
Now click on the main NiFi menu button in the upper right corner of the UI. You should see something like the following:
Now click the
Templates option. This will open the NiFi Templates dialog. You will see a list of templates you have created. You should see something similar to the following:
Now find the template you just created and click on the
Download button on the right hand side. This will save a copy of the flowfile in XML format on your local computer.
Convert NiFi Flow to MiniFi Flow
We need to convert the xml flowfile NiFi generated into a yml file that MiniFi uses. We will be using the minifi-toolkit to do this.
We need run the minifi-toolkit transform command. The first option is the location of the NiFi flowfile you downloaded. The second option is the location where to write out the MiniFi flowfile. MiniFi expects the flowfile name to be
config.yml
Run the transform command. You should see something similar to the following:
$ /Users/myoung/Downloads/minifi-toolkit-0.1.0/bin/config.sh transform ~/Downloads/raspberry.xml ~/Downloads/config.yml
Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home
MiNiFi Toolkit home: /Users/myoung/Downloads/minifi-toolkit-0.1.0
No validation errors found in converted configuration.
Copy MiniFi Flow to Raspberry Pi
Now we need to copy the flowfile to the Raspberry pi. You can easily do that using the
scp command. The config.yml file we generated needs to go in the /home/pi/minifi-1.0.2.1.1.0-2/conf/ directory.
You should see something similar to the following:
$ scp ~/Downloads/config.yml pi@raspberrypi:/home/pi/minifi-1.0.2.1.1.0-2/conf/config.yml
pi@raspberrypi's password:
config.yml 100% 1962 721.1KB/s 00:00
Start MiniFi
Now that the flowfile is in place, we can start MiniFi. You do that using the
minifi.sh script with the start option. Remember that MiniFi will be running on the Raspberry Pi, not on your computer.
You should see something similar to the following:
$ /home/pi/minifi-1.0.2.1.1.0-2/minifi.sh start
minifi.sh: JAVA_HOME not set; results may vary
Bootstrap Classpath: /home/pi/minifi-1.0.2.1.1.0-2/conf:/home/pi/minifi-1.0.2.1.1.0-2/lib/bootstrap/*:/home/pi/minifi-1.0.2.1.1.0-2/lib/*
Java home:
MiNiFi home: /home/pi/minifi-1.0.2.1.1.0-2
Bootstrap Config File: /home/pi/minifi-1.0.2.1.1.0-2/conf/bootstrap.conf
Now MiniFi should be running on your Raspberry Pi. If you run into any issues, look at the logs in <minifi directory>/logs/minifi-app.log .
Start NiFi flow
Now that everything else is in place, we should be able to start our NiFi flow. Start the 4 NiFi components (the Input Port and the 3 processors), not the two MiniFi parts of the flow. If everything is working properly, you should start seeing records in Solr.
Dashboard
You can easily add Banana to Solr to create a dashboard. Here is an example:
Review
If you successfully followed along with this tutorial, you should have MiniFi collecting data from your Sense HAT sensor on your Raspberry Pi. The MiniFi flow should be sending that data to NiFi on your computer which then sends to the data to Solr.
12-07-2016
01:26 AM
3 Kudos
Objective
If you are managing multiple copies of the HDP sandbox for Docker (see my article here: How to manage multiple copies of the HDP Docker Sandbox.), you may find yourself running out of storage within your Docker VM image on your laptop. There is a way to increase the available storage space of the Docker VM image. Increasing the storage space will allow you to have more copies of sandbox containers in addition to other images and containers. This tutorial will guide you through the process of increasing the size of the base Docker for Mac VM image. We will increase the size from 64GB to 120GB. This tutorial is the second in a two part series. The first tutorial in the series is: How to move Docker for Mac vm image from internal to external hard drive. For a tutorial on increasing the base Docker VM image on CentOS 7, read my tutorial here: How to modify the default Docker configuration on CentOS 7 to import HDP sandbox
Prerequisites
You should have already completed the following tutorial: Installing Docker Version of Sandbox on Mac
You should have already completed the following tutorial: How to move Docker for Mac vm image from internal to external hard drive
You should have already installed Homebrew: Homebrew
You should have an external hard drive available.
Scope
Mac OS X 10.11.6 (El Capitan)
Docker for Mac 1.12.1
HDP 2.5 Docker Sandbox
Homebrew 1.1.0
Steps
Install qemu
The Docker virtual machine image is in qcow2 format, which requires qemu to manage. Before we can manipulate our image file, we need to install qemu.
brew install qemu
==> Installing dependencies for qemu: jpeg, libpng, libtasn1, gmp, nettle, gnutls, gettext, libffi, pcre, glib, pixman
==> Installing qemu dependency: jpeg
==> Downloading https://homebrew.bintray.com/bottles/jpeg-8d.el_capitan.bottle.2.tar.gz
######################################################################## 100.0%
==> Pouring jpeg-8d.el_capitan.bottle.2.tar.gz
/usr/local/Cellar/jpeg/8d: 19 files, 713.8K
==> Installing qemu dependency: libpng
==> Downloading https://homebrew.bintray.com/bottles/libpng-1.6.26.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring libpng-1.6.26.el_capitan.bottle.tar.gz
/usr/local/Cellar/libpng/1.6.26: 26 files, 1.2M
==> Installing qemu dependency: libtasn1
==> Downloading https://homebrew.bintray.com/bottles/libtasn1-4.9.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring libtasn1-4.9.el_capitan.bottle.tar.gz
/usr/local/Cellar/libtasn1/4.9: 58 files, 437K
==> Installing qemu dependency: gmp
==> Downloading https://homebrew.bintray.com/bottles/gmp-6.1.1.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring gmp-6.1.1.el_capitan.bottle.tar.gz
/usr/local/Cellar/gmp/6.1.1: 17 files, 3.2M
==> Installing qemu dependency: nettle
==> Downloading https://homebrew.bintray.com/bottles/nettle-3.3.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring nettle-3.3.el_capitan.bottle.tar.gz
/usr/local/Cellar/nettle/3.3: 81 files, 2.0M
==> Installing qemu dependency: gnutls
==> Downloading https://homebrew.bintray.com/bottles/gnutls-3.4.16.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring gnutls-3.4.16.el_capitan.bottle.tar.gz
==> Using the sandbox
/usr/local/Cellar/gnutls/3.4.16: 1,115 files, 6.9M
==> Installing qemu dependency: gettext
==> Downloading https://homebrew.bintray.com/bottles/gettext-0.19.8.1.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring gettext-0.19.8.1.el_capitan.bottle.tar.gz
==> Caveats
This formula is keg-only, which means it was not symlinked into /usr/local.
macOS provides the BSD gettext library and some software gets confused if both are in the library path.
Generally there are no consequences of this for you. If you build your
own software and it requires this formula, you'll need to add to your
build variables:
LDFLAGS: -L/usr/local/opt/gettext/lib
CPPFLAGS: -I/usr/local/opt/gettext/include
==> Summary
/usr/local/Cellar/gettext/0.19.8.1: 1,934 files, 16.9M
==> Installing qemu dependency: libffi
==> Downloading https://homebrew.bintray.com/bottles/libffi-3.0.13.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring libffi-3.0.13.el_capitan.bottle.tar.gz
==> Caveats
This formula is keg-only, which means it was not symlinked into /usr/local.
Some formulae require a newer version of libffi.
Generally there are no consequences of this for you. If you build your
own software and it requires this formula, you'll need to add to your
build variables:
LDFLAGS: -L/usr/local/opt/libffi/lib
==> Summary
/usr/local/Cellar/libffi/3.0.13: 15 files, 374.7K
==> Installing qemu dependency: pcre
==> Downloading https://homebrew.bintray.com/bottles/pcre-8.39.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring pcre-8.39.el_capitan.bottle.tar.gz
/usr/local/Cellar/pcre/8.39: 203 files, 5.4M
==> Installing qemu dependency: glib
==> Downloading https://homebrew.bintray.com/bottles/glib-2.50.1.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring glib-2.50.1.el_capitan.bottle.tar.gz
/usr/local/Cellar/glib/2.50.1: 427 files, 22.3M
==> Installing qemu dependency: pixman
==> Downloading https://homebrew.bintray.com/bottles/pixman-0.34.0.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring pixman-0.34.0.el_capitan.bottle.tar.gz
/usr/local/Cellar/pixman/0.34.0: 12 files, 1.2M
==> Installing qemu
==> Downloading https://homebrew.bintray.com/bottles/qemu-2.7.0.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring qemu-2.7.0.el_capitan.bottle.tar.gz
/usr/local/Cellar/qemu/2.7.0: 126 files, 139.8M NOTE: This may take several minutes. Export Docker containers Before we make any changes to our Docker vm image, we should back up any containers we want to save. This is not a typical Docker use case, as most Docker containers are ephemeral. However, we want to save any configuration changes we've made to our sandbox containers. To get a list of containers, use the docker ps -a command. This will show all containers, running or not. You should see something similar to this:
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
15411dc968ad fc813bdc4bdd "/usr/sbin/sshd -D" 4 weeks ago Exited (0) 23 hours ago hdp25-atlas-demo As you can see I have a single container which I use for Atlas demos. If you followed my article above for managing multiple copies of the sandbox, you will note the name of the sandbox is based on my project directory. If you have not followed that tutorial, then the NAMES column will display sandbox which is the default container name. I want to save that container to avoid having to redo any configuration and setup tasks that I've already completed. Using the docker export command, we can create a saved image of our container. This image can be imported into Docker later. To read more about the docker export command look here: docker export. You should give the --output parameter a filename that makes sense. You should see something similar to this:
cd ~
docker export --output="hdp25-atlas-demo.tar" hdp25-atlas-demo NOTE: This may take several minutes. You can check the size of the container export. You should see something like this:
ls -lah hdp25-atlas-demo.tar
-rw------- 1 myoung staff 14G Nov 9 11:57 hdp25-atlas-demo.tar You shouldn't need to save your Docker images, as they can easily be downloaded again and contain no custom configurations. Stop Docker for Mac Let's check the storage size available with our current virtual machine image.
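As an aside, if you have several stopped containers to preserve, the per-container export can be looped. A minimal sketch, assuming docker is installed (the backup_name helper and the date suffix are our own convention, not part of the docker CLI; the loop is a no-op on machines without Docker):

```shell
# Build a tarball name like "hdp25-atlas-demo-20161109.tar" from a container
# name and a date string.
backup_name() {
  printf '%s-%s.tar' "$1" "$2"
}

# Export every stopped container, one tarball per container.
if command -v docker >/dev/null 2>&1; then
  for name in $(docker ps -a --filter "status=exited" --format '{{.Names}}'); do
    docker export --output="$(backup_name "$name" "$(date +%Y%m%d)")" "$name"
  done
fi
```

Each exported tarball can later be restored individually with docker import, just as we do for the single container below.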
docker run --rm alpine df -h
Filesystem Size Used Available Use% Mounted on
none 59.0G 35.7G 20.3G 64% /
tmpfs 5.9G 0 5.9G 0% /dev
tmpfs 5.9G 0 5.9G 0% /sys/fs/cgroup
/dev/vda2 59.0G 35.7G 20.3G 64% /etc/resolv.conf
/dev/vda2 59.0G 35.7G 20.3G 64% /etc/hostname
/dev/vda2 59.0G 35.7G 20.3G 64% /etc/hosts
shm 64.0M 0 64.0M 0% /dev/shm
tmpfs 5.9G 0 5.9G 0% /proc/kcore
tmpfs 5.9G 0 5.9G 0% /proc/timer_list
tmpfs 5.9G 0 5.9G 0% /proc/sched_debug Notice that our / partition is 59GB in size. Another 5.9GB is used for the other partitions. That brings us up to our 64GB file size. Before we can make any changes to the Docker virtual machine image, we need to stop Docker for Mac. There should be a Docker for Mac icon in the menu bar. You should see something similar to this:
You can also check from the command line with ps -ef | grep -i com.docker . You should see something similar to this:
ps -ef | grep -i com.docker
0 123 1 0 8:45AM ?? 0:00.01 /Library/PrivilegedHelperTools/com.docker.vmnetd
502 967 876 0 8:46AM ?? 0:00.08 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 969 967 0 8:46AM ?? 0:00.04 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 971 967 0 8:46AM ?? 0:07.96 com.docker.db --url fd:3 --git /Users/myoung/Library/Containers/com.docker.docker/Data/database
502 975 967 0 8:46AM ?? 0:03.40 com.docker.osx.hyperkit.linux
502 977 975 0 8:46AM ?? 0:00.03 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux
502 12807 967 0 9:17PM ?? 0:00.08 com.docker.osxfs --address fd:3 --connect /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --control fd:4 --volume-control fd:5 --database /Users/myoung/Library/Containers/com.docker.docker/Data/s40
502 12810 967 0 9:17PM ?? 0:00.12 com.docker.slirp --db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 --ethernet fd:3 --port fd:4 --vsock-path /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --max-connections 900
502 12811 967 0 9:17PM ?? 0:00.19 com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 12812 12811 0 9:17PM ?? 0:00.02 /Applications/Docker.app/Contents/MacOS/com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 12814 12811 0 9:17PM ?? 0:16.48 /Applications/Docker.app/Contents/MacOS/com.docker.hyperkit -A -m 12G -c 6 -u -s 0:0,hostbridge -s 31,lpc -s 2:0,virtio-vpnkit,uuid=1f629fed-1ef6-4f34-8fce-753347e3b941,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s50,macfile=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/mac.0 -s 3,virtio-blk,file:///Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2,format=qcow -s 4,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s40,tag=db -s 5,virtio-rnd -s 6,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s51,tag=port -s 7,virtio-sock,guest_cid=3,path=/Users/myoung/Library/Containers/com.docker.docker/Data,guest_forwards=2376;1525 -l com1,autopty=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty,log=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/console-ring -f kexec,/Applications/Docker.app/Contents/Resources/moby/vmlinuz64,/Applications/Docker.app/Contents/Resources/moby/initrd.img,earlyprintk=serial console=ttyS0 com.docker.driver="com.docker.driver.amd64-linux", com.docker.database="com.docker.driver.amd64-linux" ntp=gateway mobyplatform=mac -F /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/hypervisor.pid
502 13790 876 0 9:52PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 13791 13790 0 9:52PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 13793 13146 0 9:52PM ttys000 0:00.00 grep -i com.docker Now we will stop Docker for Mac. Using the menu shown above, click on the Quit Docker menu option. This will stop Docker for Mac. You should notice the Docker for Mac icon is no longer visible. Now let's confirm the Docker processes we saw before are no longer running.
ps -ef | grep -i com.docker
0 123 1 0 8:45AM ?? 0:00.01 /Library/PrivilegedHelperTools/com.docker.vmnetd
502 13815 13146 0 9:54PM ttys000 0:00.00 grep -i com.docker NOTE: It may take a few seconds before Docker for Mac is completely stopped. It is ok for com.docker.vmnetd to still be running. Create new Docker template image Docker uses a template image file. That file is copied from /Applications/Docker.app/Contents/Resources/moby/data.qcow2 to ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2 when Docker sees you are missing the Docker.qcow2 file. We are going to create a new template image file. In my case I want to create a file that is 120GB in size. You should see something similar to this:
cd ~
qemu-img create -f qcow2 ~/data.qcow2 120G
Formatting '/Users/myoung/data.qcow2', fmt=qcow2 size=128849018880 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16 Now let's see how big the file is. You should see something similar to this:
ls -lah data.qcow2
-rw-r--r-- 1 myoung staff 194K Nov 9 19:30 data.qcow2 That is interesting; the file is only 194KB in size. Why? That is because qcow2 images are sparse files. The image file grows as you add content to it, such as images and containers. You can read more here: Sparse Files. The file will continue to grow until it reaches 120GB in size. NOTE: On Linux systems, this process is more involved than it is for the Mac. You have to change the Docker configuration to define a new setting --storage-opt=dm.basesize=30G, or whatever size is appropriate for your environment. There is a link to my Linux article on this topic at the top of the page. Backup default template image file We should back up our default template image file just to be safe.
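As an aside, the sparse-file behavior described above is easy to demonstrate with any file, not just qcow2 images. This sketch (demo.img is a throwaway name) creates a 1GB sparse file and compares its apparent size with the space actually allocated:

```shell
# truncate creates a hole-only (sparse) file: big apparent size, no data blocks.
truncate -s 1G demo.img
apparent_bytes=$(wc -c < demo.img)       # the full 1 GB the filesystem reports
actual_kb=$(du -k demo.img | cut -f1)    # blocks actually allocated on disk
echo "apparent=${apparent_bytes} bytes, allocated=${actual_kb} KB"
rm -f demo.img
```

On a sparse-capable filesystem the allocated size stays near zero until real data is written, which is exactly why the fresh 120GB data.qcow2 only occupies a few hundred KB.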
mv /Applications/Docker.app/Contents/Resources/moby/data.qcow2 /Applications/Docker.app/Contents/Resources/moby/data.qcow2.backup
ls -lah /Applications/Docker.app/Contents/Resources/moby/data.qcow2.backup
-rw-r--r-- 1 myoung admin 320K Nov 8 12:36 /Applications/Docker.app/Contents/Resources/moby/data.qcow2.backup Now we can cp our new template file into place.
cp data.qcow2 /Applications/Docker.app/Contents/Resources/moby/data.qcow2 Again, let's make sure the file exists.
ls -lah /Applications/Docker.app/Contents/Resources/moby/data.qcow2*
-rw-r--r-- 1 myoung admin 194K Nov 9 19:38 /Applications/Docker.app/Contents/Resources/moby/data.qcow2
-rw-r--r-- 1 myoung admin 320K Nov 8 12:36 /Applications/Docker.app/Contents/Resources/moby/data.qcow2.backup Delete current Docker vm image file Now we need to delete the current Docker vm image file located at ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2 . We need to do this so we can create a new image file from the template baseline image we just created. We backed up our containers in previous steps, so we shouldn't have to worry about losing anything. You can choose to back up this file just to be safe.
mv ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2 ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2.backup NOTE: You must have sufficient storage space to hold two copies of the Docker.qcow2 file. Now let's make sure the file has been moved.
ls -lah ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2*
-rw-r--r-- 1 myoung staff 64G Nov 9 19:30 /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2.backup Now restart Docker for Mac Now we can restart Docker for Mac. This is done by running the application from the Applications folder in the Finder. You should see something similar to this:
Double-click on the Docker application to start it. You should notice the Docker for Mac icon is now back in the menu bar. You can also check with ps -ef | grep -i com.docker . You should see something similar to this:
ps -ef | grep -i com.docker
0 123 1 0 8:45AM ?? 0:00.01 /Library/PrivilegedHelperTools/com.docker.vmnetd
502 14476 14465 0 10:42PM ?? 0:00.03 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 14479 14476 0 10:42PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux -watchdog fd:0
502 14480 14476 0 10:42PM ?? 0:00.29 com.docker.db --url fd:3 --git /Users/myoung/Library/Containers/com.docker.docker/Data/database
502 14481 14476 0 10:42PM ?? 0:00.08 com.docker.osxfs --address fd:3 --connect /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --control fd:4 --volume-control fd:5 --database /Users/myoung/Library/Containers/com.docker.docker/Data/s40
502 14482 14476 0 10:42PM ?? 0:00.04 com.docker.slirp --db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 --ethernet fd:3 --port fd:4 --vsock-path /Users/myoung/Library/Containers/com.docker.docker/Data/@connect --max-connections 900
502 14483 14476 0 10:42PM ?? 0:00.05 com.docker.osx.hyperkit.linux
502 14484 14476 0 10:42PM ?? 0:00.08 com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 14485 14483 0 10:42PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.osx.hyperkit.linux
502 14486 14484 0 10:42PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.driver.amd64-linux -db /Users/myoung/Library/Containers/com.docker.docker/Data/s40 -osxfs-volume /Users/myoung/Library/Containers/com.docker.docker/Data/s30 -slirp /Users/myoung/Library/Containers/com.docker.docker/Data/s50 -vmnet /var/tmp/com.docker.vmnetd.socket -port /Users/myoung/Library/Containers/com.docker.docker/Data/s51 -vsock /Users/myoung/Library/Containers/com.docker.docker/Data -docker /Users/myoung/Library/Containers/com.docker.docker/Data/s60 -addr fd:3 -debug
502 14488 14484 0 10:42PM ?? 0:07.90 /Applications/Docker.app/Contents/MacOS/com.docker.hyperkit -A -m 12G -c 6 -u -s 0:0,hostbridge -s 31,lpc -s 2:0,virtio-vpnkit,uuid=1f629fed-1ef6-4f34-8fce-753347e3b941,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s50,macfile=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/mac.0 -s 3,virtio-blk,file:///Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2,format=qcow -s 4,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s40,tag=db -s 5,virtio-rnd -s 6,virtio-9p,path=/Users/myoung/Library/Containers/com.docker.docker/Data/s51,tag=port -s 7,virtio-sock,guest_cid=3,path=/Users/myoung/Library/Containers/com.docker.docker/Data,guest_forwards=2376;1525 -l com1,autopty=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty,log=/Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/console-ring -f kexec,/Applications/Docker.app/Contents/Resources/moby/vmlinuz64,/Applications/Docker.app/Contents/Resources/moby/initrd.img,earlyprintk=serial console=ttyS0 com.docker.driver="com.docker.driver.amd64-linux", com.docker.database="com.docker.driver.amd64-linux" ntp=gateway mobyplatform=mac -F /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/hypervisor.pid
502 14559 14465 0 10:46PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 14560 14559 0 10:46PM ?? 0:00.01 /Applications/Docker.app/Contents/MacOS/com.docker.frontend {"action":"vmstateevent","args":{"vmstate":"running"}}
502 14562 13146 0 10:46PM ttys000 0:00.00 grep -i com.docker You should notice the Docker processes are running again. You can also check the timestamp of files in the Docker image directory:
ls -lah ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2*
-rw-r--r-- 1 myoung staff 179M Nov 9 19:46 /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2
-rw-r--r-- 1 myoung staff 64G Nov 9 19:45 /Users/myoung/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2.backup You should notice our backup file exists and is 64GB. The new file was created and is 179MB. As mentioned above, the new file will continue to grow as you use Docker. Let's check our available disk space in our Docker vm.
docker run --rm alpine df -h
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
3690ec4760f9: Pull complete
Digest: sha256:1354db23ff5478120c980eca1611a51c9f2b88b61f24283ee8200bf9a54f2e5c
Status: Downloaded newer image for alpine:latest
Filesystem Size Used Available Use% Mounted on
none 114.1G 66.7M 108.2G 0% /
tmpfs 5.9G 0 5.9G 0% /dev
tmpfs 5.9G 0 5.9G 0% /sys/fs/cgroup
/dev/vda2 114.1G 66.7M 108.2G 0% /etc/resolv.conf
/dev/vda2 114.1G 66.7M 108.2G 0% /etc/hostname
/dev/vda2 114.1G 66.7M 108.2G 0% /etc/hosts
shm 64.0M 0 64.0M 0% /dev/shm
tmpfs 5.9G 0 5.9G 0% /proc/kcore
tmpfs 5.9G 0 5.9G 0% /proc/timer_list
tmpfs 5.9G 0 5.9G 0% /proc/sched_debug Notice the alpine:latest image didn't exist locally, so it was downloaded. That is because we are using a new vm image. Also notice our / partition is now 114.1GB, with another 5.9GB used by the other partitions, bringing the total up to 120GB. Import Docker container Now that Docker for Mac is running, we need to import the saved container image we created earlier. We will use the docker import command to do this. To read more about the docker import command look here: docker import.
cd ~
docker import hdp25-atlas-demo.tar atlas-demo:latest
sha256:c793528e25bd95564c8cbf4b1487c47107d0cc300b1929b8a0fdeff93ed84eae NOTE: This may take up to an hour for this first import. Let's look at our list of images now.
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
atlas-demo latest ad47bef4e526 2 minutes ago 13.93 GB
alpine latest baa5d63471ea 3 weeks ago 4.803 MB You should notice the repository and tag line up with what we provided to the import command. Create new sandbox container Our saved container was imported as an image. If your container was created using project directories based on this tutorial How to manage multiple copies of the HDP Docker Sandbox, then you will need to make one change before you can start the container: modify the create-container.sh script so that it creates the container from an image called atlas-demo instead of sandbox . This assumes you called your repository atlas-demo when you imported the image. Here is a copy of my modified create-container.sh script:
docker run -v `pwd`:/mount -v ${PROJ_DIR}:/hadoop --name ${PROJ_DIR} --hostname "sandbox.hortonworks.com" --privileged -d -p 6080:6080 -p 9090:9090 -p 9000:9000 -p 8000:8000 -p 8020:8020 -p 42111:42111 -p 10500:10500 -p 16030:16030 -p 8042:8042 -p 8040:8040 -p 2100:2100 -p 4200:4200 -p 4040:4040 -p 8050:8050 -p 9996:9996 -p 9995:9995 -p 8080:8080 -p 8088:8088 -p 8886:8886 -p 8889:8889 -p 8443:8443 -p 8744:8744 -p 8888:8888 -p 8188:8188 -p 8983:8983 -p 1000:1000 -p 1100:1100 -p 11000:11000 -p 10001:10001 -p 15000:15000 -p 10000:10000 -p 8993:8993 -p 1988:1988 -p 5007:5007 -p 50070:50070 -p 19888:19888 -p 16010:16010 -p 50111:50111 -p 50075:50075 -p 50095:50095 -p 18080:18080 -p 60000:60000 -p 8090:8090 -p 8091:8091 -p 8005:8005 -p 8086:8086 -p 8082:8082 -p 60080:60080 -p 8765:8765 -p 5011:5011 -p 6001:6001 -p 6003:6003 -p 6008:6008 -p 1220:1220 -p 21000:21000 -p 6188:6188 -p 61888:61888 -p 2181:2181 -p 2222:22 atlas-demo /usr/sbin/sshd -D Notice the last line uses atlas-demo instead of sandbox . Once you run this command, your container will be created and running. You do not need to modify the start-container.sh , ssh-container.sh or stop-container.sh script. They all use the ${PROJ_DIR} which is one of the reasons these scripts are so handy. Connect to the sandbox Now that the container is started, we can connect to it. We can use our helper script ssh-container.sh to make it easy:
./ssh-container.sh When you created a new sandbox container before, you were prompted to reset the root password. You were not prompted this time because our existing configuration is part of the image now. Start the sandbox processes When the container starts up, it doesn't automatically start the sandbox processes. You can do that by running /etc/init.d/startup_script . You should see something similar to this:
[root@sandbox ~]# /etc/init.d/startup_script start
Starting tutorials... [ Ok ]
Starting startup_script...
Starting HDP ...
Starting mysql [ OK ]
Starting Flume [ OK ]
Starting Postgre SQL [ OK ]
Starting Ranger-admin [WARNINGS]
find: failed to restore initial working directory: Permission denied
Starting data node [ OK ]
Starting name node [ OK ]
Safe mode is OFF
Starting Oozie [ OK ]
Starting Ranger-usersync [ OK ]
Starting Zookeeper nodes [ OK ]
Starting NFS portmap [ OK ]
Starting Hdfs nfs [ OK ]
Starting Hive server [ OK ]
Starting Hiveserver2 [ OK ]
Starting Ambari server [ OK ]
Starting Ambari agent [ OK ]
Starting Node manager [ OK ]
Starting Yarn history server [ OK ]
Starting Webhcat server [ OK ]
Starting Spark [ OK ]
Starting Mapred history server [ OK ]
Starting Zeppelin [ OK ]
Starting Resource manager [ OK ]
Safe mode is OFF
Starting sandbox...
/etc/init.d/startup_script: line 97: /proc/sys/kernel/hung_task_timeout_secs: No such file or directory
Starting shellinaboxd: [ OK ] NOTE: You can ignore any warnings or errors that are displayed. Now the sandbox processes are running and you can access the Ambari interface at http://localhost:8080 . Log in with the raj_ops username and password. You should notice all of the configurations you previously had are still there. In my case I had turned maintenance mode on for Zeppelin and off for some other services, and those settings persisted just as I expected. Create new sandbox containers If you want to create new sandbox containers using the baseline image, make sure you have imported the default sandbox image. Then create those containers using the sandbox label instead of atlas-demo . Review If you successfully followed along with this tutorial, you were able to back up your existing containers to images. We created a new baseline Docker vm image, increasing our storage space from 64GB to 120GB. We deleted our existing Docker vm image and recreated it using the new baseline image. We imported our saved containers as new images. And finally, we created new containers from the saved container image, which contained all of our previous settings and configurations.
... View more
Labels:
08-20-2018
09:05 PM
Is this workflow similar for HDF 3.1 Sandbox? If not, kindly let us know the changes.
... View more
11-10-2016
02:35 AM
2 Kudos
Objective If you are using Docker on CentOS 7, you may have run into issues importing the Docker version of the Hortonworks Sandbox for HDP 2.5. The default configuration of Docker on CentOS 7 limits your Docker virtual machine image to 10GB in size. The Docker HDP sandbox is 13GB in size. This will cause the loading process of the sandbox to fail. This tutorial will guide you through the process of installing Docker on CentOS 7 using Vagrant and modifying the configuration of Docker to move the location of, and increase the size of, the Docker virtual machine image. While we are using Vagrant + VirtualBox, this process should work for any install of CentOS (Amazon, etc.) with small changes, since VirtualBox and the related plugins are not needed there. Prerequisites You should already have downloaded the Docker HDP 2.5 Sandbox. Read more here: Docker HDP Sandbox You should already have installed VirtualBox 5.1.x. Read more here: VirtualBox You should already have installed Vagrant 1.8.6. Read more here: Vagrant You should already have installed the vagrant-vbguest plugin. This plugin will keep the VirtualBox Guest Additions software current as you upgrade your kernel and/or VirtualBox versions. Read more here: vagrant-vbguest You should already have installed the vagrant-hostmanager plugin. This plugin will automatically manage the /etc/hosts file on your local mac and in your virtual machines. Read more here: vagrant-hostmanager Scope Mac OS X 10.11.6 (El Capitan) VirtualBox 5.1.6 Vagrant 1.8.6 vagrant-vbguest plugin 0.13.0 vagrant-hostmanager plugin 1.8.5 Steps Create Vagrant project directory Before we get started, determine where you want to keep your Vagrant project files. Each Vagrant project should have its own directory. I keep my Vagrant projects in my ~/Development/Vagrant directory. You should also use a helpful name for each Vagrant project directory you create.
cd ~/Development/Vagrant
mkdir centos7-docker
cd centos7-docker We will be using a CentOS 7.2 Vagrant box, so I include centos7 in the Vagrant project name to differentiate it from a Centos 6 project. The project is for Docker, so I include that in the name. Thus we have a project directory name of centos7-docker. Create Vagrant project files Create Vagrantfile The Vagrantfile tells Vagrant how to configure your virtual machines. Here is my Vagrantfile:
# -*- mode: ruby -*-
# vi: set ft=ruby :
# Using yaml to load external configuration files
require 'yaml'
Vagrant.configure("2") do |config|
# Using the hostmanager vagrant plugin to update the host files
config.hostmanager.enabled = true
config.hostmanager.manage_host = true
config.hostmanager.manage_guest = true
config.hostmanager.ignore_private_ip = false
# Loading in the list of commands that should be run when the VM is provisioned.
commands = YAML.load_file('commands.yaml')
commands.each do |command|
config.vm.provision :shell, inline: command
end
# Loading in the VM configuration information
servers = YAML.load_file('servers.yaml')
servers.each do |servers|
config.vm.define servers["name"] do |srv|
srv.vm.box = servers["box"] # Specify the name of the Vagrant box file to use
srv.vm.hostname = servers["name"] # Set the hostname of the VM
srv.vm.network "private_network", ip: servers["ip"], :adapter => 2 # Add a second adapter with a specified IP
srv.vm.network :forwarded_port, guest: 22, host: servers["port"] # Add a port forwarding rule
srv.vm.provision :shell, inline: "sed -i'' '/^127.0.0.1\\t#{srv.vm.hostname}\\t#{srv.vm.hostname}$/d' /etc/hosts"
srv.vm.provider :virtualbox do |vb|
vb.name = servers["name"] # Name of the VM in VirtualBox
vb.cpus = servers["cpus"] # How many CPUs to allocate to the VM
vb.memory = servers["ram"] # How much memory to allocate to the VM
vb.customize ["modifyvm", :id, "--cpuexecutioncap", "75"] # Limit to VM to 75% of available CPU
end
end
end
end Create a servers.yaml file The servers.yaml file contains the configuration information for our VMs. Here is the content from my file:
---
- name: docker
box: bento/centos-7.2
cpus: 6
ram: 12288
ip: 192.168.56.100
port: 10022 NOTE: My configuration uses 6 CPUs and 12GB of memory for VirtualBox. Make sure you have enough resources to use this configuration. For the purposes of this demo, you can lower this to 2 CPUs and 4GB of memory. Create commands.yaml file The commands.yaml file contains the list of commands that should be run on each VM when they are first provisioned. This allows us to automate configuration tasks that would otherwise be tedious and/or repetitive. Here is the content from my file:
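If you are unsure whether your host can satisfy those settings, a quick pre-flight sketch can compare them with what the host actually has. This assumes a Linux host (/proc/meminfo is Linux-specific); the 6 CPU / 12288 MB figures simply mirror the servers.yaml above:

```shell
# Requested resources, copied from servers.yaml.
req_cpus=6
req_ram_mb=12288

# What this host actually has (Linux-specific reads).
host_cpus=$(getconf _NPROCESSORS_ONLN)
host_ram_mb=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) / 1024 ))

# Decide whether the requested VM fits on this host.
[ "$host_cpus" -ge "$req_cpus" ] && [ "$host_ram_mb" -ge "$req_ram_mb" ] \
  && fits=yes || fits=no
echo "host has ${host_cpus} CPUs, ${host_ram_mb} MB RAM -> fits=${fits}"
```

If the check prints fits=no, lower the cpus and ram values in servers.yaml before running vagrant up.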
- "sudo yum -y install net-tools ntp wget"
- "sudo systemctl enable ntpd && sudo systemctl start ntpd"
- "sudo systemctl disable firewalld && sudo systemctl stop firewalld"
- "sudo sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/g' /etc/sysconfig/selinux" Start Virtual Machine Once you have created the 3 files in your Vagrant project directory, you are ready to start your virtual machine. Creating the vm for the first time and starting it every time after that uses the same command:
vagrant up NOTE: During the startup process, the vagrant-hostmanager plugin will prompt you for a password. This is the sudo password for your local machine. It needs sudo to update the /etc/hosts file. Once the process is complete, you should have 1 server running. You can verify by looking at the VirtualBox user interface; you should have a virtual machine called docker running. Connect to virtual machine You can log in to the virtual machine via ssh using the vagrant ssh command.
vagrant ssh Install Docker in Vagrant virtual machine Now that you are connected to the virtual machine, it's time to install Docker. First we'll create the docker.repo file:
sudo tee /etc/yum.repos.d/docker.repo <<-'EOF'
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7/
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF Now we can install Docker:
sudo yum install docker-engine Now we need to enable the Docker service:
sudo systemctl enable docker.service Now we can start the Docker service:
sudo systemctl start docker Load Docker HDP sandbox We will attempt to load the HDP sandbox into Docker using the default Docker settings. This process will fail after the image has loaded approximately 10GB of data. First, make sure a copy of the Docker HDP sandbox file HDP_2.5_docker.tar.gz is located in your project directory. This will allow you to access the file via the /vagrant/ directory within your virtual machine. I'm doing this on a Mac and my file was downloaded to ~/Downloads . This command is run on your local computer:
cp ~/Downloads/HDP_2.5_docker.tar.gz ~/Development/Vagrant/centos7-docker/ Now we can try loading the sandbox image on the VM. You need to use sudo to interact with Docker. You should see something similar to the following:
sudo docker load < /vagrant/HDP_2.5_docker.tar.gz
b1b065555b8a: Loading layer [==================================================>] 202.2 MB/202.2 MB
0b547722f59f: Loading layer [=====================================> ] 10.36 GB/13.84 GB
ApplyLayer exit status 1 stdout: stderr: write /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64/jre/lib/rt.jar: no space left on device As you can see above, the load command failed and it did so a little more than 10GB into the process. That is because the Docker virtual machine image defaults to 10GB in size. We can check to see where our Docker Root is located:
sudo docker info | grep "Docker Root"
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Docker Root Dir: /var/lib/docker As you can see above, the default is /var/lib/docker . Change Docker configuration CentOS 7 uses systemd to manage services. You can read more about it here: systemd. Avoid the older method of modifying the Docker configuration via /etc/sysconfig-based configuration files. While you can get everything working with that approach, it's better to use the appropriate systemd mechanisms. Some people modify the /etc/systemd/system/multi-user.target.wants/docker.service file to change the Docker configuration. That file can be overwritten during software updates, so it is best not to modify it. Instead, you should use a drop-in file at /etc/systemd/system/docker.service.d/docker.conf . We need to create the /etc/systemd/system/docker.service.d directory. This is where our docker.conf configuration file will be located.
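As a side note, the directory and file created in the next two steps can also be written non-interactively, which is convenient if you later want to fold this into a provisioning list like commands.yaml. Here is a minimal sketch; the CONF_DIR default is a local scratch path so it can be dry-run without root, and on the VM you would set it to /etc/systemd/system/docker.service.d :

```shell
# Sketch: write the systemd drop-in in one step instead of mkdir + vi.
# On the VM, set CONF_DIR=/etc/systemd/system/docker.service.d (needs root);
# the default below is a scratch path so this can be dry-run anywhere.
CONF_DIR="${CONF_DIR:-./docker.service.d}"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/docker.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --graph="/mnt/docker-data" --storage-driver=overlay --storage-opt=dm.basesize=30G
EOF
echo "wrote $CONF_DIR/docker.conf"
```

After writing the file to the real path, the same daemon-reload and restart steps below still apply.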
sudo mkdir -p /etc/systemd/system/docker.service.d Now we can edit our docker.conf file:
sudo vi /etc/systemd/system/docker.service.d/docker.conf You should add the following configuration to the file:
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --graph="/mnt/docker-data" --storage-driver=overlay --storage-opt=dm.basesize=30G The blank ExecStart= entry is not a typo; it is required to clear the default command before the new one is set. The --graph parameter specifies where Docker stores its images and container data. Make sure this location has sufficient room for your expected data size. The --storage-driver parameter selects the storage driver. The default is devicemapper, which limits the base image to 10GB; setting it to overlay allows for larger images. You can read more about the storage drivers here: docker storage drivers. The --storage-opt parameter lets us change the base image size. In this example I've set the base image size to 30GB; you may want to use a larger size. Restart Docker Now that we've updated our configuration, we need to restart the Docker daemon.
sudo systemctl daemon-reload
sudo systemctl restart docker We can now check to see where our Docker Root is. You should see something similar to this:
sudo docker info | grep "Docker Root"
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Docker Root Dir: /mnt/docker-data As you can see, our Docker Root has moved to /mnt/docker-data , which is the location we specified with the --graph parameter in our docker.conf file. Load Docker HDP sandbox Now that we have updated our Docker configuration and restarted the daemon, we should be able to load our HDP sandbox. Let's run the load command again and see if it completes. You should see something similar to this:
sudo docker load < /vagrant/HDP_2.5_docker.tar.gz
b1b065555b8a: Loading layer [==================================================>] 202.2 MB/202.2 MB
0b547722f59f: Loading layer [==================================================>] 13.84 GB/13.84 GB
99d7327952e0: Loading layer [==================================================>] 234.8 MB/234.8 MB
294b1c0e07bd: Loading layer [==================================================>] 207.5 MB/207.5 MB
fd5c10f2f1a1: Loading layer [==================================================>] 387.6 kB/387.6 kB
6852ef70321d: Loading layer [==================================================>] 163 MB/163 MB
517f170bbf7f: Loading layer [==================================================>] 20.98 MB/20.98 MB
665edb80fc91: Loading layer [==================================================>] 337.4 kB/337.4 kB
Loaded image: sandbox:latest As you can see, the load command was successful. Review If you followed along with this tutorial, you set up a CentOS 7 virtual machine using Vagrant. We updated the Docker configuration to use a different location for the Docker data and changed the base size of the image. Once these changes were complete, we were able to successfully load the HDP sandbox image. You can read more about how to install Docker on CentOS 7 here: https://docs.docker.com/engine/installation/linux/centos/ and how to configure it here: https://docs.docker.com/engine/admin/
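One practical follow-up: before settling on a base image size like the 30GB used above, it is worth checking how much space is actually free at the Docker data location. Here is a minimal sketch; the /mnt/docker-data path and 30GB figure are the ones from this tutorial, and it falls back to / if the directory does not exist yet:

```shell
# Sketch: verify the Docker data location has room for the configured
# base image size (path and size taken from this tutorial; adjust as needed).
DOCKER_DATA="${DOCKER_DATA:-/mnt/docker-data}"
NEED_GB="${NEED_GB:-30}"
TARGET="$DOCKER_DATA"
[ -d "$TARGET" ] || TARGET=/          # fall back if the dir doesn't exist yet
# POSIX df -Pk: column 4 is available space in 1K blocks
AVAIL_KB=$(df -Pk "$TARGET" | awk 'NR==2 {print $4}')
AVAIL_GB=$(( AVAIL_KB / 1024 / 1024 ))
if [ "$AVAIL_GB" -ge "$NEED_GB" ]; then
  echo "OK: ${AVAIL_GB}GB available at ${TARGET}"
else
  echo "WARNING: only ${AVAIL_GB}GB free at ${TARGET}, want ${NEED_GB}GB"
fi
```

If the check warns, either free up space, mount a larger volume at the Docker data location, or lower dm.basesize accordingly.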