Using Ansible to deploy instances on AWS

myoung · ‎03-04-2017

Objective

This tutorial will walk you through the process of using Ansible, an agent-less automation tool, to create instances on AWS. The Ansible playbook we will use is relatively simple; you can use it as a base to experiment with more advanced features. You can read more about Ansible here: Ansible.

Ansible is written in Python and is installed as a Python module on the control host. The only requirement for the hosts managed by Ansible is the ability to log in with SSH. There is no requirement to install an agent on the hosts managed by Ansible.

If you have never used Ansible, you can become more familiar with it by going through some basic tutorials. The following two tutorials are a good starting point:

This tutorial is part 1 of a two-part series. Part 2 will show you how to use Ansible to deploy Hortonworks Data Platform (HDP) on Amazon Web Services (AWS).

This tutorial was created as a companion to the Ansible + Hadoop talk I gave at the Ansible NOVA Meetup in February 2017. You can find the slides to that talk here: SlideShare

You can get a copy of the playbook from this tutorial here: Github

Prerequisites

You must have an existing AWS account.
You must have access to your AWS Access and Secret keys.
You are responsible for all AWS costs incurred.

Scope

This tutorial was tested using the following environment and components:

Mac OS X 10.11.6 and 10.12.3
Amazon Web Services
Anaconda 4.1.6 (Python 2.7.12)
Ansible 2.0.0.2 and 2.1.3.0

Steps

Create a project directory

You need to create a directory for your Ansible playbook. I prefer to create my project directories in ~/Development.

mkdir -p ~/Development/ansible-aws
cd ~/Development/ansible-aws

Install Ansible module

If you use the Anaconda version of Python, you already have access to Ansible. If you are not using Anaconda, then you can usually install Ansible using the following command:

pip install ansible

To read more about how to install Ansible: Ansible Installation
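Before going further, it can help to confirm that Ansible is actually on your PATH. The sketch below also checks for the boto Python library, which the ec2 and ec2_group modules of this Ansible generation rely on to call AWS. The have_cmd helper and the PYTHON variable are names I made up for illustration; adjust PYTHON if your interpreter is called something else.

```shell
# Report whether a command is available on PATH (hypothetical helper).
have_cmd() { command -v "$1" >/dev/null 2>&1; }

if have_cmd ansible; then
    ansible --version
else
    echo "ansible not found on PATH; try: pip install ansible"
fi

# PYTHON is an assumed variable; it defaults to whatever "python" resolves to.
PYTHON="${PYTHON:-python}"
if have_cmd "$PYTHON" && "$PYTHON" -c "import boto" 2>/dev/null; then
    echo "boto is importable"
else
    echo "boto not found; try: pip install boto"
fi
```

If either check reports a missing component, install it with pip into the same Python environment that Ansible uses.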

Overview of our Ansible playbook

Our playbook is relatively simple. It consists of a single inventory file, a single group_vars file and a single playbook file. Here is the layout of the file and directory structure:

+- ansible-aws/
   |
   +- group_vars/
   |  +- all
   |
   +- inventory/
   |  +- hosts
   |
   +- playbooks/
   |  +- ansible-aws.yml
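
If you prefer to create the whole layout in one step, a minimal sketch looks like this. PROJECT_DIR is a variable I am assuming for illustration; it defaults to the project directory used in this tutorial.

```shell
# PROJECT_DIR is an assumed variable; override it to build the layout elsewhere.
PROJECT_DIR="${PROJECT_DIR:-$HOME/Development/ansible-aws}"

# Create the three subdirectories (and any missing parents).
mkdir -p "$PROJECT_DIR/group_vars" "$PROJECT_DIR/inventory" "$PROJECT_DIR/playbooks"

# touch creates empty files; existing files keep their contents.
touch "$PROJECT_DIR/group_vars/all" \
      "$PROJECT_DIR/inventory/hosts" \
      "$PROJECT_DIR/playbooks/ansible-aws.yml"

# List the files that now exist under the project directory.
find "$PROJECT_DIR" -type f | sort
```

The files start out empty; the following sections describe what goes into each of them.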

group_vars/all

You can use variables in your playbooks using the {{ variable_name }} syntax. These variables are populated based on values stored in your variable files. You can explicitly load variable files in your playbooks.

However, all playbooks will automatically load the variables in the group_vars/all variable file. The all variable file is loaded for all hosts regardless of the groups the host may be in. In our playbook, we are placing our AWS configuration values in the all file.

Edit the group_vars/all file. Copy and paste the following text into the file:

aws_access_key: <enter AWS access key>
aws_secret_key: <enter AWS secret key>
key_name: <enter private key file alias name>
aws_region: <enter AWS region>
vpc_id: <enter VPC ID>
ami_id: ami-6d1c2007
instance_type: m4.2xlarge
my_local_cidr_ip: <enter cidr_ip>

aws_access_key: You need to enter your AWS Access key
aws_secret_key: You need to enter your AWS Secret key
key_name: The alias name you gave to the AWS private key which you will use to SSH into the instances. In my case I created a key called ansible.
aws_region: The AWS region where you want to deploy your instances. In my case I am using us-east-1.
vpc_id: The specific VPC in which you want to place your instances.
ami_id: The specific AMI you want to deploy for your instances. The ami-6d1c2007 AMI is a CentOS 7 image.
instance_type: The type of AWS instance. For deploying Hadoop, I recommend at least m4.2xlarge. A faster alternative is c4.4xlarge.
my_local_cidr_ip: Your local computer's IP address in CIDR notation. This is used for creating the security rules that allow your local computer to access the instances. An example CIDR format is 192.168.1.1/32. Make sure this is set to your computer's public IP address.

After you have entered your appropriate settings, save the file.
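
If you are not sure what your public IP address is, the following sketch is one way to look it up and format it for my_local_cidr_ip. It assumes curl is installed and that you can reach https://checkip.amazonaws.com, an AWS endpoint that returns the caller's public IP as plain text. The to_cidr helper is a name I made up for illustration.

```shell
# Turn a bare IPv4 address into a single-host CIDR block (hypothetical helper).
to_cidr() { printf '%s/32\n' "$1"; }

# Ask the AWS check-IP endpoint for our public address; this needs network access.
if command -v curl >/dev/null 2>&1 \
   && MY_IP="$(curl -s --max-time 5 https://checkip.amazonaws.com)" \
   && [ -n "$MY_IP" ]; then
    echo "my_local_cidr_ip: $(to_cidr "$MY_IP")"
else
    echo "could not determine public IP; look it up manually"
fi
```

The /32 suffix restricts the rule to a single address. If your public IP changes, update the value and run the playbook again.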

inventory/hosts

Ansible requires a list of known hosts against which playbooks and tasks are run. We will tell Ansible to use a specific host file with the -i inventory/hosts parameter.

Edit the inventory/hosts file. Copy and paste the following text into the file:

[local]
localhost ansible_python_interpreter=/Users/myoung/anaconda/bin/python

[local]: Defines the group the host belongs to. You have the option for a playbook to run against all hosts, a specific group of hosts, or an individual host. This AWS playbook only runs on your local computer. That is because it uses the AWS APIs to communicate with AWS.
localhost: This is the hostname. You can list multiple hosts, one per line, under each group heading. A host can belong to multiple groups.
ansible_python_interpreter: Optional entry that tells Ansible which specific version of Python to run. Because I am using Anaconda Python, I've included that setting here.

After you have entered your appropriate settings, save the file.
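
The interpreter path above is specific to my machine, so yours will likely differ. The sketch below prints the path of the Python on your PATH and the inventory line that would use it, then runs Ansible's ping module against the local group if Ansible and the inventory file are present. The inventory_line helper is a name I made up for illustration; the -c local flag tells the ad hoc command to skip SSH, matching the connection: local setting used in the playbook.

```shell
# Build the inventory entry for a given interpreter path (hypothetical helper).
inventory_line() { printf 'localhost ansible_python_interpreter=%s\n' "$1"; }

# Print the entry for the Python found on PATH, if there is one.
if PY_PATH="$(command -v python)"; then
    inventory_line "$PY_PATH"
else
    echo "python not found on PATH"
fi

# Run from the project directory: confirm that Ansible can reach the local group.
if command -v ansible >/dev/null 2>&1 && [ -f inventory/hosts ]; then
    ansible -i inventory/hosts local -c local -m ping
fi
```

A successful ping reports "pong" for localhost.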

playbooks/ansible-aws.yml

The playbook is where we define the list of tasks we want to perform. Our playbook will consist of 2 tasks. The first task is to create a specific AWS Security Group. The second task is to create a specific configuration of 6 instances on AWS.

Edit the file playbooks/ansible-aws.yml. Copy and paste the following text into the file:

---
# Basic provisioning example
- name: Create AWS resources
  hosts: localhost
  connection: local
  gather_facts: False
  tasks:
  - name: Create a security group
    ec2_group:
      name: ansible
      description: "Ansible Security Group"
      region: "{{aws_region}}"
      vpc_id: "{{vpc_id}}"
      aws_access_key: "{{aws_access_key}}"
      aws_secret_key: "{{aws_secret_key}}"
      rules:
        - proto: all
          cidr_ip: "{{my_local_cidr_ip}}"
        - proto: all
          group_name: ansible
      rules_egress:
        - proto: all
          cidr_ip: 0.0.0.0/0
    register: firewall
  - name: Create an EC2 instance
    ec2:
      aws_access_key: "{{aws_access_key}}"
      aws_secret_key: "{{aws_secret_key}}"
      key_name: "{{key_name}}"
      region: "{{aws_region}}"
      group_id: "{{firewall.group_id}}"
      instance_type: "{{instance_type}}"
      image: "{{ami_id}}"
      wait: yes
      volumes:
        - device_name: /dev/sda1
          volume_type: gp2
          volume_size: 100
          delete_on_termination: true
      exact_count: 6
      count_tag:
         Name: aws-demo
      instance_tags:
         Name: aws-demo
    register: ec2

This playbook uses the Ansible ec2 and ec2_group modules. You can read more about the options available to those modules here:

The task to create the EC2 security group creates a group named ansible. It defines 2 ingress rules and 1 egress rule for that security group. The first ingress rule allows all inbound traffic from your local computer's IP address. The second ingress rule allows all inbound traffic from any host in the ansible security group, which lets the instances talk to each other. The egress rule allows all outbound traffic from the hosts to any destination.

The task to create the EC2 instances creates 6 hosts because of the exact_count setting. It sets a Name tag of aws-demo on each of the instances (instance_tags) and counts the instances carrying that tag (count_tag) to determine how many hosts exist. You can choose to use a smaller number of hosts.

You can specify volumes to mount on each of the instances. The default volume size is 8 GB and is too small for deploying Hadoop later. I recommend setting the size to at least 100 GB as above. I also recommend you set delete_on_termination to true. This will tell AWS to delete the storage after you have deleted the instances. If you do not do this, then storage will be kept and you will be charged for it.

After you have entered your appropriate settings, save the file.
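
Before running anything against AWS, a quick pre-flight check can catch simple mistakes. The sketch below looks for leftover "<enter ...>" placeholders in the variables file and then asks ansible-playbook to parse the playbook without executing it. The unfilled_placeholders helper and the VARS_FILE variable are names I made up for illustration; --syntax-check is a standard ansible-playbook option. Run it from the project directory.

```shell
# Print any lines that still contain an "<enter ...>" placeholder (hypothetical helper).
# Follows grep semantics: returns 0 if placeholders remain, non-zero if none are found.
unfilled_placeholders() { grep -n '<enter' "$1"; }

# VARS_FILE is an assumed variable; it defaults to the file used in this tutorial.
VARS_FILE="${VARS_FILE:-group_vars/all}"
if [ -f "$VARS_FILE" ] && unfilled_placeholders "$VARS_FILE"; then
    echo "fill in the values listed above before running the playbook"
fi

# Parse the playbook without executing any tasks.
if command -v ansible-playbook >/dev/null 2>&1 && [ -f playbooks/ansible-aws.yml ]; then
    ansible-playbook -i inventory/hosts playbooks/ansible-aws.yml --syntax-check
fi
```

The syntax check only validates the YAML and the playbook structure. It does not contact AWS, so it will not catch a wrong VPC ID or AMI ID.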

Running the Ansible playbook

Now that our 3 files have been created and saved with the appropriate settings, we can run the playbook. To run the playbook, you use the ansible-playbook -i inventory/hosts playbooks/ansible-aws.yml command. You should see something similar to the following:

$ ansible-playbook -i inventory/hosts playbooks/ansible-aws.yml
PLAY [Create AWS resources] ****************************************************
TASK [Create a security group] *************************************************
changed: [localhost]
TASK [Create an EC2 instance] **************************************************
changed: [localhost]
PLAY RECAP *********************************************************************
localhost                  : ok=2    changed=2    unreachable=0    failed=0

The changed lines indicate that Ansible found a configuration that needed to be modified to be consistent with our requested state. For the security group task, you would see this if your security group didn't exist or if it had a different set of ingress or egress rules. For the instance task, you would see this if there were fewer or more than 6 hosts tagged as aws-demo.

Check the AWS console

If you check the EC2 section of your AWS console, you should be able to confirm the instances were created. You should see 6 instances named aws-demo.
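
If you would rather check from the command line, the sketch below lists the instances tagged by the playbook. It assumes the AWS CLI is installed and configured with the same credentials, which this tutorial does not otherwise require. The tag_filter helper and the REGION variable are names I made up for illustration; set REGION to the region you used in group_vars/all.

```shell
# Build the filter that matches the Name tag set by the playbook (hypothetical helper).
tag_filter() { printf 'Name=tag:Name,Values=%s\n' "$1"; }

# REGION is an assumed variable; it defaults to the region used in this tutorial.
REGION="${REGION:-us-east-1}"

if command -v aws >/dev/null 2>&1; then
    # Show ID, state and public IP for pending or running aws-demo instances.
    aws ec2 describe-instances \
        --region "$REGION" \
        --filters "$(tag_filter aws-demo)" "Name=instance-state-name,Values=pending,running" \
        --query 'Reservations[].Instances[].[InstanceId,State.Name,PublicIpAddress]' \
        --output table \
        || echo "aws CLI call failed; check your credentials and region"
else
    echo "aws CLI not installed; use the EC2 console instead"
fi
```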

Review

If you successfully followed along with this tutorial, you have created a simple Ansible playbook with 2 tasks using the ec2 and ec2_group Ansible modules. The playbook creates an AWS security group and instances which can be used later for deploying HDP on AWS.
