About myoung

agadrias · ‎08-28-2018

When I try to launch Zeppellin UI, i get this error: error.jpg

myoung · ‎05-23-2017

This tutorial will walk you through the process of using Cloudbreak to deploy an HDP 2.6 cluster with Spark 2.1. We'll copy and edit the existing hdp-spark-cluster blueprint which deploys Spark 1.6 to create a new blueprint which installs Spark 2.1. This tutorial is part one of a two-part series. The second tutorial walks you through using Zeppelin to verify the Spark 2.1 installation. You can find that tutorial here: HCC Article Prerequisites You should already have a Cloudbreak v1.14.0 environment running. You can follow this article to create a Cloudbreak instance using Vagrant and Virtualbox: HCC Article You should already have updated Cloudbreak to support deploying HDP 2.6 clusters. You can follow this article to enable that functionality: HCC Article Scope This tutorial was tested in the following environment: Cloudbreak 1.14.4 AWS EC2 HDP 2.6 Spark 2.1 Steps Create Blueprint Before we can deploy a Spark 2.1 cluster using Cloudbreak, we need to create a blueprint that specifies Spark 2.1. Cloudbreak ships with 3 blueprints out of the box: hdp-small-default: basic HDP cluster with Hive and HBase hdp-spark-cluster: basic HDP cluster with Spark 1.6 hdp-streaming-cluster: basic HDP cluster with Kafka and Storm We will use the hdp-spark-cluster as our base blueprint and edit it to deploy Spark 2.1 instead of Spark 1.6. Click on the manage blueprints section of the UI. Click on the hdp-spark-cluster blueprint. You should see something similar to this: Click on the blue copy & edit button. You should see something similar to this: For the Name , enter hdp26-spark21-cluster . This tells us the blueprint is for an HDP 2.6 cluster using Spark 2.1. Enter the same information for the Description . You should see something similar to this: Now, we need to edit the JSON portion of the blueprint. We need to change the Spark 1.6 components to Spark 2.1 components. We don't need change where they are deployed. The following entries within the JSON are for Spark 1.6: "name": "SPARK_CLIENT" "name": "SPARK_JOBHISTORYSERVER" "name": "SPARK_CLIENT" We will replace SPARK with SPARK2 . These entries should look as follows: "name": "SPARK2_CLIENT" "name": "SPARK2_JOBHISTORYSERVER" "name": "SPARK2_CLIENT" NOTE: There are two entries for SPARK_CLIENT. Make sure you change both. We are going to add an entry for the LIVY component. We will add it to the same node as the SPARK_JOBHISTORYSERVER . We are also going to add an entry for the SPARK2_THRIFTSERVER component. We will add it to the same node as the SPARK_JOBHISTORYSERVER . Let's add those two entries just below SPARK2_CLIENT in the host_group_master_2 section. Change the following: { "name": "SPARK2_JOBHISTORYSERVER" }, { "name": "SPARK2_CLIENT" }, to this: { "name": "SPARK2_JOBHISTORYSERVER" }, { "name": "SPARK2_CLIENT" }, { "name": "SPARK2_THRIFTSERVER" }, { "name": "LIVY2_SERVER" }, We also need to update the blueprint_name to hdp26-spark21-cluster and the stack_version to 2.6 . you should have something similar to this: "Blueprints": { "blueprint_name": "hdp26-spark21-cluster", "stack_name": "HDP", "stack_version": "2.6" } If you prefer, you can copy and paste the following blueprint JSON: { "host_groups": [ { "name": "host_group_client_1", "configurations": [], "components": [ { "name": "ZOOKEEPER_CLIENT" }, { "name": "PIG" }, { "name": "OOZIE_CLIENT" }, { "name": "HBASE_CLIENT" }, { "name": "HCAT" }, { "name": "KNOX_GATEWAY" }, { "name": "METRICS_MONITOR" }, { "name": "FALCON_CLIENT" }, { "name": "TEZ_CLIENT" }, { "name": "SPARK2_CLIENT" }, { "name": "SLIDER" }, { "name": "SQOOP" }, { "name": "HDFS_CLIENT" }, { "name": "HIVE_CLIENT" }, { "name": "YARN_CLIENT" }, { "name": "METRICS_COLLECTOR" }, { "name": "MAPREDUCE2_CLIENT" } ], "cardinality": "1" }, { "name": "host_group_master_3", "configurations": [], "components": [ { "name": "ZOOKEEPER_SERVER" }, { "name": "APP_TIMELINE_SERVER" }, { "name": "TEZ_CLIENT" }, { "name": "HBASE_MASTER" }, { "name": "HBASE_CLIENT" }, { "name": "HDFS_CLIENT" }, { "name": "METRICS_MONITOR" }, { "name": "SECONDARY_NAMENODE" } ], "cardinality": "1" }, { "name": "host_group_slave_1", "configurations": [], "components": [ { "name": "HBASE_REGIONSERVER" }, { "name": "NODEMANAGER" }, { "name": "METRICS_MONITOR" }, { "name": "DATANODE" } ], "cardinality": "6" }, { "name": "host_group_master_2", "configurations": [], "components": [ { "name": "ZOOKEEPER_SERVER" }, { "name": "ZOOKEEPER_CLIENT" }, { "name": "PIG" }, { "name": "MYSQL_SERVER" }, { "name": "HIVE_SERVER" }, { "name": "METRICS_MONITOR" }, { "name": "SPARK2_JOBHISTORYSERVER" }, { "name": "SPARK2_CLIENT" }, { "name": "SPARK2_THRIFTSERVER" }, { "name": "LIVY2_SERVER" }, { "name": "TEZ_CLIENT" }, { "name": "HBASE_CLIENT" }, { "name": "HIVE_METASTORE" }, { "name": "ZEPPELIN_MASTER" }, { "name": "HDFS_CLIENT" }, { "name": "YARN_CLIENT" }, { "name": "MAPREDUCE2_CLIENT" }, { "name": "RESOURCEMANAGER" }, { "name": "WEBHCAT_SERVER" } ], "cardinality": "1" }, { "name": "host_group_master_1", "configurations": [], "components": [ { "name": "ZOOKEEPER_SERVER" }, { "name": "HISTORYSERVER" }, { "name": "OOZIE_CLIENT" }, { "name": "NAMENODE" }, { "name": "OOZIE_SERVER" }, { "name": "HDFS_CLIENT" }, { "name": "YARN_CLIENT" }, { "name": "FALCON_SERVER" }, { "name": "METRICS_MONITOR" }, { "name": "MAPREDUCE2_CLIENT" } ], "cardinality": "1" } ], "Blueprints": { "blueprint_name": "hdp26-spark21-cluster", "stack_name": "HDP", "stack_version": "2.6" } } Once you have all of the changes in place, click the green create blueprint button. Create Security Group We need to create a new security group to use with our cluster. By default, the existing security groups only allow ports 22, 443, and 9443. As part of this tutorial, we will use Zeppelin to test Spark 2.1. We'll create a new security group that opens all ports to our IP address. Click on the manage security groups section of the UI. You should see something similar to this: Click on the green create security group button. You should see something similar to this: First you need to select the appropriate cloud platform. I'm using AWS, so that is what I selected. We need to provide a unique name for our security group. I used all-ports-my-ip . You should use something descriptive. Provide a helpful description as well. Now we need to enter our personal IP address CIDR. I am using #.#.#.#/32 ; your IP address will obviously be different. You need to enter the port range. There is a known issue in Cloudbreak that prevents you from using 0-65356 , so we'll use 1-65356 . For the protocol, use tcp . Once you have everything entered, you should see something similar to this: Click the green Add Rule button to add this rule to our security group. You can add multiple rules, but we have everything covered with our single rule. You should see something similar to this: If everything looks good, click the green create security group button. This will create our new security group. You should see something like this: Create Cluster Now that our blueprint has been created and we have an new security group, we can begin building the cluster. Ensure you have selected the appropriate credential for your cloud environment. Then click the green create cluster button. You should see something similar to this: Give your cluster a descriptive name. I used spark21test , but you can use whatever you like. Select an appropriate cloud region. I'm using AWS and selected US East (N. Virginia) , but you can use whatever you like. You should see something similar to this: Click on the Setup Network and Security button. You should see something similar to this: We are going to keep the default options here. Click on the Choose Blueprint button. You should see something similar to this: Expand the blueprint dropdown menu. You should see the blueprint we created before, hdp26-spark21-cluster . Select the blueprint. You should see something similar to this: You should notice the new security group is already selected. Cloudbreak did not automatically figure this out. The instance templates and security groups are selected alphabetically be default. Now we need to select a node on which to deploy Ambari. I typically deploy Ambari on the master1 server. Check the Ambari check box on one of the master servers. If everything looks good, click on the green create cluster , You should see something similar to this: Once the cluster has finished building, you can click on the arrow for the cluster we created to get expanded details. You should see something similar to this: Verify Versions Once the cluster is fully deployed, we can verify the versions of the components. Click on the Ambari link on the cluster details page. Once you login to Ambari, you should see something similar to this: You should notice that Spark2 is shown in the component list. Click on Spark2 in the list. You should see something similar to this: You should notice that both the Spark2 Thrift Server and the Livy2 Server have been installed. Now lets check the overall cluster verions. Click on the Admin link in the Ambari menu and select Stacks and Versions . Then click on the Versions tab. You should see something similar to this: As you can see, HDP 2.6.0.3 was deployed. Review If you have successfully followed along with this tutorial, you should know how to create a new security group and blueprint. The blueprint allows you to deploy HDP 2.6 with Spark 2.1. The security group allows you to access all ports on the cluster from your IP address. Follow along in part 2 of the tutorial series to use Zeppelin to test Spark 2.1.

myoung · ‎05-18-2017

Prerequisites You should already have a Cloudbreak v1.14.4 environment running. You can follow this article to create a Cloudbreak instance using Vagrant and Virtualbox: HCC Article You should already have credentials created in Cloudbreak for deploying on AWS (or Azure). Scope This tutorial was tested in the following environment: macOS Sierra (version 10.12.4) Cloudbreak 1.14.4 AWS EC2 NOTE: Cloudbreak 1.14.0 (TP) had a bug which caused HDP 2.6 clusters installs to fail. You should upgrade your Cloudbreak deployer instance to 1.14.4. Steps Create application.yml file UPDATE 05/24/2017: The creation of a custom application.yml file is not required with Cloudbreak 1.14.4. This version of Cloudbreak includes support for HDP 2.5 and HDP 2.6. This step remains for educational purposes for future HDP updates. You need to create an application.yml file in the etc directory within your Cloudbreak deployment directory. This file will contain the repo information for HDP 2.6. If you followed my tutorial linked above, then your Cloudbreak deployment directory should be /opt/cloudbreak-deployment . If you are using a Cloudbreak instance on AWS or Azure, then your Cloudbreak deployment directory is likely /var/lib/cloudbreak-deployment/ . Edit your <cloudbreak-deployment>/etc/application.yml file using your favorite editor. Copy and paste the following in the file: cb: ambari: repo: version: 2.5.0.3-7 baseurl: http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.5.0.3 gpgkey: http://public-repo-1.hortonworks.com/ambari/centos6/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins database: vendor: embedded host: localhost port: 5432 name: postgres username: ambari password: bigdata </p> <p> hdp: entries: 2.5: version: 2.5.0.1-210 repoid: HDP-2.5 repo: stack: repoid: HDP-2.5 redhat6: http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.5.5.0 redhat7: http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.5.0 util: repoid: HDP-UTILS-1.1.0.21 redhat6: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos6 redhat7: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos7 2.6: version: 2.6.0.0-598 repoid: HDP-2.6 repo: stack: repoid: HDP-2.6 redhat6: http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.6.0.3 redhat7: http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.6.0.3 util: repoid: HDP-UTILS-1.1.0.21 redhat6: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos6 redhat7: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos7 Start Cloudbreak Once you have created your application.yml file, you can start Cloudbreak. $ cbd start NOTE: It may take a couple of minutes before Cloudbreak is fully running. Create HDP 2.6 Blueprint To create an HDP 2.6 cluster, we need to update our blueprint to specify HDP 2.6. On the main Cloudbreak UI, click on manage blueprints . You should see something similar to this: You should see 3 default blueprints. We are going to use the hdp-small-default blueprint as our base. Click on the hdp-small-default blueprint name. You should see something similar to this: Now click on the blue copy & edit button. You should see something similar to this: For the Name , you should enter something unqiue and descriptive. I suggest hdp26-small-default . For the Description , you can enter the same information. You should see something similar to this: Now we need to edit the JSON portion of the blueprint. Scroll down to the bottom of the JSON. You should see something similar to this: Now edit the blueprint_name value to be hdp26-small-default and edit the stack_version to be 2.6 . You should see something similar to this: Now click on the green create blueprint button. You should see the new blueprint visible in the list of blueprints. Create HDP 2.6 Small Default Cluster Now that our blueprint has been created, we can create a cluster and select this blueprint to install HDP 2.6. Select the appropriate credential for your Cloud environment. Click on the create cluster button. You should see something similar to this: Provide a unique, but descriptive Cluster Name . Ensure you select an appropriate Region . I chose hdp26test as my cluster name and I'm using the US East region: Now advanced to the next step by clicking on Setup Network and Security . You should see something similar to this: We don't need to make any changes here, so click on the Choose Blueprint button. You should see something similar to this: In the Blueprint dropdown, you should see the blueprint we created. Select the hdp26-small-default blueprint. You should see something similar to this: You need to select which node Ambari will run on. I typically select the master1 node. You should see something similar to this: Now you can click on the Review and Launch button. You should see something similar to this: Verify the information presented. If everything looks good, click on the create and start cluster button . Once the cluster build process has started, you should see something similar to this: Verify HDP Version Once the cluster has finished building, you can click on the cluster in the Cloudbreak UI. You should see something similar to this: Click on the Ambari link to load Ambari. Login using the default username and password of admin . Now click on the Admin link in the menu. You should see something similar to this: Click on the Stack and Versions link. You should see something similar to this: You should notice that HDP 2.6.0.3 has been deployed. Review If you have successfully followed along with this tutorial, you should know how to create/update /etc/application.yml to add specific Ambair and HDP repositories. You should have successfully created an updated blueprint and deployed HDP 2.6 on your cloud of choice.

myoung · ‎05-13-2017

Objectives This tutorial will walk you through the process of using Cloudbreak recipes to install Anaconda on an your HDP cluster during cluster provisioning. This process can be used to automate many tasks on the cluster both pre-install and post-install. Prerequisites You should already have a Cloudbreak v1.14.0 environment running. You can follow this article to create a Cloudbreak instance using Vagrant and Virtualbox: HCC Article You should already have credentials created in Cloudbreak for deploying on AWS (or Azure). Scope This tutorial was tested in the following environment: macOS Sierra (version 10.12.4) Cloudbreak 1.14.0 TP AWS EC2 Anaconda 2.7.13 Steps Create Recipe Before you can use a recipe during a cluster deployment, you have to create the recipe. In the Cloudbreak UI, look for the "mange recipes" section. It should look similar to this: If this is your first time creating a recipe, you will have 0 recipes instead of the 2 recipes show in my interface. Now click on the arrow next to manage recipes to display available recipes. You should see something similar to this: Now click on the green create recipe button. You should see something similar to this: Now we can enter the information for our recipe. I'm calling this recipe anaconda . I'm giving it the description of Install Anaconda . You can choose to install Anaconda as either pre-install or post-install. I'm choosing to do the install post-install. This means the script will be run after the Ambari installation process has started. So choose the Execution Type of POST . Choose Script so we can copy and paste the shell script. You can also specify a file to upload or a URL (gist for example). Our script is very basic. We are going to download the Anaconda install script, then run it in silent mode. Here is the script: #!/bin/bash wget https://repo.continuum.io/archive/Anaconda2-4.3.1-Linux-x86_64.sh bash ./Anaconda2-4.3.1-Linux-x86_64.sh -b -p /opt/anaconda When you have finished entering all of the information, you should see something similar to this: If everything looks good, click on the green create recipe button. After the recipe has been created, you should see something similar to this: Create a Cluster using a Recipe Now that our recipe has been created, we can create a cluster that uses the recipe. Go through the process of creating a cluster up to the Choose Blueprint step. This step is when you select the recipe you want to use. The recipes are not selected by default; you have to select the recipes you wish to use. You specify recipes for 1 or more host groups. This allows you to run different recipes across different host groups (masters, slaves, etc). You can also select multiple recipes. We want to use the ```hdp-small-default``` blueprint. This will create a basic HDP cluster. If you select the anaconda recipe, you should see something similar to this: [Select Recipe]( ) In our case, we are going to run the recipe on every host group. If you intend to use something like Anaconda across the cluster, you should install it on at least the slave nodes and the client nodes. After you have selected the recipe for the host groups, click the Review & Launch button, then launch the cluster. As the cluster is building, you should see a message in the Cloudbreak UI that indicates the recipe is running. When that happens, you will see something similar to this: Cloudbreak will create logs for each recipe that runs on each host. These logs are located at /var/log/recipe and have the name of the recipe and whether it is pre or post install. For example, our recipe log is called post-anaconda.log . You can tail this log file to following the execution of the script. NOTE: Post install scripts won't be executed until the Ambari server is installed and the cluster is building. You can always monitor the /var/log/recipe directory on a node to see when the script is being executed. The time it takes to run the script will vary depending on the cloud environment and how long it takes to spin up the cluster. On your cluster, you should be able to see the post-install log: $ ls /var/log/recipes post-anaconda.log post-hdfs-home.log Once the install process is complete, you should be able to verify that Anaconda is installed. You need to ssh into one of the cloud instances. You can get the public ip address from the Cloudbreak UI. You will login using the corresponding private key to the public key you entered when you created the Cloudbreak credential. You should login as the cloudbreak user. You should see something similar to this: $ ssh -i ~/Downloads/keys/cloudbreak_id_rsa cloudbreak@#.#.#.# The authenticity of host '#.#.#.# (#.#.#.#)' can't be established. ECDSA key fingerprint is SHA256:By1MJ2sYGB/ymA8jKBIfam1eRkDS5+DX1THA+gs8sdU. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added '#.#.#.#' (ECDSA) to the list of known hosts. Last login: Sat May 13 00:47:41 2017 from 192.175.27.2 __| __|_ ) _| ( / Amazon Linux AMI ___|\___|___| https://aws.amazon.com/amazon-linux-ami/2016.09-release-notes/ 25 package(s) needed for security, out of 61 available Run "sudo yum update" to apply all updates. Amazon Linux version 2017.03 is available. Once you are on the server, you can check the version of python: $ /opt/anaconda/bin/python --version Python 2.7.13 :: Anaconda 4.3.1 (64-bit) Review If you have successfully followed along with this tutorial, you should know how to create pre and post install scripts. You should have successfully deployed a cluster on either AWS or Azure with Anaconda installed under /opt/anaconda on the nodes you specified.

andreas_maier · ‎07-14-2017

Answering my own question: It doesn't work with the latest version of cloudbreak (1.16.1). After login to the GUI I do get the error "Cannot retrieve csrf token" . But it does work with version 1.14.4 .

stefan_schuster · ‎05-09-2017

Thanks for your explanation @Michael Young ! Helped a lot.

kelvintong718 · ‎04-13-2017

@Michael Young Thanks ! That worked like a charm. I still have no idea why it doesn't let me upload using the HDFS UI so if you know why then I would love to know.

praskutti · ‎03-08-2017

@glupu this is exactly what i did, re-imported a new sandbox (and deleted the previous one). The one thing i lost due to that is missed all my Zeppelin notebooks (should have take a bakcup of them).

myoung · ‎03-18-2017

@Yogesh Sharma The _all field is analyzed by default, so you shouldn't have problems performing case-insensitive queries. You are also specifying the analyze_wildcard: true parameter which will attempt to analyze the query string with wildcards before running the query. As you have shown, the query itself returns hits. So the problem is with the aggregations. For your aggregations you are using the include parameter. Can you try using ".*drama.*" as the include value instead of "*drama*"?

myoung · ‎03-05-2017

Objective This tutorial will walk you through the process of using Ansible to deploy Hortonworks Data Platform (HDP) on Amazon Web Services (AWS). We will use the ansible-hadoop Ansible playbook from ObjectRocket to do this. You can find more information on that playbook here: ObjectRocket Ansible-Hadoop This tutorial is part 2 of a 2 part series. Part 1 in the series will show you how to use Ansible to create instances on Amazon Web Services (AWS). Part 1 is avaiablle here: HCC Article Part 1 This tutorial was created as a companion to the Ansible + Hadoop talk I gave at the Ansible NOVA Meetup in February 2017. You can find the slides to that talk here: SlideShare Prerequisites You must have an existing AWS account. You must have access to your AWS Access and Secret keys. You are responsible for all AWS costs incurred. You should have 3-6 instances created in AWS. If you completed Part 1 of this series, then you have an easy way to do that. Scope This tutorial was tested using the following environment and components: Mac OS X 10.11.6 and 10.12.3 Amazon Web Services Anaconda 4.1.6 (Python 2.7.12) Ansible 2.1.3.0 git 2.10.1 Steps Create python virtual environment We are going to create a Python virtual environment for installing the required Python modules. This will help eliminate module version conflicts between applications. I prefer to use Continuum Anaconda for my Python distribution. Therefore the steps for setting up a python virtual environment will be based on that. However, you can use standard python and the virtualenv command to do something similar. To create a virtual environment using Anaconda Python, you use the conda create command. We will name our virtual environment ansible-hadoop . The the following command: conda create --name ansible-hadoop python will create our virtual environment with the name specified. You should see something similar to the following: $ conda create --name ansible-hadoop python Fetching package metadata ....... Solving package specifications: .......... Package plan for installation in environment /Users/myoung/anaconda/envs/ansible-hadoop: The following NEW packages will be INSTALLED: openssl: 1.0.2k-1 pip: 9.0.1-py27_1 python: 2.7.13-0 readline: 6.2-2 setuptools: 27.2.0-py27_0 sqlite: 3.13.0-0 tk: 8.5.18-0 wheel: 0.29.0-py27_0 zlib: 1.2.8-3 Proceed ([y]/n)? y Linking packages ... cp: /Users/myoung/anaconda/envs/ansible-hadoop:/lib/libcrypto.1.0.0.dylib: No such file or directory mv: /Users/myoung/anaconda/envs/ansible-hadoop/lib/libcrypto.1.0.0.dylib-tmp: No such file or directory [ COMPLETE ]|################################################################################################| 100% # # To activate this environment, use: # $ source activate ansible-hadoop # # To deactivate this environment, use: # $ source deactivate # Switch python environments Before installing python packages for a specific development environment, you should activate the environment. This is done with the command source activate <environment> . In our case the environment is the one we just created, ansible-hadoop . You should see something similar to the following: $ source activate ansible-hadoop As you can see there is no output to indicate if we were successful in changing our environment. To verify, you can use the conda info --envs command list the available environments. The active environment will have a * . You should see something similar to the following: $ conda info --envs # conda environments: # ansible-hadoop * /Users/myoung/anaconda/envs/ansible-hadoop root /Users/myoung/anaconda As you can see, the ansible-hadoop environment has the * which means it is the active environment. If you want to remove your python virtual environment, you can use the following command: conda remove --name <environment> --all . If you want to remove the environment we just created you should see something similar to the following: $ conda remove --name ansible-hadoop --all Package plan for package removal in environment /Users/myoung/anaconda/envs/ansible-hadoop: The following packages will be REMOVED: openssl: 1.0.2k-1 pip: 9.0.1-py27_1 python: 2.7.13-0 readline: 6.2-2 setuptools: 27.2.0-py27_0 sqlite: 3.13.0-0 tk: 8.5.18-0 wheel: 0.29.0-py27_0 zlib: 1.2.8-3 Proceed ([y]/n)? y Unlinking packages ... [ COMPLETE ]|################################################################################################| 100% HW11380:test myoung$ conda info --envs # conda environments: # root * /Users/myoung/anaconda Install Python modules in virtual environment The ansible-hadoop playbook requires a specific version of Ansible. You need to install Ansible 2.1.3.0 before using the playbook. You can do that easily with the following command: pip install ansible==2.1.3.0 Using a Python Virtual environment allows us to easily use Ansbile 2.1.3.0 for our playbook without impacting the default Python versions. Clone ansible-hadoop github repo You need to clone the ansible-hadoop github repo to a working directory on your computer. I typically do this in ~/Development. $ cd ~/Development $ git clone https://github.com/objectrocket/ansible-hadoop.git You should see something similar to the following: $ git clone https://github.com/objectrocket/ansible-hadoop.git Cloning into 'ansible-hadoop'... remote: Counting objects: 3879, done. remote: Compressing objects: 100% (6/6), done. remote: Total 3879 (delta 1), reused 0 (delta 0), pack-reused 3873 Receiving objects: 100% (3879/3879), 6.90 MiB | 0 bytes/s, done. Resolving deltas: 100% (2416/2416), done. Configure ansible-hadoop You should make the ansible-hadoop repo directory your current working directory. There are a few configuration items we need to change. $ cd ansible-hadoop You should already have 3-6 instances available in AWS. You will need the public IP address of those instances. Configure ansible-hadoop/inventory/static We need to modify the inventory/static file to include the public IP addresses of our AWS instances. We need to assign master and slave nodes in the file. The instances are all the same configuration by default, so it doesn't matter which IP addresses you put for master and slave. The default version of the inventory/static file should look similar to the following: [master-nodes] master01 ansible_host=192.168.0.2 bond_ip=172.16.0.2 ansible_user=rack ansible_ssh_pass=changeme #master02 ansible_host=192.168.0.2 bond_ip=172.16.0.2 ansible_user=root ansible_ssh_pass=changeme [slave-nodes] slave01 ansible_host=192.168.0.3 bond_ip=172.16.0.3 ansible_user=rack ansible_ssh_pass=changeme slave02 ansible_host=192.168.0.4 bond_ip=172.16.0.4 ansible_user=rack ansible_ssh_pass=changeme [edge-nodes] #edge01 ansible_host=192.168.0.5 bond_ip=172.16.0.5 ansible_user=rack ansible_ssh_pass=changeme I'm going to be using 6 instances in AWS. I will put 3 instances as master servers and 3 instances as slave servers. There are a couple of extra options in the default file we don't need. The only values we need are: hostname : which should be master, slave or edge with a 1-up number like master01 and slave01 ansible_host : should be the AWS public IP address of the instances ansible_user : should be the username you SSH into the instance using the private key. You can easily get the public IP address of your instances from the AWS console. Here is what mine looks like: If you followed the part 1 tutorial, then the username for your instances should be centos . Edit your inventory/static . You should have something similar to the following: [master-nodes] master01 ansible_host=#.#.#.# ansible_user=centos master03 ansible_host=#.#.#.# ansible_user=centos master03 ansible_host=5#.#.#.# ansible_user=centos [slave-nodes] slave01 ansible_host=#.#.#.# ansible_user=centos slave02 ansible_host=#.#.#.# ansible_user=centos slave03 ansible_host=#.#.#.# ansible_user=centos #[edge-nodes] Your public IP addresses will be different. Also note the #[edge-nodes] value in the file. Because we are not using any edge nodes, we should comment that host group line in the file. Once you have all of your edits in place, save the file. Configure ansible-hadoop/ansible.cfg There are a couple of changes we need to make to the ansible.cfg file. This file provides overall configuration settings for Ansible. The default file in the playbook should look similar to the following: [defaults] host_key_checking = False timeout = 60 ansible_keep_remote_files = True library = playbooks/library/cloudera We need to change the library line to be library = playbooks/library/site_facts . We will be deploying HDP which requires the site_facts module. We also need to tell Ansible where to find the private key file for connecting to the instances. Edit the ansible.cfg file. You should modify the file to be similar to the following: [defaults] host_key_checking = False timeout = 60 ansible_keep_remote_files = True library = playbooks/library/site_facts private_key_file=/Users/myoung/Development/ansible-hadoop/ansible.pem Note the path of your private_key_file will be different. Once you have all of your edits in place, save the file. Configure ansible-hadoop/group_vars/hortonworks This step is optional. The group_vars/hortonworks file allows you to change how HDP is deployed. You can modify the version of HDP and Ambari. You can modify which components are installed. You can also specify custom repos and Ambari blueprints. I will be using the default file, so there are no changes made. Run bootstrap_static.sh Before installing HDP, we need to ensure our OS configuration on the AWS instances meet the installation prerequisites. This includes things like ensuring DNS and NTP are working and all of the OS packages are updated. These are tasks that you often find people doing manually. This would obviously be tedious across 100s or 1000s of nodes. It would also introduce a far greater number of opportunties for human error. Ansible makes it incredibly easy to perform these kinds of tasks. Running the bootstrap process is as easy as bash bootstrap_static.sh . This script essentially runs ansible-playbook -i inventory/static playbooks/boostrap.yml for you. This process will typically take 7-10 minutes depending on the size of the instances you selected. When the script is finished, you should see something similar to the following; PLAY RECAP ********************************************************************* localhost : ok=3 changed=2 unreachable=0 failed=0 master01 : ok=21 changed=15 unreachable=0 failed=0 master03 : ok=21 changed=15 unreachable=0 failed=0 slave01 : ok=21 changed=15 unreachable=0 failed=0 slave02 : ok=21 changed=15 unreachable=0 failed=0 slave03 : ok=21 changed=15 unreachable=0 failed=0 As you can see, all of the nodes had 21 total task performed. Of those tasks, 15 tasks required modifications to be compliant with the desire configuration state. Run hortonworks_static.sh Now that the bootstrap process is complete, we can install HDP. The hortonworks_static.sh script is all you have to run to install HDP. This script essentially runs ansible-playbook -i inventory/static playbooks/hortonworks.yml for you. The script installs the Ambari Server on the last master node in our list. In my case, the last master node is master03. The script also installs the Ambari Agent on all of the nodes. The installation of HDP is performed by submitting an request to the Ambari Server API using an Ambari Blueprint. This process will typically take 10-15 minutes depending on the size of the instances you selected, the number of master nodes and the list of HDP components you have enabled. If you forgot to install the specific version of Ansible, you will likely see something similar to the following: TASK [site facts processing] *************************************************** fatal: [localhost]: FAILED! => {"failed": true, "msg": "ERROR! The module sitefacts.py dnmemory=\"31.0126953125\" mnmemory=\"31.0126953125\" cores=\"8\" was not found in configured module paths. Additionally, core modules are missing. If this is a checkout, run 'git submodule update --init --recursive' to correct this problem."} PLAY RECAP ********************************************************************* localhost : ok=4 changed=2 unreachable=0 failed=1 master01 : ok=8 changed=0 unreachable=0 failed=0 master03 : ok=8 changed=0 unreachable=0 failed=0 slave01 : ok=8 changed=0 unreachable=0 failed=0 slave02 : ok=8 changed=0 unreachable=0 failed=0 slave03 : ok=8 changed=0 unreachable=0 failed=0 To resolve this, simply perform the pip install ansible==2.1.3.0 command within your Python virtual environment. Now you can rerun the bash hortonworks_static.sh script. The last task of the playbook is to install HDP via an Ambari Blueprint. It is normal to see something similar to the following: TASK [ambari-server : Create the cluster instance] ***************************** ok: [master03] TASK [ambari-server : Wait for the cluster to be built] ************************ FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (180 retries left). FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (179 retries left). FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (178 retries left). FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (177 retries left). FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (176 retries left). FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (175 retries left). FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (174 retries left). FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (173 retries left). FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (172 retries left). FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (171 retries left). FAILED - RETRYING: TASK: ambari-server : Wait for the cluster to be built (170 retries left). Once you see 3-5 of the retry messages, you can access the Ambari interface via your web browser. The default login is admin and the default password is admin . You should see something similar to the following: Click on the Operations icon that shows 10 operations in progress. You should see something similar to the following: The installation task each takes between 400-600 seconds. The start task each take between 20-300 seconds. The master servers typically take longer to install and star than the slave servers. When everything is running properly, you should see something similar to this: If you look back at your terminal window, you should see something similar to the following: ok: [master03] TASK [ambari-server : Fail if the cluster create task is in an error state] **** skipping: [master03] TASK [ambari-server : Change Ambari admin user password] *********************** skipping: [master03] TASK [Cleanup the temporary files] ********************************************* changed: [master03] => (item=/tmp/cluster_blueprint) changed: [master03] => (item=/tmp/cluster_template) changed: [master03] => (item=/tmp/alert_targets) ok: [master03] => (item=/tmp/hdprepo) PLAY RECAP ********************************************************************* localhost : ok=5 changed=3 unreachable=0 failed=0 master01 : ok=8 changed=0 unreachable=0 failed=0 master03 : ok=30 changed=8 unreachable=0 failed=0 slave01 : ok=8 changed=0 unreachable=0 failed=0 slave02 : ok=8 changed=0 unreachable=0 failed=0 slave03 : ok=8 changed=0 unreachable=0 failed=0 Destroy the cluster You should remember that you will incur AWS costs while the cluster is running. You can either shutdown or terminate the instances. If you want to use the cluster later, then use Ambari to stop all of the services before shutting down the instances. Review If you successfully followed along with this tutorial, you should have been able to easy deploy Hortonworks Data Platform 2.5 on AWS using the Ansible playbook. The process to deploy the cluster typicall takes 10-20 minutes. For more information on how the instance types and number of master nodes impacted the installation time, review the Ansbile + Hadoop slides I linked at the top of the article.

Online	Offline
Last Visited	‎02-08-2019 07:03 PM

Member Since	‎02-09-2016 09:44 PM
Last Visited	‎02-08-2019 07:03 PM
Posts	559
Kudos received	413

Cloudera Community

Re: How can I force the getTwitter processor to no...

Re: Send Ambari Metric to Elasticsearch

Re: Ingesting unformatted, unordered data from hdf...

Re: What would the audit record on Zeppelin users ...

Re: Automate loading data into HDFS

Re: Using Zeppelin with Spark 2.1 on HDP 2.6 clust...

Using Cloudbreak to deploy HDP 2.6 and Spark 2.1 o...

Using Cloudbreak 1.14.4 to deploy HDP 2.6

Installing Anaconda python on HDP cluster using Cl...

Re: Using a local instance of Cloudbreak with Vagr...

Re: Error starting Zeppelin Notebook in Sandbox -

Re: Why can't I upload a simple text file onto HDF...

Re: sandbox Docker container startup error

Re: ElasticSearch query to perform case-insensitiv...

Using Ansible to deploy HDP on AWS