Member since 02-09-2016. Posts: 559. Kudos Received: 422. Solutions: 98.
			
    
	
		
		
05-24-2017 06:06 PM - 3 Kudos
		
	
				
		
	
		
					
 This tutorial will walk you through the process of using Cloudbreak recipes to install TensorFlow for Anaconda Python on an HDP 2.6 cluster during cluster provisioning. We'll then update Zeppelin to use the newly installed version of Anaconda and run a quick TensorFlow test. 
 Prerequisites 
 
 You should already have a Cloudbreak v1.14.4 environment running. You can follow this article to create a Cloudbreak instance using Vagrant and Virtualbox: HCC Article 
 You should already have created a blueprint that deploys HDP 2.6 with Spark 2.1. You can follow this article to get the blueprint setup. Do not create the cluster yet, as we will do that in this tutorial: HCC Article 
 You should already have credentials created in Cloudbreak for deploying on AWS (or Azure). This tutorial does not cover creating credentials. 
 
 Scope 
 This tutorial was tested in the following environment: 
 
 Cloudbreak 1.14.4 
 AWS EC2 
 HDP 2.6 
 Spark 2.1 
 Anaconda 2.7.13 
 TensorFlow 1.1.0 
 
 Steps 
 Create Recipe 
 Before you can use a recipe during a cluster deployment, you have to create the recipe. In the Cloudbreak UI, look for the  manage recipes  section. It should look similar to this: 
 
    
   
 
 If this is your first time creating a recipe, you will have  0  recipes instead of the  2  recipes shown in my interface. 
 Now click on the arrow next to  manage recipes  to display available recipes. You should see something similar to this: 
 
    
   
 
 Now click on the green  create recipe  button. You should see something similar to this: 
 
    
   
 
 Now we can enter the information for our recipe. I'm calling this recipe  tensorflow . I'm giving it the description of  Install TensorFlow Python . You can choose to run the script as either  pre-install  or  post-install . I'm choosing to do the install  post-install . This means the script will be run after the Ambari installation process has started. So choose the  Execution Type  of  POST . The script is fairly basic. We are going to download the Anaconda install script, then run it in silent mode. Then we'll use the Anaconda version of  pip  to install TensorFlow. Here is the script: 
 #!/bin/bash
wget https://repo.continuum.io/archive/Anaconda2-4.3.1-Linux-x86_64.sh
bash ./Anaconda2-4.3.1-Linux-x86_64.sh -b -p /opt/anaconda
/opt/anaconda/bin/pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp27-none-linux_x86_64.whl 
 You can read more about installing TensorFlow on Anaconda here: TensorFlow Docs. 
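 If you expect the recipe to run more than once on a node (for example, after a repair), you may want to guard the install. The following is an optional sketch of my own, not part of the original recipe; it reuses the same URLs and paths as the script above and simply skips the Anaconda download if /opt/anaconda already exists: 
 #!/bin/bash
# Skip the Anaconda install if a previous run already placed it under /opt/anaconda
if [ ! -d /opt/anaconda ]; then
    wget https://repo.continuum.io/archive/Anaconda2-4.3.1-Linux-x86_64.sh
    bash ./Anaconda2-4.3.1-Linux-x86_64.sh -b -p /opt/anaconda
fi
# pip is safe to re-run; --ignore-installed simply reinstalls the wheel
/opt/anaconda/bin/pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp27-none-linux_x86_64.whl 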
 When you have finished entering all of the information, you should see something similar to this: 
 
    
   
 
 If everything looks good, click on the green  create recipe  button. 
 You should be able to see the recipe in your list of recipes: 
 
    
   
 
 NOTE: You will most likely have a different list of recipes. 
 Create a Cluster using a Recipe 
 Now that our recipe has been created, we can create a cluster that uses the recipe. Go through the process of creating a cluster up to the  Choose Blueprint  step. This step is where you select the recipe you want to use. The recipes are not selected by default; you have to select the recipes you wish to use. You can specify recipes for 1 or more host groups. This allows you to run different recipes across different host groups (masters, slaves, etc). You can also select multiple recipes. 
 We want to use the  hdp26-spark21-cluster  blueprint. This will create an HDP 2.6 cluster with Spark 2.1 and Zeppelin. You should have created this blueprint when you followed the prerequisite tutorial. You should see something similar to this: 
 
    
   
 
 In our case, we are going to run the  tensorflow  recipe on every host group. If you intend to use something like TensorFlow across the cluster, you should install it on at least the slave nodes and the client nodes. 
 After you have selected the recipe for the host groups, click the  Review & Launch  button, then launch the cluster. As the cluster is building, you should see a message in the Cloudbreak UI that indicates the recipe is running. When that happens, you will see something similar to this: 
 
    
   
 
 If you click on the building cluster, you can see more detailed information. You should see something similar to this: 
 
    
   
 
 Once the cluster has finished building, you should see something similar to this: 
 
    
   
 
 Cloudbreak will create logs for each recipe that runs on each host. These logs are located under  /var/log/recipes  and are named after the recipe, prefixed with whether it is a pre- or post-install script. For example, our recipe log is called  post-tensorflow.log . You can tail this log file to follow the execution of the script. 
 NOTE: Post-install scripts won't be executed until the Ambari server is installed and the cluster is building. You can always monitor the  /var/log/recipes  directory on a node to see when the script is being executed. The time it takes to run the script will vary depending on the cloud environment and how long it takes to spin up the cluster. 
 On your cluster, you should be able to see the post-install log: 
 $ ls /var/log/recipes
post-tensorflow.log  post-hdfs-home.log 
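 As mentioned above, you can tail the log while the recipe runs to follow its progress; the file name follows the post-<recipe-name>.log pattern shown in the listing: 
 $ tail -f /var/log/recipes/post-tensorflow.log 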
 Verify Anaconda Install 
 Once the install process is complete, you should be able to verify that Anaconda is installed. You need to  ssh  into one of the cloud instances. You can get the public IP address from the Cloudbreak UI. You will log in using the private key that corresponds to the public key you entered when you created the Cloudbreak credential. Log in as the  cloudbreak  user. You should see something similar to this: 
 $ ssh -i ~/Downloads/keys/cloudbreak_id_rsa cloudbreak@#.#.#.#
The authenticity of host '#.#.#.# (#.#.#.#)' can't be established.
ECDSA key fingerprint is SHA256:By1MJ2sYGB/ymA8jKBIfam1eRkDS5+DX1THA+gs8sdU.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '#.#.#.#' (ECDSA) to the list of known hosts.
Last login: Sat May 13 00:47:41 2017 from 192.175.27.2
       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|
https://aws.amazon.com/amazon-linux-ami/2016.09-release-notes/
25 package(s) needed for security, out of 61 available
Run "sudo yum update" to apply all updates.
Amazon Linux version 2017.03 is available. 
 Once you are on the server, you can check the version of Python: 
 $ /opt/anaconda/bin/python --version
Python 2.7.13 :: Anaconda 4.3.1 (64-bit)
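 Since the recipe also installed TensorFlow into this Anaconda environment, you can optionally confirm it from the same shell. This quick check is my own addition; it imports the module using Anaconda's Python and prints its version, which should match the 1.1.0 wheel installed by the recipe: 
 $ /opt/anaconda/bin/python -c "import tensorflow as tf; print(tf.__version__)"
1.1.0 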
 
 Update Zeppelin Interpreter 
 We need to update the default  spark2  interpreter configuration in Zeppelin. We need to access the Zeppelin UI from Ambari. You can login to Ambari for the new cluster from the Cloudbreak UI cluster details page. Once you login to Ambari, you can access the Zeppelin UI from the Ambari Quicklink. You should see something similar to this: 
 
    
   
 
 After you access the Zeppelin UI, click the blue  login  button in the upper right corner of the interface. You can login using the default username and password of  admin . After you login to Zeppelin, click the  admin  button in the upper right corner of the interface. This will expose the options menu. You should see something similar to this: 
 
    
   
 
 Click on the  Interpreter  link in the menu. This will display all of the configured interpreters. Find the  spark2  interpreter. You can see the default setting for  zeppelin.pyspark.python  is set to  python . This will use whichever Python is found in the path. You should see something similar to this: 
 
    
   
 
 We will need to change this to  /opt/anaconda/bin/python  which is where we have Anaconda Python installed. Click on the  edit  button and change  zeppelin.pyspark.python  to  /opt/anaconda/bin/python . You should see something similar to this: 
 
    
   
 
 Now we can click the blue  save  button at the bottom. The configuration changes are now saved, but we need to restart the interpreter for the changes to take effect. Click on the  restart  button to restart the interpreter. 
 Create Zeppelin Notebook 
 Now that our  spark2  interpreter configuration has been updated, we can create a notebook to test Anaconda + TensorFlow. Click on the  Notebook  menu. You should see something similar to this: 
 
    
   
 
 Click on the  Create new note  link. You can give the notebook any descriptive name you like. Select  spark2  as the default interpreter. You should see something similar to this: 
 
    
   
 
 Your notebook will start with a blank paragraph. For the first paragraph, let's test the version of Spark we are using. Enter the following in the first paragraph: 
 %spark2.pyspark
sc.version 
 Now click the run button for the paragraph. You should see something similar to this: 
 u'2.1.0.2.6.0.3-8' 
 As you can see, we are using Spark 2.1. Now in the second paragraph, we'll test the version of Python. We already know the command-line version is 2.7.13. Enter the following in the second paragraph: 
 %spark2.pyspark
import sys
print sys.version_info 
 Now click the run button for the paragraph. You should see something similar to this: 
 sys.version_info(major=2, minor=7, micro=13, releaselevel='final', serial=0) 
 As you can see, we are running Python version 2.7.13. 
 Now we can test TensorFlow. Enter the following in the third paragraph: 
 %spark2.pyspark
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
a = tf.constant(10)
b = tf.constant(32)
print(sess.run(a + b)) 
 This simple code comes from the TensorFlow website: https://www.tensorflow.org/versions/r0.10/get_started/os_setup#anaconda_installation. Now click the run button for the paragraph. You may see some warning messages the first time you run it, but you should also see the following output: 
 Hello, TensorFlow!
42 
 As you can see, TensorFlow is working from Zeppelin, which is using Spark 2.1 and Anaconda. If everything works properly, your notebook should look similar to this: 
 
    
   
 
 Admittedly this example is very basic, but it demonstrates the components are working together. For next steps, try running other TensorFlow code. Here are some examples you can work with: GitHub. 
 Review 
 If you have successfully followed along with this tutorial, you should have deployed an HDP 2.6 cluster in the cloud with Anaconda installed under  /opt/anaconda  and added the TensorFlow Python modules using a Cloudbreak recipe. You should have created a Zeppelin notebook which uses Anaconda Python, Spark 2.1 and TensorFlow. 
						
					
						
						
		
	
					
			
		
	
	
	
	
				
		
	
	
			
    
	
		
		
05-23-2017 11:16 PM - 1 Kudo
		
	
				
		
	
		
					
							 This tutorial is part two of a two-part series. In this tutorial, we'll verify Spark 2.1 functionality using Zeppelin on an HDP 2.6 cluster deployed using Cloudbreak. The first tutorial covers using Cloudbreak to deploy the cluster. You can find the first tutorial here: HCC Article 
 Prerequisites 
 
 You should already have completed part one of this tutorial series and have a Cloudbreak-deployed HDP 2.6 cluster with Spark 2.1 running. 
 
 Scope 
 This tutorial was tested in the following environment: 
 
 Cloudbreak 1.14.4 
 AWS EC2 
 HDP 2.6 
 Spark 2.1 
 Zeppelin 0.7 
 
 Steps 
 Login into Ambari 
 As mentioned in the prerequisites, you should already have a cluster built using Cloudbreak. Click on the cluster summary box in the Cloudbreak UI to display the cluster details. Now click on the link to your Ambari cluster. You may see something similar to this: 
 
    
 
 Your screen may vary depending on your browser of choice. I'm using Chrome. This warning is because we are using self-signed certificates which are not trusted. Click on the  ADVANCED  link. You should see something similar to this: 
 
    
 
 Click on the  Proceed  link to open the Ambari login screen. You should be able to login to Ambari using the username and password  admin . 
 Login to Zeppelin 
 Now click on the Zeppelin component in the component status summary. You should see something similar to this: 
 
    
 
 Click on the  Quicklinks  link. You should see something similar to this: 
 
    
 
 Click on the  Zeppelin UI  link. This will load Zeppelin in a new browser tab. You should see something similar to this: 
 
    
 
 You should notice the blue  Login  button in the upper right corner of the Zeppelin UI. Click on this button. You should see something similar to this: 
 
    
 
 You should be able to login to Zeppelin using the username and password  admin . Once you login, you should see something similar to this: 
 
    
 
 Load Getting Started Notebook 
 Now let's load the  Apache Spark in 5 Minutes  notebook by clicking on the  Getting Started  link. You should see something similar to this: 
 
    
 
 Click on the  Apache Spark in 5 Minutes  notebook. You should see something similar to this: 
 
    
 
 This is showing you the Zeppelin interpreters associated with this notebook. As you can see, the  spark2  and  livy2  interpreters are enabled. Click the blue  Save  button. You should see something similar to this: 
 
    
 
 This notebook defaults to using the Spark 2.x interpreter. You should be able to run the paragraphs without any changes. Scroll down to the notebook paragraph called  Verify Spark Version . Click the play button on this paragraph. You should see something similar to this: 
 
    
 
 You should notice the Spark version is  2.1.0.2.6.0.3-8 . This confirms we are using Spark 2.1. It also confirms that Zeppelin is able to properly interact with Spark 2 on our HDP 2.6 cluster built with Cloudbreak. Try running the next two paragraphs. These paragraphs download a JSON file from GitHub and then move it to HDFS on our cluster. Now run the  Load data into a Spark DataFrame  paragraph. You should see something similar to this: 
 
    
 
 As you can see, the DataFrame should be properly loaded from the json file. 
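 If you want to confirm outside of Zeppelin that the file actually landed in HDFS, you can list the target directory from a shell on the cluster. The path below is only a placeholder; use whichever directory the notebook paragraph copied the file into: 
 $ hdfs dfs -ls /tmp 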
 Next Steps 
 Try running the remaining paragraphs to ensure everything is working ok. For an extra challenge, try running some of the other Spark 2 notebooks that are included. You can also attempt to modify the Spark 1.6 notebooks to work with Spark 2.1. 
 
    
 
 Review 
 If you have successfully followed along with this tutorial, you should have been able to confirm Spark 2.1 works on our HDP 2.6 cluster deployed with Cloudbreak. 
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
05-23-2017 09:41 PM - 2 Kudos
		
	
				
		
	
		
					
							 This tutorial will walk you through the process of using Cloudbreak to deploy an HDP 2.6 cluster with Spark 2.1. We'll copy and edit the existing  hdp-spark-cluster  blueprint which deploys Spark 1.6 to create a new blueprint which installs Spark 2.1. This tutorial is part one of a two-part series. The second tutorial walks you through using Zeppelin to verify the Spark 2.1 installation. You can find that tutorial here: HCC Article 
 Prerequisites 
 
 You should already have a Cloudbreak v1.14.0 environment running. You can follow this article to create a Cloudbreak instance using Vagrant and Virtualbox: HCC Article 
 You should already have updated Cloudbreak to support deploying HDP 2.6 clusters. You can follow this article to enable that functionality: HCC Article 
 
 Scope 
 This tutorial was tested in the following environment: 
 
 Cloudbreak 1.14.4 
 AWS EC2 
 HDP 2.6 
 Spark 2.1 
 
 Steps 
 Create Blueprint 
 Before we can deploy a Spark 2.1 cluster using Cloudbreak, we need to create a blueprint that specifies Spark 2.1. Cloudbreak ships with 3 blueprints out of the box: 
 
 hdp-small-default: basic HDP cluster with Hive and HBase 
 hdp-spark-cluster: basic HDP cluster with Spark 1.6 
 hdp-streaming-cluster: basic HDP cluster with Kafka and Storm 
 
 We will use the  hdp-spark-cluster  as our base blueprint and edit it to deploy Spark 2.1 instead of Spark 1.6. 
 Click on the  manage blueprints  section of the UI. Click on the  hdp-spark-cluster  blueprint. You should see something similar to this: 
 
    
 
 Click on the blue  copy & edit  button. You should see something similar to this: 
 
    
 
 For the  Name , enter  hdp26-spark21-cluster . This tells us the blueprint is for an HDP 2.6 cluster using Spark 2.1. Enter the same information for the  Description . You should see something similar to this: 
 
    
 
 Now, we need to edit the JSON portion of the blueprint. We need to change the Spark 1.6 components to Spark 2.1 components. We don't need to change where they are deployed. The following entries within the JSON are for Spark 1.6: 
 
  "name": "SPARK_CLIENT"  
  "name": "SPARK_JOBHISTORYSERVER"  
  "name": "SPARK_CLIENT"  
 
 We will replace  SPARK  with  SPARK2 . These entries should look as follows: 
 
  "name": "SPARK2_CLIENT"  
  "name": "SPARK2_JOBHISTORYSERVER"  
  "name": "SPARK2_CLIENT"  
 
 NOTE: There are two entries for SPARK_CLIENT. Make sure you change both. 
 We are going to add an entry for the  LIVY2_SERVER  component. We will add it to the same node as the  SPARK2_JOBHISTORYSERVER . We are also going to add an entry for the  SPARK2_THRIFTSERVER  component on that same node. Let's add those two entries just below  SPARK2_CLIENT  in the  host_group_master_2  section. 
 Change the following: 
                 {
                    "name": "SPARK2_JOBHISTORYSERVER"
                },
                {
                    "name": "SPARK2_CLIENT"
                }, 
 to this: 
                 {
                    "name": "SPARK2_JOBHISTORYSERVER"
                },
                {
                    "name": "SPARK2_CLIENT"
                },
                {
                    "name": "SPARK2_THRIFTSERVER"
                },
                {
                    "name": "LIVY2_SERVER"
                }, 
 We also need to update the  blueprint_name  to  hdp26-spark21-cluster  and the  stack_version  to  2.6 . You should have something similar to this: 
     "Blueprints": {
        "blueprint_name": "hdp26-spark21-cluster",
        "stack_name": "HDP",
        "stack_version": "2.6"
    } 
 If you prefer, you can copy and paste the following blueprint JSON: 
 {
    "host_groups": [
        {
            "name": "host_group_client_1",
            "configurations": [],
            "components": [
                {
                    "name": "ZOOKEEPER_CLIENT"
                },
                {
                    "name": "PIG"
                },
                {
                    "name": "OOZIE_CLIENT"
                },
                {
                    "name": "HBASE_CLIENT"
                },
                {
                    "name": "HCAT"
                },
                {
                    "name": "KNOX_GATEWAY"
                },
                {
                    "name": "METRICS_MONITOR"
                },
                {
                    "name": "FALCON_CLIENT"
                },
                {
                    "name": "TEZ_CLIENT"
                },
                {
                    "name": "SPARK2_CLIENT"
                },
                {
                    "name": "SLIDER"
                },
                {
                    "name": "SQOOP"
                },
                {
                    "name": "HDFS_CLIENT"
                },
                {
                    "name": "HIVE_CLIENT"
                },
                {
                    "name": "YARN_CLIENT"
                },
                {
                    "name": "METRICS_COLLECTOR"
                },
                {
                    "name": "MAPREDUCE2_CLIENT"
                }
            ],
            "cardinality": "1"
        },
        {
            "name": "host_group_master_3",
            "configurations": [],
            "components": [
                {
                    "name": "ZOOKEEPER_SERVER"
                },
                {
                    "name": "APP_TIMELINE_SERVER"
                },
                {
                    "name": "TEZ_CLIENT"
                },
                {
                    "name": "HBASE_MASTER"
                },
                {
                    "name": "HBASE_CLIENT"
                },
                {
                    "name": "HDFS_CLIENT"
                },
                {
                    "name": "METRICS_MONITOR"
                },
                {
                    "name": "SECONDARY_NAMENODE"
                }
            ],
            "cardinality": "1"
        },
        {
            "name": "host_group_slave_1",
            "configurations": [],
            "components": [
                {
                    "name": "HBASE_REGIONSERVER"
                },
                {
                    "name": "NODEMANAGER"
                },
                {
                    "name": "METRICS_MONITOR"
                },
                {
                    "name": "DATANODE"
                }
            ],
            "cardinality": "6"
        },
        {
            "name": "host_group_master_2",
            "configurations": [],
            "components": [
                {
                    "name": "ZOOKEEPER_SERVER"
                },
                {
                    "name": "ZOOKEEPER_CLIENT"
                },
                {
                    "name": "PIG"
                },
                {
                    "name": "MYSQL_SERVER"
                },
                {
                    "name": "HIVE_SERVER"
                },
                {
                    "name": "METRICS_MONITOR"
                },
                {
                    "name": "SPARK2_JOBHISTORYSERVER"
                },
                {
                    "name": "SPARK2_CLIENT"
                },
                {
                    "name": "SPARK2_THRIFTSERVER"
                },
                {
                    "name": "LIVY2_SERVER"
                },
                {
                    "name": "TEZ_CLIENT"
                },
                {
                    "name": "HBASE_CLIENT"
                },
                {
                    "name": "HIVE_METASTORE"
                },
                {
                    "name": "ZEPPELIN_MASTER"
                },
                {
                    "name": "HDFS_CLIENT"
                },
                {
                    "name": "YARN_CLIENT"
                },
                {
                    "name": "MAPREDUCE2_CLIENT"
                },
                {
                    "name": "RESOURCEMANAGER"
                },
                {
                    "name": "WEBHCAT_SERVER"
                }
            ],
            "cardinality": "1"
        },
        {
            "name": "host_group_master_1",
            "configurations": [],
            "components": [
                {
                    "name": "ZOOKEEPER_SERVER"
                },
                {
                    "name": "HISTORYSERVER"
                },
                {
                    "name": "OOZIE_CLIENT"
                },
                {
                    "name": "NAMENODE"
                },
                {
                    "name": "OOZIE_SERVER"
                },
                {
                    "name": "HDFS_CLIENT"
                },
                {
                    "name": "YARN_CLIENT"
                },
                {
                    "name": "FALCON_SERVER"
                },
                {
                    "name": "METRICS_MONITOR"
                },
                {
                    "name": "MAPREDUCE2_CLIENT"
                }
            ],
            "cardinality": "1"
        }
    ],
    "Blueprints": {
        "blueprint_name": "hdp26-spark21-cluster",
        "stack_name": "HDP",
        "stack_version": "2.6"
    }
} 
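 Before clicking  create blueprint , it can save a failed attempt to confirm that the edited JSON still parses. This is an optional check of my own; it assumes you saved a local copy of the blueprint as blueprint.json (a hypothetical file name): 
 $ python -m json.tool blueprint.json > /dev/null && echo "blueprint JSON is valid" 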
 Once you have all of the changes in place, click the green  create blueprint  button. 
 Create Security Group 
 We need to create a new security group to use with our cluster. By default, the existing security groups only allow ports 22, 443, and 9443. As part of this tutorial, we will use Zeppelin to test Spark 2.1. We'll create a new security group that opens all ports to our IP address. 
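 Once the cluster is running, a quick way to confirm that a given port is reachable from your machine is nc. This is a supplementary check of my own; replace the placeholder with your cluster's public IP, and note that 8080 is Ambari's usual default port (your deployment may differ): 
 $ nc -zv <cluster-public-ip> 8080 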
 Click on the  manage security groups  section of the UI. You should see something similar to this: 
 
    
 
 Click on the green  create security group  button. You should see something similar to this: 
 
    
 
 First you need to select the appropriate cloud platform. I'm using AWS, so that is what I selected. We need to provide a unique name for our security group. I used  all-ports-my-ip . You should use something descriptive. Provide a helpful description as well. Now we need to enter our personal IP address CIDR. I am using  #.#.#.#/32 ; your IP address will obviously be different. You need to enter the port range. There is a known issue in Cloudbreak that prevents you from using  0-65535 , so we'll use  1-65535 . For the protocol, use  tcp . Once you have everything entered, you should see something similar to this: 
 
    
 
 Click the green  Add Rule  button to add this rule to our security group. You can add multiple rules, but we have everything covered with our single rule. You should see something similar to this: 
 
    
 
 If everything looks good, click the green  create security group  button. This will create our new security group. You should see something like this: 
 
    
 
 Create Cluster 
 Now that our blueprint has been created and we have a new security group, we can begin building the cluster. Ensure you have selected the appropriate credential for your cloud environment. Then click the green  create cluster  button. You should see something similar to this: 
 
    
 
 Give your cluster a descriptive name. I used  spark21test , but you can use whatever you like. Select an appropriate cloud region. I'm using AWS and selected  US East (N. Virginia) , but you can use whatever you like. You should see something similar to this: 
 
    
 
 Click on the  Setup Network and Security  button. You should see something similar to this: 
 
    
 
 We are going to keep the default options here. Click on the  Choose Blueprint  button. You should see something similar to this: 
 
    
 
 Expand the blueprint dropdown menu. You should see the blueprint we created before,  hdp26-spark21-cluster . Select the blueprint. You should see something similar to this: 
 
    
 
 You should notice the new security group is already selected. Cloudbreak did not automatically figure this out; the instance templates and security groups are simply selected alphabetically by default. 
 Now we need to select a node on which to deploy Ambari. I typically deploy Ambari on the  master1  server. Check the Ambari check box on one of the master servers. If everything looks good, click on the green  create cluster  button. You should see something similar to this: 
 
    
 
 Once the cluster has finished building, you can click on the arrow for the cluster we created to get expanded details. You should see something similar to this: 
 
    
 
 Verify Versions 
 Once the cluster is fully deployed, we can verify the versions of the components. Click on the Ambari link on the cluster details page. Once you login to Ambari, you should see something similar to this: 
 
    
 
 You should notice that Spark2 is shown in the component list. Click on Spark2 in the list. You should see something similar to this: 
 
    
 
 You should notice that both the Spark2 Thrift Server and the Livy2 Server have been installed. Now let's check the overall cluster versions. Click on the  Admin  link in the Ambari menu and select  Stacks and Versions . Then click on the  Versions  tab. You should see something similar to this: 
 
    
 
 As you can see, HDP 2.6.0.3 was deployed. 
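 If you prefer to confirm the stack version from the command line, you can ssh to one of the cluster nodes (as shown in the recipe articles) and use hdp-select, which ships with HDP. The exact output format can vary, but it should list the same 2.6.0.3 build shown in Ambari: 
 $ hdp-select versions 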
 Review 
 If you have successfully followed along with this tutorial, you should know how to create a new security group and blueprint. The blueprint allows you to deploy HDP 2.6 with Spark 2.1. The security group allows you to access all ports on the cluster from your IP address. Follow along in part 2 of the tutorial series to use Zeppelin to test Spark 2.1. 
    
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
05-18-2017 03:00 PM - 6 Kudos
		
	
				
		
	
		
					
							 Prerequisites 
 
 You should already have a Cloudbreak v1.14.4 environment running. You can follow this article to create a Cloudbreak instance using Vagrant and Virtualbox: HCC Article 
 You should already have credentials created in Cloudbreak for deploying on AWS (or Azure). 
 
 Scope 
 This tutorial was tested in the following environment: 
 
 macOS Sierra (version 10.12.4) 
 Cloudbreak 1.14.4 
 AWS EC2 
 
 NOTE: Cloudbreak 1.14.0 (TP) had a bug which caused HDP 2.6 cluster installs to fail. You should upgrade your Cloudbreak deployer instance to 1.14.4. 
 Steps 
 Create application.yml file 
 UPDATE 05/24/2017: The creation of a custom application.yml file is not required with Cloudbreak 1.14.4. This version of Cloudbreak includes support for HDP 2.5 and HDP 2.6. This step remains for educational purposes for future HDP updates. 
 You need to create an application.yml file in the  etc  directory within your Cloudbreak deployment directory. This file will contain the repo information for HDP 2.6. If you followed my tutorial linked above, then your Cloudbreak deployment directory should be  /opt/cloudbreak-deployment . If you are using a Cloudbreak instance on AWS or Azure, then your Cloudbreak deployment directory is likely  /var/lib/cloudbreak-deployment/ . 
 Edit your  <cloudbreak-deployment>/etc/application.yml  file using your favorite editor. Copy and paste the following in the file: 
 cb:
  ambari:
    repo:
      version: 2.5.0.3-7
      baseurl: http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.5.0.3
      gpgkey: http://public-repo-1.hortonworks.com/ambari/centos6/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
    database:
      vendor: embedded
      host: localhost
      port: 5432
      name: postgres
      username: ambari
      password: bigdata
  hdp:
    entries:
      2.5:
        version: 2.5.0.1-210
        repoid: HDP-2.5
        repo:
          stack:
            repoid: HDP-2.5
            redhat6: http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.5.5.0
            redhat7: http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.5.0
          util:
            repoid: HDP-UTILS-1.1.0.21
            redhat6: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos6
            redhat7: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos7
      2.6:
        version: 2.6.0.0-598
        repoid: HDP-2.6
        repo:
          stack:
            repoid: HDP-2.6
            redhat6: http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.6.0.3
            redhat7: http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.6.0.3
          util:
            repoid: HDP-UTILS-1.1.0.21
            redhat6: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos6
            redhat7: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos7
 
 Start Cloudbreak 
 Once you have created your  application.yml  file, you can start Cloudbreak. 
 $ cbd start
 
 NOTE: It may take a couple of minutes before Cloudbreak is fully running. 
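 While you wait, you can check on the deployer with the same cbd commands used elsewhere in these articles, for example confirming the installed version and following the application log: 
 $ cbd --version
$ cbd logs cloudbreak 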
 Create HDP 2.6 Blueprint 
 To create an HDP 2.6 cluster, we need to update our blueprint to specify HDP 2.6. On the main Cloudbreak UI, click on  manage blueprints . You should see something similar to this: 
 
    
 
 You should see 3 default blueprints. We are going to use the  hdp-small-default  blueprint as our base. Click on the  hdp-small-default  blueprint name. You should see something similar to this: 
 
    
 
 Now click on the blue  copy & edit  button. You should see something similar to this: 
 
    
 
 For the  Name , you should enter something unique and descriptive. I suggest  hdp26-small-default . For the  Description , you can enter the same information. You should see something similar to this: 
 
    
 
 Now we need to edit the JSON portion of the blueprint. Scroll down to the bottom of the JSON. You should see something similar to this: 
 
    
 
 Now edit the  blueprint_name  value to be  hdp26-small-default  and edit the  stack_version  to be  2.6 . You should see something similar to this: 
 
    
 
 Now click on the green  create blueprint  button. You should see the new blueprint visible in the list of blueprints. 
 Create HDP 2.6 Small Default Cluster 
 Now that our blueprint has been created, we can create a cluster and select this blueprint to install HDP 2.6. Select the appropriate credential for your Cloud environment. Click on the  create cluster  button. You should see something similar to this: 
 
    
 
 Provide a unique, but descriptive  Cluster Name . Ensure you select an appropriate  Region . I chose  hdp26test  as my cluster name and I'm using the  US East  region: 
 
    
 
 Now advance to the next step by clicking on  Setup Network and Security . You should see something similar to this: 
 
    
 
 We don't need to make any changes here, so click on the  Choose Blueprint  button. You should see something similar to this: 
 
    
 
 In the  Blueprint  dropdown, you should see the blueprint we created. Select the  hdp26-small-default  blueprint. You should see something similar to this: 
 
    
 
 You need to select which node Ambari will run on. I typically select the master1 node. You should see something similar to this: 
 
    
 
 Now you can click on the  Review and Launch  button. You should see something similar to this: 
 
    
 
 Verify the information presented. If everything looks good, click on the  create and start cluster button . Once the cluster build process has started, you should see something similar to this: 
 
    
 
 Verify HDP Version 
 Once the cluster has finished building, you can click on the cluster in the Cloudbreak UI. You should see something similar to this: 
 
    
 
 Click on the Ambari link to load Ambari. Login using the default username and password of  admin . Now click on the  Admin  link in the menu. You should see something similar to this: 
 
    
 
 Click on the  Stack and Versions  link. You should see something similar to this: 
 
    
 
 You should notice that  HDP 2.6.0.3  has been deployed. 
 Review 
 If you have successfully followed along with this tutorial, you should know how to create or update the  etc/application.yml  file to add specific Ambari and HDP repositories. You should have successfully created an updated blueprint and deployed HDP 2.6 on your cloud of choice. 
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
05-13-2017 01:57 AM - 5 Kudos
		
	
				
		
	
		
					
							 Objectives 
 This tutorial will walk you through the process of using Cloudbreak recipes to install Anaconda on your HDP cluster during cluster provisioning. This process can be used to automate many tasks on the cluster, both pre-install and post-install. 
 Prerequisites 
 
 You should already have a Cloudbreak v1.14.0 environment running. You can follow this article to create a Cloudbreak instance using Vagrant and Virtualbox: HCC Article 
 You should already have credentials created in Cloudbreak for deploying on AWS (or Azure). 
 
 Scope 
 This tutorial was tested in the following environment: 
 
 macOS Sierra (version 10.12.4) 
 Cloudbreak 1.14.0 TP 
 AWS EC2 
 Anaconda 2.7.13 
 
 Steps 
 Create Recipe 
 Before you can use a recipe during a cluster deployment, you have to create the recipe. In the Cloudbreak UI, look for the "manage recipes" section. It should look similar to this: 
 
    
 
 If this is your first time creating a recipe, you will have  0  recipes instead of the  2  recipes shown in my interface. 
 Now click on the arrow next to  manage recipes  to display available recipes. You should see something similar to this: 
 
    
 
 Now click on the green  create recipe  button. You should see something similar to this: 
 
    
 
 Now we can enter the information for our recipe. I'm calling this recipe  anaconda . I'm giving it the description of  Install Anaconda . You can choose to install Anaconda as either pre-install or post-install. I'm choosing to do the install post-install. This means the script will be run after the Ambari installation process has started. So choose the  Execution Type  of  POST . Choose  Script  so we can copy and paste the shell script. You can also specify a file to upload or a URL (gist for example). Our script is very basic. We are going to download the Anaconda install script, then run it in silent mode. Here is the script: 
 #!/bin/bash
wget https://repo.continuum.io/archive/Anaconda2-4.3.1-Linux-x86_64.sh
bash ./Anaconda2-4.3.1-Linux-x86_64.sh -b -p /opt/anaconda
 
 When you have finished entering all of the information, you should see something similar to this: 
 
    
 
 If everything looks good, click on the green  create recipe  button. 
 After the recipe has been created, you should see something similar to this: 
 
    
 
 Create a Cluster using a Recipe 
 Now that our recipe has been created, we can create a cluster that uses the recipe. Go through the process of creating a cluster up to the  Choose Blueprint  step. This step is when you select the recipe you want to use. The recipes are not selected by default; you have to select the recipes you wish to use. You specify recipes for 1 or more host groups. This allows you to run different recipes across different host groups (masters, slaves, etc). You can also select multiple recipes. 
 We want to use the  hdp-small-default  blueprint. This will create a basic HDP cluster. 
 If you select the  anaconda  recipe, you should see something similar to this: 
 In our case, we are going to run the recipe on every host group. If you intend to use something like Anaconda across the cluster, you should install it on at least the slave nodes and the client nodes. 
 After you have selected the recipe for the host groups, click the  Review & Launch  button, then launch the cluster. As the cluster is building, you should see a message in the Cloudbreak UI that indicates the recipe is running. When that happens, you will see something similar to this: 
 
    
 
 Cloudbreak will create logs for each recipe that runs on each host. These logs are located under  /var/log/recipes  and are named after the recipe, prefixed with whether it is a pre- or post-install script. For example, our recipe log is called  post-anaconda.log . You can tail this log file to follow the execution of the script. 
 NOTE: Post-install scripts won't be executed until the Ambari server is installed and the cluster is building. You can always monitor the  /var/log/recipes  directory on a node to see when the script is being executed. The time it takes to run the script will vary depending on the cloud environment and how long it takes to spin up the cluster. 
 On your cluster, you should be able to see the post-install log: 
 $ ls /var/log/recipes
post-anaconda.log  post-hdfs-home.log
 
 Once the install process is complete, you should be able to verify that Anaconda is installed. You need to  ssh  into one of the cloud instances. You can get the public IP address from the Cloudbreak UI. You will log in using the private key that corresponds to the public key you entered when you created the Cloudbreak credential. Log in as the  cloudbreak  user. You should see something similar to this: 
 $ ssh -i ~/Downloads/keys/cloudbreak_id_rsa cloudbreak@#.#.#.#
The authenticity of host '#.#.#.# (#.#.#.#)' can't be established.
ECDSA key fingerprint is SHA256:By1MJ2sYGB/ymA8jKBIfam1eRkDS5+DX1THA+gs8sdU.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '#.#.#.#' (ECDSA) to the list of known hosts.
Last login: Sat May 13 00:47:41 2017 from 192.175.27.2
       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|
https://aws.amazon.com/amazon-linux-ami/2016.09-release-notes/
25 package(s) needed for security, out of 61 available
Run "sudo yum update" to apply all updates.
Amazon Linux version 2017.03 is available.
 
 Once you are on the server, you can check the version of python: 
 $ /opt/anaconda/bin/python --version
Python 2.7.13 :: Anaconda 4.3.1 (64-bit)
 
 Review 
 If you have successfully followed along with this tutorial, you should know how to create pre and post install scripts. You should have successfully deployed a cluster on either AWS or Azure with Anaconda installed under  /opt/anaconda  on the nodes you specified. 
						
					
						
						
		
	
					
			
		
	
	
	
	
				
		
	
	
			
    
	
		
		
05-12-2017 10:17 PM - 17 Kudos
		
	
				
		
	
		
					
 Note: A newer version of this article is available here: https://community.hortonworks.com/articles/194076/using-vagrant-and-virtualbox-to-create-a-local-ins.html 
 Objectives 
 This tutorial is designed to walk you through the process of using Vagrant and VirtualBox to create a local instance of Cloudbreak. This will allow you to start your Cloudbreak deployer when you want to spin up an HDP cluster on the cloud without incurring the costs associated with hosting your Cloudbreak instance on the cloud itself. 
 Prerequisites 
 
 You should already have installed VirtualBox 5.1.x. Read more here: VirtualBox 
 You should already have installed Vagrant 1.9.x. Read more here: Vagrant 
 You should already have installed the vagrant-vbguest plugin. This plugin will keep the VirtualBox Guest Additions software current as you upgrade your kernel and/or VirtualBox versions. Read more here: vagrant-vbguest 
 You should already have installed the vagrant-hostmanager plugin. This plugin will automatically manage the /etc/hosts file on your local computer and in your virtual machines. Read more here: vagrant-hostmanager 
 
 Scope 
 This tutorial was tested in the following environment: 
 
 macOS Sierra (version 10.12.4) 
 VirtualBox 5.1.22 
 Vagrant 1.9.4 
 vagrant-vbguest plugin 0.14.1 
 vagrant-hostmanager plugin 1.8.6 
 Cloudbreak 1.14.0 TP 
 
 Steps 
 Setup Vagrant 
 Create Vagrant project directory 
 Before we get started, determine where you want to keep your Vagrant project files. Each Vagrant project should have its own directory. I keep my Vagrant projects in my ~/Development/Vagrant directory. You should also use a helpful name for each Vagrant project directory you create. 
 $ cd ~/Development/Vagrant
$ mkdir centos7-cloudbreak
$ cd centos7-cloudbreak
  
 We will be using a CentOS 7.3 Vagrant box, so I include centos7 in the Vagrant project name to differentiate it from a CentOS 6 project. The project is for cloudbreak, so I include that in the name. 
 Create Vagrantfile 
 The Vagrantfile tells Vagrant how to configure your virtual machines. You can copy/paste my Vagrantfile below: 
 # -*- mode: ruby -*-
# vi: set ft=ruby :
# Using yaml to load external configuration files
require 'yaml'
Vagrant.configure("2") do |config|
  # Using the hostmanager vagrant plugin to update the host files
  config.hostmanager.enabled = true
  config.hostmanager.manage_host = true
  config.hostmanager.manage_guest = true
  config.hostmanager.ignore_private_ip = false
  # Loading in the list of commands that should be run when the VM is provisioned.
  commands = YAML.load_file('commands.yaml')
  commands.each do |command|
    config.vm.provision :shell, inline: command
  end
  # Loading in the VM configuration information
  servers = YAML.load_file('servers.yaml')
  servers.each do |servers| 
    config.vm.define servers["name"] do |srv|
      srv.vm.box = servers["box"] # Specify the name of the Vagrant box file to use
      srv.vm.hostname = servers["name"] # Set the hostname of the VM
      srv.vm.network "private_network", ip: servers["ip"], :adapter=>2 # Add a second adapter with a specified IP
      srv.vm.provision :shell, inline: "sed -i'' '/^127.0.0.1\t#{srv.vm.hostname}\t#{srv.vm.hostname}$/d' /etc/hosts" # Remove the extraneous first entry in /etc/hosts
      srv.vm.provider :virtualbox do |vb|
        vb.name = servers["name"] # Name of the VM in VirtualBox
        vb.cpus = servers["cpus"] # How many CPUs to allocate to the VM
        vb.memory = servers["ram"] # How much memory to allocate to the VM
      end
    end
  end
end
  Create a servers.yaml file  
 The servers.yaml file contains the configuration information for our VMs. Here is the content from my file: 
 ---
- name: cloudbreak
  box: bento/centos-7.3
  cpus: 2
  ram: 4096
  ip: 192.168.56.100
  
 NOTE: You may need to modify the IP address to avoid conflicts with your local network. 
 Create commands.yaml file 
 The commands.yaml file contains the list of commands that should be run on each VM when they are first provisioned. This allows us to automate configuration tasks that would otherwise be tedious and/or repetitive. Here is the content from my file: 
 - "sudo yum -y update"
- "sudo yum -y install net-tools ntp wget lsof unzip tar iptables-services"
- "sudo systemctl enable ntpd && sudo systemctl start ntpd"
- "sudo systemctl disable firewalld && sudo systemctl stop firewalld"
- "sudo iptables --flush INPUT && sudo iptables --flush FORWARD && sudo service iptables save"
- "sudo sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/g' /etc/sysconfig/selinux"
  Start Virtual Machines  
 Once you have created the 3 files in your Vagrant project directory, you are ready to start your VM. Creating the VM for the first time and starting it every time after that uses the same command: 
 $ vagrant up
  
 You should notice Vagrant automatically updating the packages on the VM.  
 Once the process is complete you should have 1 server running. You can verify by looking at the VirtualBox UI, where you should see the cloudbreak VM running. You should see something similar to this: 
     
    Connect to each virtual machine  
 You are able to login to the VM via ssh using the vagrant ssh command. 
 $ vagrant ssh
[vagrant@cloudbreak ~]$
  Install Cloudbreak  
 Most of the Cloudbreak installation is covered well in the docs: 
 Cloudbreak Install Docs. However, the first couple of steps in the docs have you install a few packages, change iptables settings, etc. That part of the install is actually handled by the Vagrant provisioning step, so you can skip those steps. You should be able to start at the Docker Service section of the docs.  
 We need to be root for most of this, so we'll use sudo. 
 sudo -i
  Create Docker Repo  
 We need to add a repo so we can install Docker. 
 cat > /etc/yum.repos.d/docker.repo <<"EOF"
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF
  Install Docker Service  
 Now we need to install Docker and enable the service. 
 yum install -y docker-engine-1.9.1 docker-engine-selinux-1.9.1
systemctl start docker
systemctl enable docker
  Install Cloudbreak Deployer  
 Now we can install Cloudbreak itself. 
 yum -y install unzip tar
curl -Ls s3.amazonaws.com/public-repo-1.hortonworks.com/HDP/cloudbreak/cloudbreak-deployer_1.14.0_$(uname)_x86_64.tgz | sudo tar -xz -C /bin cbd
  
 Once the Cloudbreak Deployer is installed, you can check the version of the install software. 
 cbd --version
  
 You should see something similar to this: 
 [root@cloudbreak cloudbreak-deployment]# cbd --version
Cloudbreak Deployer: 1.14.0
  
 NOTE: Notice that we are installing version 1.14.0. You may want to consider installing the latest version, which is 1.16.1 as of August 2017. 
 Create Cloudbreak Profile 
 You should make a Cloudbreak application directory. This is where the Cloudbreak configuration files and logs will be located. 
 cd /opt
mkdir cloudbreak-deployment
cd cloudbreak-deployment
  
 Now you need to set up the Profile file. This file contains environment variables that determine how Cloudbreak runs. Edit Profile using your editor of choice.  
 I recommend the following settings for your profile: 
 export UAA_DEFAULT_SECRET='[SECRET]'
export UAA_DEFAULT_USER_EMAIL='<myemail>'
export UAA_DEFAULT_USER_PW='<mypassword>'
export PUBLIC_IP=192.168.56.100
export CLOUDBREAK_SMTP_SENDER_USERNAME='<myemail>'
export CLOUDBREAK_SMTP_SENDER_PASSWORD='<mypassword>'
export CLOUDBREAK_SMTP_SENDER_HOST='smtp.gmail.com'
export CLOUDBREAK_SMTP_SENDER_PORT=25
export CLOUDBREAK_SMTP_SENDER_FROM='<myemail>'
export CLOUDBREAK_SMTP_AUTH=true
export CLOUDBREAK_SMTP_STARTTLS_ENABLE=true
export CLOUDBREAK_SMTP_TYPE=smtp
  
 You should set the UAA_DEFAULT_USER_EMAIL variable to the email address you want to use. This is the account you will use to login to Cloudbreak. You should set the UAA_DEFAULT_USER_PW variable to the password you want to use. This is the password you will use to login to Cloudbreak.  
 You should set the CLOUDBREAK_SMTP_SENDER_USERNAME variable to the username you use to authenticate to your SMTP server. You should set the CLOUDBREAK_SMTP_SENDER_PASSWORD variable to the password you use to authenticate to your SMTP server.  
 NOTE: The SMTP variables are how you enable Cloudbreak to send you an email when the cluster operations are done. This is optional and is only required if you want to use the checkbox to get emails when you build a cluster. The example above assumes you are using GMail. You should use the settings appropriate for your SMTP server. 
 Initialize Cloudbreak Configuration 
 Now that you have a profile, you can initialize your Cloudbreak configuration files. 
 cbd generate
  
 You should see something similar to this: 
 [root@cloudbreak cloudbreak-deployment]# cbd generate
* Dependency required, installing sed latest ...
* Dependency required, installing jq latest ...
* Dependency required, installing docker-compose 1.9.0 ...
* Dependency required, installing aws latest ...
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
03310923a82b: Pulling fs layer
6fc6c6aca926: Pulling fs layer
6fc6c6aca926: Verifying Checksum
6fc6c6aca926: Download complete
03310923a82b: Verifying Checksum
03310923a82b: Download complete
03310923a82b: Pull complete
6fc6c6aca926: Pull complete
Digest: sha256:7875e46eb14555e893e7c23a7f90a0d2396f6b56c8c3dcf68f9ed14879b8966c
Status: Downloaded newer image for alpine:latest
Generating Cloudbreak client certificate and private key in /opt/cloudbreak-deployment/certs.
generating docker-compose.yml
generating uaa.yml
[root@cloudbreak cloudbreak-deployment]#
  Start Cloudbreak Deployer  
 You should be able to start the Cloudbreak Deployer application. This process will first pull down the Docker images used by Cloudbreak. 
 cbd pull
cbd start
  
 You should notice a bunch of images being pulled down: 
 [root@cloudbreak cloudbreak-deployment]# cbd start
generating docker-compose.yml
generating uaa.yml
Pulling haveged (hortonworks/haveged:1.1.0)...
1.1.0: Pulling from hortonworks/haveged
ca26f34d4b27: Pull complete
bf22b160fa79: Pull complete
d30591ea011f: Pull complete
22615e74c8e4: Pull complete
ceb5854e0233: Pull complete
Digest: sha256:09f8cf4f89b59fe2b391747181469965ad27cd751dad0efa0ad1c89450455626
Status: Downloaded newer image for hortonworks/haveged:1.1.0
Pulling uluwatu (hortonworks/cloudbreak-web:1.14.0)...
1.14.0: Pulling from hortonworks/cloudbreak-web
16e32a1a6529: Pull complete
8e153fce9343: Pull complete
6af1e6403bfe: Pull complete
075e3418c7e0: Pull complete
9d8191b4be57: Pull complete
38e38dfe826c: Pull complete
d5d08e4bc6be: Pull complete
955b472e3e42: Pull complete
02e1b573b380: Pull complete
Digest: sha256:06ceb74789aa8a78b9dfe92872c45e045d7638cdc274ed9b0cdf00b74d118fa2
...
Creating cbreak_periscope_1
Creating cbreak_logsink_1
Creating cbreak_identity_1
Creating cbreak_uluwatu_1
Creating cbreak_haveged_1
Creating cbreak_consul_1
Creating cbreak_mail_1
Creating cbreak_pcdb_1
Creating cbreak_uaadb_1
Creating cbreak_cbdb_1
Creating cbreak_sultans_1
Creating cbreak_registrator_1
Creating cbreak_logspout_1
Creating cbreak_cloudbreak_1
Creating cbreak_traefik_1
Uluwatu (Cloudbreak UI) url:
  https://192.168.56.100
login email:
  <myemail>
password:
  ****
creating config file for hdc cli: /root/.hdc/config
  
 The start command will output the IP address and the username to log in with, based on what we set up in the Profile. 
 Check Cloudbreak Logs 
 You can always look at the Cloudbreak logs in /opt/cloudbreak-deployment/cbreak.log. You can also use the cbd logs cloudbreak command to view logs in real time. Cloudbreak is ready to use when you see a message similar to Started CloudbreakApplication in 64.156 seconds (JVM running for 72.52). 
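 For example, from the deployment directory you can follow the log and then search for the startup message mentioned above (the grep pattern is simply taken from that message): 
 $ cbd logs cloudbreak
$ grep "Started CloudbreakApplication" /opt/cloudbreak-deployment/cbreak.log 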
 Login to Cloudbreak 
 Cloudbreak should now be running. We can login to the UI using the IP address specified in the Profile. In our case that is https://192.168.56.100. Notice Cloudbreak uses https.  
 You should see a login screen similar to this: 
     
  
 At this point you should be able to see the Cloudbreak UI screen where you can manage your credentials, blueprints, etc. This tutorial doesn't cover setting up credentials or deploying a cluster. Before you can deploy a cluster you need to set up a platform and credentials. See these links for setting up your credentials:  
 
 AWS: Cloudbreak AWS Credentials 
 Azure: Cloudbreak Azure Credentials 
 
 Stopping Cloudbreak 
 When you are ready to shut down Cloudbreak, the process is simple. First you need to stop the Cloudbreak deployer: 
 $ cbd kill
  
 You should see something similar to this: 
 [root@cloudbreak cloudbreak-deployment]# cbd kill
Stopping cbreak_traefik_1 ... done
Stopping cbreak_cloudbreak_1 ... done
Stopping cbreak_logspout_1 ... done
Stopping cbreak_registrator_1 ... done
Stopping cbreak_sultans_1 ... done
Stopping cbreak_uaadb_1 ... done
Stopping cbreak_cbdb_1 ... done
Stopping cbreak_pcdb_1 ... done
Stopping cbreak_mail_1 ... done
Stopping cbreak_haveged_1 ... done
Stopping cbreak_consul_1 ... done
Stopping cbreak_uluwatu_1 ... done
Stopping cbreak_identity_1 ... done
Stopping cbreak_logsink_1 ... done
Stopping cbreak_periscope_1 ... done
Going to remove cbreak_traefik_1, cbreak_cloudbreak_1, cbreak_logspout_1, cbreak_registrator_1, cbreak_sultans_1, cbreak_uaadb_1, cbreak_cbdb_1, cbreak_pcdb_1, cbreak_mail_1, cbreak_haveged_1, cbreak_consul_1, cbreak_uluwatu_1, cbreak_identity_1, cbreak_logsink_1, cbreak_periscope_1
Removing cbreak_traefik_1 ... done
Removing cbreak_cloudbreak_1 ... done
Removing cbreak_logspout_1 ... done
Removing cbreak_registrator_1 ... done
Removing cbreak_sultans_1 ... done
Removing cbreak_uaadb_1 ... done
Removing cbreak_cbdb_1 ... done
Removing cbreak_pcdb_1 ... done
Removing cbreak_mail_1 ... done
Removing cbreak_haveged_1 ... done
Removing cbreak_consul_1 ... done
Removing cbreak_uluwatu_1 ... done
Removing cbreak_identity_1 ... done
Removing cbreak_logsink_1 ... done
Removing cbreak_periscope_1 ... done
[root@cloudbreak cloudbreak-deployment]#
  
 Now exit the Vagrant box: 
 [root@cloudbreak cloudbreak-deployment]# exit
logout
[vagrant@cloudbreak ~]$ exit
logout
Connection to 127.0.0.1 closed.
  
 Now we can shutdown the Vagrant box 
 $ vagrant halt
==> cbtest: Attempting graceful shutdown of VM...
  Starting Cloudbreak  
 To startup Cloudbreak, the process is the opposite of stopping it. First you need to start the Vagrant box: 
 $ vagrant up
  
 Once the Vagrant box is up, you need to ssh in to the box: 
 $ vagrant ssh
  
 You need to be root: 
 $ sudo -i
  
 Now start Cloudbreak: 
 $ cd /opt/cloudbreak-deployment
$ cbd start
  
 You should see something similar to this: 
 [root@cloudbreak cloudbreak-deployment]# cbd start
generating docker-compose.yml
generating uaa.yml
Creating cbreak_consul_1
Creating cbreak_periscope_1
Creating cbreak_sultans_1
Creating cbreak_uluwatu_1
Creating cbreak_identity_1
Creating cbreak_uaadb_1
Creating cbreak_pcdb_1
Creating cbreak_mail_1
Creating cbreak_haveged_1
Creating cbreak_logsink_1
Creating cbreak_cbdb_1
Creating cbreak_logspout_1
Creating cbreak_registrator_1
Creating cbreak_cloudbreak_1
Creating cbreak_traefik_1
Uluwatu (Cloudbreak UI) url:
  https://192.168.56.100
login email:
  <myemail>
password:
  ****
creating config file for hdc cli: /root/.hdc/config
[root@cloudbreak cloudbreak-deployment]#
  
 It takes a minute or two for the Cloudbreak application to fully start up. Once it does, you can log in to the Cloudbreak UI.  Review  
 If you have successfully followed along with this tutorial, you should now have a Vagrant box you can spin up via vagrant up, start Cloudbreak via cbd start, and then create your clusters in the cloud. 
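 For reference, a full restart from a powered-off Vagrant box looks something like this, using the same paths as above: 
 $ vagrant up
$ vagrant ssh
$ sudo -i
$ cd /opt/cloudbreak-deployment
$ cbd start
$ cbd logs cloudbreak
 Watch the logs until you see the Started CloudbreakApplication message, then log in to the UI. 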
						
					
04-28-2017
02:58 PM
							 @Raphaël MARY   Yes, more than likely.  You can read more about Twitter TLS here: https://dev.twitter.com/overview/api/tls 
						
					
04-24-2017
07:58 PM
2 Kudos
 @Stefan Schuster   The Sandbox is set up to assume that you have "sandbox.hortonworks.com" in your local computer's hosts file, so all of the links will typically use "sandbox.hortonworks.com". If you don't update your local hosts file, you will fail to connect.  Are you on Windows, Mac, or Linux? That will determine the appropriate approach. Mac and Linux hosts files are usually at /etc/hosts. I'm using the Docker Sandbox and I'm on a Mac. My /etc/hosts file looks like this:  127.0.0.1 localhost sandbox.hortonworks.com sandbox 
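 On Mac or Linux, one quick way to add the entry is shown below. This is only a sketch; it assumes the Sandbox answers on 127.0.0.1 (as it does for the Docker Sandbox with port forwarding), so use your VM's IP address instead if that is not the case: 
 $ echo "127.0.0.1 sandbox.hortonworks.com sandbox" | sudo tee -a /etc/hosts
 On Windows, the equivalent file is C:\Windows\System32\drivers\etc\hosts, which you can edit with a text editor run as Administrator. 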
						
					
04-24-2017
07:47 PM
6 Kudos
							 Objective  This tutorial will walk you through the process of using the PyHive Python module from Dropbox to query HiveServer2. You can read more about PyHive here: PyHive    Prerequisites  
 You should already have Python 2.7 installed.  You should already have a version of the Hortonworks Sandbox 2.5 set up.   Scope  This tutorial was tested using the following environment and components:  
 Mac OS X 10.12.3  Anaconda 4.3.1 (Python 2.7.13)  Hortonworks HDP Sandbox 2.5  PyHive 0.1.5     Steps  Install PyHive and Dependencies  Before we can query Hive using Python, we have to install the PyHive module and its dependencies. Because I'm using Anaconda, I chose to use the  conda  command to install PyHive. Because the PyHive module is provided by a third party, Blaze, you must specify  -c blaze  on the command line.  You can read more about Blaze PyHive for Anaconda here: Blaze PyHive  We need to install PyHive using the following command: 
 $ conda install -c blaze pyhive  You will be doing this installation on your local computer. You should see something similar to the following: 
 $ conda install -c blaze pyhive
Fetching package metadata ...........
Solving package specifications: .
Package plan for installation in environment /Users/myoung/anaconda:
The following NEW packages will be INSTALLED:
    pyhive: 0.1.5-py27_0 blaze
    sasl:   0.1.3-py27_0 blaze
    thrift: 0.9.2-py27_0 blaze
Proceed ([y]/n)? y
thrift-0.9.2-p 100% |#####################################################################################################################################| Time: 0:00:00   3.07 MB/s
sasl-0.1.3-py2 100% |#####################################################################################################################################| Time: 0:00:00  15.18 MB/s
pyhive-0.1.5-p 100% |#####################################################################################################################################| Time: 0:00:00  10.92 MB/s  As you can see, PyHive depends on the SASL and Thrift modules. Both of these modules were installed.  Create Python Script  Now that our local computer has the PyHive module installed, we can create a very simple Python script that will query Hive.  Create a file called  pyhive-test.py . You can do this anywhere you like, but I prefer to create a directory under ~/Development for this. 
 $ mkdir ~/Development/pyhive
$ cd ~/Development/pyhive  Now copy and paste the following text into your file. You can use any text editor you like. I usually use Microsoft Visual Studio Code or Atom. 
 from pyhive import hive
cursor = hive.connect('sandbox.hortonworks.com').cursor()
cursor.execute('SELECT * FROM sample_07 LIMIT 50')
print cursor.fetchall()
  The sample_07 table is already on the Sandbox, so this query should work without any problems.  Start Hortonworks HDP Sandbox  Before we can run our Python script, we have to make sure the Sandbox is started. Go ahead and do that now.  Run Python Script  Now that the Sandbox is running, we can run our script to execute the query. 
 $ python pyhive-test.py  You should see something similar to the following: 
 $ python pyhive-test.py
[[u'00-0000', u'All Occupations', 134354250, 40690], [u'11-0000', u'Management occupations', 6003930, 96150], [u'11-1011', u'Chief executives', 299160, 151370], [u'11-1021', u'General and operations managers', 1655410, 103780], [u'11-1031', u'Legislators', 61110, 33880], [u'11-2011', u'Advertising and promotions managers', 36300, 91100], [u'11-2021', u'Marketing managers', 165240, 113400], [u'11-2022', u'Sales managers', 322170, 106790], [u'11-2031', u'Public relations managers', 47210, 97170], [u'11-3011', u'Administrative services managers', 239360, 76370], [u'11-3021', u'Computer and information systems managers', 264990, 113880], [u'11-3031', u'Financial managers', 484390, 106200], [u'11-3041', u'Compensation and benefits managers', 41780, 88400], [u'11-3042', u'Training and development managers', 28170, 90300], [u'11-3049', u'Human resources managers, all other', 58100, 99810], [u'11-3051', u'Industrial production managers', 152870, 87550], [u'11-3061', u'Purchasing managers', 65600, 90430], [u'11-3071', u'Transportation, storage, and distribution managers', 92790, 81980], [u'11-9011', u'Farm, ranch, and other agricultural managers', 3480, 61030], [u'11-9012', u'Farmers and ranchers', 340, 42480], [u'11-9021', u'Construction managers', 216120, 85830], [u'11-9031', u'Education administrators, preschool and child care center/program', 47980, 44430], [u'11-9032', u'Education administrators, elementary and secondary school', 218820, 82120], [u'11-9033', u'Education administrators, postsecondary', 101160, 85870], [u'11-9039', u'Education administrators, all other', 28640, 74230], [u'11-9041', u'Engineering managers', 184410, 115610], [u'11-9051', u'Food service managers', 191460, 48660], [u'11-9061', u'Funeral directors', 24020, 57660], [u'11-9071', u'Gaming managers', 3740, 69600], [u'11-9081', u'Lodging managers', 31890, 51140], [u'11-9111', u'Medical and health services managers', 242640, 84980], [u'11-9121', u'Natural sciences managers', 39370, 113170], [u'11-9131', u'Postmasters and mail superintendents', 26500, 57850], [u'11-9141', u'Property, real estate, and community association managers', 159660, 53530], [u'11-9151', u'Social and community service managers', 112330, 59070], [u'11-9199', u'Managers, all other', 356690, 91990], [u'13-0000', u'Business and financial operations occupations', 6015500, 62410], [u'13-1011', u'Agents and business managers of artists, performers, and athletes', 11680, 82730], [u'13-1021', u'Purchasing agents and buyers, farm products', 12930, 53980], [u'13-1022', u'Wholesale and retail buyers, except farm products', 132550, 53580], [u'13-1023', u'Purchasing agents, except wholesale, retail, and farm products', 281950, 56060], [u'13-1031', u'Claims adjusters, examiners, and investigators', 279400, 55470], [u'13-1032', u'Insurance appraisers, auto damage', 12150, 52020], [u'13-1041', u'Compliance officers, except agriculture, construction, health and safety, and transportation', 231910, 52740], [u'13-1051', u'Cost estimators', 219070, 58640], [u'13-1061', u'Emergency management specialists', 11610, 51470], [u'13-1071', u'Employment, recruitment, and placement specialists', 193620, 52710], [u'13-1072', u'Compensation, benefits, and job analysis specialists', 109870, 55740], [u'13-1073', u'Training and development specialists', 202820, 53040], [u'13-1079', u'Human resources, training, and labor relations specialists, all other', 211770, 56740]]  Review  As you can see, using Python to query Hive is fairly straight forward. 
We were able to install the required Python modules in a single command, create a quick Python script, and run the script to get 50 records from the sample_07 table in Hive. 
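 If your HiveServer2 is not running with the Sandbox defaults, you can pass the connection details to PyHive explicitly. The following is only a sketch: the port, username, and database values shown are assumptions you would replace with your own. 
 from pyhive import hive

# Connect with explicit parameters instead of relying on the defaults.
# Port 10000 is the usual HiveServer2 port; 'raj_ops' is only a placeholder user.
conn = hive.connect(host='sandbox.hortonworks.com',
                    port=10000,
                    username='raj_ops',
                    database='default')
cursor = conn.cursor()
cursor.execute('SELECT code, description FROM sample_07 LIMIT 5')

# Print the rows one at a time instead of dumping the whole list
for code, description in cursor.fetchall():
    print code, description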
						
					
04-21-2017
08:14 PM
 @Raphaël MARY   Which endpoint are you using for the processor? You should use the sample or filter endpoint. I don't believe you can use the firehose endpoint unless you pay Twitter for access. 
						
					