10-30-2018
05:48 PM
4 Kudos
Objectives
HDPSearch 4.0 was recently announced (Blog), which upgrades Solr from 6.6 to 7.4. The HDPSearch 4.0 Ambari management pack will install HDPSearch 3.0 for HDP 2.6 and HDPSearch 4.0 for HDP 3.0. HDP 3.0 is required for HDPSearch 4.0 because the HDFS and Hive libraries have been updated for Hadoop 3.1. Using Cloudbreak 2.8 Tech Preview (TP), you can install an HDP 3.0 cluster that includes HDPSearch 4.0 using Cloudbreak's management pack extensions.
Cloudbreak 2.8 is a Tech Preview release and is not suitable for production usage. Similarly, CB 2.8 TP doesn't officially support deploying HDP 3.0 clusters. The intent is to become familiar with the process for when Cloudbreak 2.9 is released.
This tutorial is designed to walk you through the process of deploying an HDP 3.0 cluster which includes HDPSearch 4.0 components on AWS using a custom Ambari blueprint.
Prerequisites
You should already have an installed version of Cloudbreak 2.8.
You can find the documentation on Cloudbreak here: Cloudbreak Documentation
You can find an article that walks you through installing a local version of Cloudbreak with Vagrant and Virtualbox here: HCC Article
You should have an AWS account with appropriate permissions.
You can read more about AWS permissions here: Cloudbreak Documentation
You should already have created your AWS credential in Cloudbreak.
You should be familiar with HDPSearch.
You can find the documentation on HDPSearch here: HDPSearch 4.0 Documentation
Scope
This tutorial was tested in the following environment:
Cloudbreak 2.8.0
HDPSearch 4.0
AWS (also works on Azure and Google)
Steps
1. Create New HDP Blueprint
We need to create a custom Ambari blueprint for an HDP 3.0 cluster. This tutorial provides a basic blueprint which has HDFS and YARN HA enabled.
Log in to your Cloudbreak instance. In the left menu, click on Blueprints . Cloudbreak will display a list of built-in and custom blueprints. Click on the CREATE BLUEPRINT button. You should see something similar to the following:
If you have downloaded the blueprint JSON file, you can simply upload the file to create your new blueprint. Cloudbreak requires a unique name within the blueprint itself. If you wish to customize the blueprint name, you can edit the name in the editor window after uploading the blueprint. Enter a unique Name and a meaningful Description for the blueprint. These are displayed on the blueprint list screen. You can download the JSON blueprint file here: hdp301-ha-solr-blueprint.json
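If you prefer to rename the blueprint from the command line before uploading, a one-liner such as the following works (this assumes you have jq installed; the new name and output filename are just examples):
jq '.Blueprints.blueprint_name = "my-hdp301-ha-solr"' hdp301-ha-solr-blueprint.json > my-hdp301-ha-solr-blueprint.json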
Click on the Upload JSON File button and select the blueprint JSON file you downloaded. You should see something similar to this:
Scroll to the bottom and click on the CREATE button. You should see the list of blueprints, including the newly created blueprint. You should see something similar to the following:
You can also choose to paste the JSON text by clicking on the Text radio button.
Here is the text of the blueprint JSON:
{
"Blueprints": {
"blueprint_name": "hdp301-ha-solr",
"stack_name": "HDP",
"stack_version": "3.0"
},
"settings": [
{
"recovery_settings": []
},
{
"service_settings": [
{
"name": "HIVE",
"credential_store_enabled": "false"
}
]
},
{
"component_settings": []
}
],
"host_groups": [
{
"name": "master_mgmt",
"components": [
{
"name": "METRICS_COLLECTOR"
},
{
"name": "METRICS_GRAFANA"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "JOURNALNODE"
},
{
"name": "INFRA_SOLR"
},
{
"name": "INFRA_SOLR_CLIENT"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "HDFS_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "OOZIE_CLIENT"
},
{
"name": "MAPREDUCE2_CLIENT"
},
{
"name": "HIVE_CLIENT"
},
{
"name": "TEZ_CLIENT"
},
{
"name": "HIVE_METASTORE"
},
{
"name": "HIVE_SERVER"
}
],
"cardinality": "1"
},
{
"name": "master_nn1",
"components": [
{
"name": "NAMENODE"
},
{
"name": "ZKFC"
},
{
"name": "RESOURCEMANAGER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "APP_TIMELINE_SERVER"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "JOURNALNODE"
},
{
"name": "HIVE_CLIENT"
},
{
"name": "HDFS_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "OOZIE_CLIENT"
},
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "LIVY2_SERVER"
},
{
"name": "SPARK2_CLIENT"
},
{
"name": "MAPREDUCE2_CLIENT"
},
{
"name": "TEZ_CLIENT"
}
],
"cardinality": "1"
},
{
"name": "master_nn2",
"components": [
{
"name": "NAMENODE"
},
{
"name": "ZKFC"
},
{
"name": "RESOURCEMANAGER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "HISTORYSERVER"
},
{
"name": "HIVE_SERVER"
},
{
"name": "PIG"
},
{
"name": "OOZIE_SERVER"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "JOURNALNODE"
},
{
"name": "HIVE_CLIENT"
},
{
"name": "HDFS_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "OOZIE_CLIENT"
},
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "SPARK2_JOBHISTORYSERVER"
},
{
"name": "SPARK2_CLIENT"
},
{
"name": "MAPREDUCE2_CLIENT"
},
{
"name": "TEZ_CLIENT"
}
],
"cardinality": "1"
},
{
"name": "datanode",
"components": [
{
"name": "HIVE_CLIENT"
},
{
"name": "TEZ_CLIENT"
},
{
"name": "SPARK2_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "OOZIE_CLIENT"
},
{
"name": "DATANODE"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "NODEMANAGER"
},
{
"name": "SOLR_SERVER"
}
],
"cardinality": "1+"
}
],
"configurations": [
{
"core-site": {
"properties": {
"fs.trash.interval": "4320",
"fs.defaultFS": "hdfs://mycluster",
"ha.zookeeper.quorum": "%HOSTGROUP::master_nn1%:2181,%HOSTGROUP::master_nn2%:2181,%HOSTGROUP::master_mgmt%:2181",
"hadoop.proxyuser.falcon.groups": "*",
"hadoop.proxyuser.root.groups": "*",
"hadoop.proxyuser.livy.hosts": "*",
"hadoop.proxyuser.falcon.hosts": "*",
"hadoop.proxyuser.oozie.hosts": "*",
"hadoop.proxyuser.oozie.groups": "*",
"hadoop.proxyuser.hive.groups": "*",
"hadoop.proxyuser.livy.groups": "*",
"hadoop.proxyuser.hbase.groups": "*",
"hadoop.proxyuser.hbase.hosts": "*",
"hadoop.proxyuser.root.hosts": "*",
"hadoop.proxyuser.hive.hosts": "*",
"hadoop.proxyuser.yarn.hosts": "*"
}
}
},
{
"hdfs-site": {
"properties": {
"dfs.namenode.safemode.threshold-pct": "0.99",
"dfs.client.failover.proxy.provider.mycluster": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
"dfs.ha.automatic-failover.enabled": "true",
"dfs.ha.fencing.methods": "shell(/bin/true)",
"dfs.ha.namenodes.mycluster": "nn1,nn2",
"dfs.namenode.http-address": "%HOSTGROUP::master_nn1%:50070",
"dfs.namenode.http-address.mycluster.nn1": "%HOSTGROUP::master_nn1%:50070",
"dfs.namenode.http-address.mycluster.nn2": "%HOSTGROUP::master_nn2%:50070",
"dfs.namenode.https-address": "%HOSTGROUP::master_nn1%:50470",
"dfs.namenode.https-address.mycluster.nn1": "%HOSTGROUP::master_nn1%:50470",
"dfs.namenode.https-address.mycluster.nn2": "%HOSTGROUP::master_nn2%:50470",
"dfs.namenode.rpc-address.mycluster.nn1": "%HOSTGROUP::master_nn1%:8020",
"dfs.namenode.rpc-address.mycluster.nn2": "%HOSTGROUP::master_nn2%:8020",
"dfs.namenode.shared.edits.dir": "qjournal://%HOSTGROUP::master_nn1%:8485;%HOSTGROUP::master_nn2%:8485;%HOSTGROUP::master_mgmt%:8485/mycluster",
"dfs.nameservices": "mycluster"
}
}
},
{
"hive-site": {
"properties": {
"hive.metastore.uris": "thrift://%HOSTGROUP::master_mgmt%:9083",
"hive.exec.compress.output": "true",
"hive.merge.mapfiles": "true",
"hive.server2.tez.initialize.default.sessions": "true",
"hive.server2.transport.mode": "http"
}
}
},
{
"mapred-site": {
"properties": {
"mapreduce.job.reduce.slowstart.completedmaps": "0.7",
"mapreduce.map.output.compress": "true",
"mapreduce.output.fileoutputformat.compress": "true"
}
}
},
{
"yarn-site": {
"properties": {
"hadoop.registry.rm.enabled": "true",
"hadoop.registry.zk.quorum": "%HOSTGROUP::master_nn1%:2181,%HOSTGROUP::master_nn2%:2181,%HOSTGROUP::master_mgmt%:2181",
"yarn.log.server.url": "http://%HOSTGROUP::master_nn2%:19888/jobhistory/logs",
"yarn.resourcemanager.address": "%HOSTGROUP::master_nn1%:8050",
"yarn.resourcemanager.admin.address": "%HOSTGROUP::master_nn1%:8141",
"yarn.resourcemanager.cluster-id": "yarn-cluster",
"yarn.resourcemanager.ha.automatic-failover.zk-base-path": "/yarn-leader-election",
"yarn.resourcemanager.ha.enabled": "true",
"yarn.resourcemanager.ha.rm-ids": "rm1,rm2",
"yarn.resourcemanager.hostname": "%HOSTGROUP::master_nn1%",
"yarn.resourcemanager.hostname.rm1": "%HOSTGROUP::master_nn1%",
"yarn.resourcemanager.hostname.rm2": "%HOSTGROUP::master_nn2%",
"yarn.resourcemanager.recovery.enabled": "true",
"yarn.resourcemanager.resource-tracker.address": "%HOSTGROUP::master_nn1%:8025",
"yarn.resourcemanager.scheduler.address": "%HOSTGROUP::master_nn1%:8030",
"yarn.resourcemanager.store.class": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
"yarn.resourcemanager.webapp.address": "%HOSTGROUP::master_nn1%:8088",
"yarn.resourcemanager.webapp.address.rm1": "%HOSTGROUP::master_nn1%:8088",
"yarn.resourcemanager.webapp.address.rm2": "%HOSTGROUP::master_nn2%:8088",
"yarn.resourcemanager.webapp.https.address": "%HOSTGROUP::master_nn1%:8090",
"yarn.resourcemanager.webapp.https.address.rm1": "%HOSTGROUP::master_nn1%:8090",
"yarn.resourcemanager.webapp.https.address.rm2": "%HOSTGROUP::master_nn2%:8090",
"yarn.timeline-service.address": "%HOSTGROUP::master_nn1%:10200",
"yarn.timeline-service.webapp.address": "%HOSTGROUP::master_nn1%:8188",
"yarn.timeline-service.webapp.https.address": "%HOSTGROUP::master_nn1%:8190"
}
}
}
]
}
2. Register Management Pack
HDPSearch is installed via an Ambari Management Pack. To automate the deployment of HDPSearch via a blueprint, you need to register the HDPSearch Management Pack with Cloudbreak.
In the left menu, click on Cluster Extensions . This will expand to show Recipes and Management Packs . Click on Management Packs . You should see something similar to the following:
Click on REGISTER MANAGEMENT PACK . You should see something similar to the following:
Enter a unique Name and meaningful Description . The Management Pack URL for the HDPSearch 4.0 Management Pack should be http://public-repo-1.hortonworks.com/HDP-SOLR/hdp-solr-ambari-mp/solr-service-mpack-4.0.0.tar.gz .
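Before registering, you can optionally confirm the management pack URL is reachable from your workstation. This is just a sanity check with curl and should return an HTTP 200 response:
curl -sI http://public-repo-1.hortonworks.com/HDP-SOLR/hdp-solr-ambari-mp/solr-service-mpack-4.0.0.tar.gz | head -n 1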
Click Create . You should see something similar to the following:
3. Create Cluster
Now that we have a custom blueprint based on HDP 3.0 with a Solr component and we have registered the HDPSearch 4.0 Management Pack, we are ready to create a cluster.
In the left menu, click on Clusters . Cloudbreak will display configured clusters. Click the CREATE CLUSTER button. Cloudbreak will display the Create Cluster wizard.
a. General Configuration
By default, the General Configuration screen is displayed using the BASIC view. The ADVANCED view gives you more control of AWS and cluster settings, to include features such as tags. You must use ADVANCED view to attach a Management Pack to a cluster. You can change your view to ADVANCED manually or you can change your Cloudbreak preferences to show ADVANCED view by default. You should see something similar to the following:
Select Credential: Select the AWS credential you created. Most users will only have 1 credential per platform which will be selected automatically.
Cluster Name: Enter a name for your cluster. The name must be between 5 and 40 characters, must start with a letter, and must only include lowercase letters, numbers, and hyphens.
Region: Select the region in which you would like to launch your cluster.
Availability Zone: Select the availability zone in which you would like to launch your cluster.
Platform Version: Cloudbreak currently defaults to HDP 2.6. Select the dropdown arrow and select HDP 3.0 .
Cluster Type: Select the custom blueprint you recently created.
You should see something similar to the following:
Click the green Next button.
b. Image Settings
Cloudbreak will display the Image Settings screen. This is where you can specify a custom Cloudbreak image or change the version of Ambari and HDP used in the cluster. You should see something similar to the following:
You do not need to change any settings on this page. Click the green NEXT button.
c. Hardware and Storage
Cloudbreak will display the Hardware and Storage screen. On this screen, you have the ability to change the instance types, attached storage and where the Ambari server will be installed. As you can see, the blueprint calls for deploying at least 4 nodes. We will use the defaults.
Click the green Next button.
d. Network and Availability
Cloudbreak will display the Network and Availability screen. On this screen, you have the ability to create a new VPC and Subnet or select from existing ones. The default is to create a new VPC and Subnet. We will use the defaults.
Click the green Next button.
e. Cloud Storage
Cloudbreak will display the Cloud Storage screen. On this screen, you have the ability to configure your cluster to have an instance profile allowing the cluster to access data on cloud storage. The default is to not configure cloud storage. We will use the defaults.
Click the green Next button.
f. Cluster Extensions
Cloudbreak will display the Cluster Extensions screen. On this screen, you have the ability to associate recipes with different host groups and attach management packs to the cluster. You should see something similar to the following:
This screen is where we associate the HDPSearch 4.0 management pack we registered previously. Select the dropdown under Available Management Packs . Select the HDPSearch 4.0 management pack you registered. Then click the Install button. You should see something similar to the following:
Click the green Next button.
g. External Sources
Cloudbreak will display the External Sources screen. On this screen, you have the ability to associate external sources like LDAP/AD and databases. You should see something similar to the following:
We will not be using this functionality with this cluster. Click the green Next button.
h. Gateway Configuration
Cloudbreak will display the Gateway Configuration screen. On this screen, you have the ability to enable a protected gateway. This gateway uses Knox to provide a secure access point for the cluster. You should see something similar to the following:
We will use the defaults. Click the green Next button.
i. Network Security Groups
Cloudbreak will display the Network Security Groups screen. On this screen, you have the ability to specify the Network Security Groups . You should see something similar to the following:
Cloudbreak defaults to creating new configurations. For production use cases, we highly recommend creating and refining your own definitions within the cloud platform. You can tell Cloudbreak to use those existing security groups by selecting the radio button. We need to add the Solr default port of 8983 to the host group where Solr will exist. This is the Data Node in the blueprint. I recommend that you specify "MyIP" to limit access to this port. You should see something similar to the following:
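If you manage your own security groups in AWS instead of letting Cloudbreak create them, the equivalent ingress rule for the Solr port can be added with the AWS CLI. The group ID and CIDR below are placeholders for your own values:
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 8983 --cidr 203.0.113.10/32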
Click the green Next button.
j. Security
Cloudbreak will display the Security screen. On this screen, you have the ability to specify the Ambari admin username and password. You can create a new SSH key or select an existing one. Finally, you have the ability to enable Kerberos on the cluster. We will use admin for the username and BadPass#1 for the password. Select an existing SSH key from the drop down list. This should be a key you have already created and for which you have access to the corresponding private key. We will NOT be enabling Kerberos, so make sure the Enable Kerberos Security checkbox is not checked. You should see something similar to the following:
Click the green CREATE CLUSTER button.
k. Cluster Summary
Cloudbreak will display the Cluster Summary page. It will generally take between 10-15 minutes for the cluster to be fully deployed. Click on the cluster you just created. You should see something similar to the following:
Click on the Ambari URL to open the Ambari UI.
l. Ambari
You will likely see a browser warning when you first open the Ambari UI. That is because we are using self-signed certificates.
Click on the ADVANCED button. Then click the link to Proceed .
You will be presented with the Ambari login page. You will login using the username and password you specified when you created the cluster. That should have been admin and BadPass#1 . Click the green Sign In button.
You should see the cluster summary screen. As you can see, we have a cluster which includes the Solr component.
Click on the Solr service in the left hand menu. Now you can access the Quick Links menu for a shortcut to the Solr UI.
You should see the Solr UI. As you can see, this is Solr 7.4.
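You can also confirm the version from a terminal by querying Solr's system info endpoint on one of the data nodes over the port we opened earlier. The hostname below is a placeholder for your data node's public address, and the solr-spec-version field in the response should report a 7.4.x version:
curl "http://<datanode-public-ip>:8983/solr/admin/info/system?wt=json"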
Review
If you have successfully followed along with this tutorial, you should have created a custom HDP 3.0 blueprint which includes the Solr component, registered the HDPSearch 4.0 Management Pack, and successfully deployed a cluster on AWS which includes HDPSearch 4.0.
10-26-2018
05:54 PM
5 Kudos
Objectives
The release of Cloudbreak 2.7 enables you to deploy Hortonworks Data Flow (HDF) clusters. Currently there are two HDF cluster types supported: Flow Management (NiFi) and Messaging Management (Kafka). Cloudbreak expects HDF clusters to be deployed with security (LDAP, SSL). However, for testing purposes, many people would like to deploy a cluster without having to go through the steps of setting up SSL, LDAP, etc. Therefore, we'll need to modify the default HDF Flow Management blueprint to loosen the security configuration. This is not recommended for production use cases.
This tutorial is designed to walk you through the process of deploying an HDF 3.1 Flow Management cluster on AWS with Cloudbreak 2.7 using a custom blueprint.
Prerequisites
You should already have an installed version of Cloudbreak 2.7.
You can find the documentation on Cloudbreak here: Cloudbreak Documentation
You can find an article that walks you through installing a local version of Cloudbreak with Vagrant and Virtualbox here: HCC Article
You should have an AWS account with appropriate permissions.
You can read more about AWS permissions here: Cloudbreak Documentation
You should already have created your AWS credential in Cloudbreak.
Scope
This tutorial was tested in the following environment:
Cloudbreak 2.7.0
AWS (also works on Azure and Google)
Steps
1. Create New HDF Blueprint
Log in to your Cloudbreak instance. In the left menu, click on Blueprints . Cloudbreak will display a list of built-in and custom blueprints. Click on the Flow Management: Apache NiFi, Apache NiFi Registry blueprint. You should see something similar to the following:
Now click on the RAW VIEW tab. You should see something similar to the following:
Now we need to copy the raw JSON from this blueprint. We need to make some modifications. Copy and paste the blueprint into your favorite text editor.
Change the blueprint_name line to "blueprint_name": "hdf-nifi-no-kerberos", . This is the name of the blueprint and it must be unique from other blueprints registered in Cloudbreak.
In the nifi-properties section we need to add a new line. We are going to add "nifi.security.user.login.identity.provider": "" . This change tells NiFi not to use an Identity Provider. Change this:
{
"nifi-properties": {
"nifi.sensitive.props.key": "changemeplease",
"nifi.security.identity.mapping.pattern.kerb": "^(.*?)@(.*?)$",
"nifi.security.identity.mapping.value.kerb": "$1",
}
},
to this:
{
"nifi-properties": {
"nifi.sensitive.props.key": "changemeplease",
"nifi.security.identity.mapping.pattern.kerb": "^(.*?)@(.*?)$",
"nifi.security.identity.mapping.value.kerb": "$1",
"nifi.security.user.login.identity.provider": ""
}
},
In the nifi-ambari-ssl-config section we need to change the nifi.node.ssl.isenabled settings from true to false . This change disables SSL between the NiFi nodes. Change this:
"nifi-ambari-ssl-config": {
"nifi.toolkit.tls.token": "changemeplease",
"nifi.node.ssl.isenabled": "true",
"nifi.toolkit.dn.prefix": "CN=",
"nifi.toolkit.dn.suffix": ", OU=NIFI"
}
to this:
"nifi-ambari-ssl-config": {
"nifi.toolkit.tls.token": "changemeplease",
"nifi.node.ssl.isenabled": "false",
"nifi.toolkit.dn.prefix": "CN=",
"nifi.toolkit.dn.suffix": ", OU=NIFI"
}
In the nifi-registry-ambari-ssl-config section we need to change the nifi.registry.ssl.isenabled settings from true to false . This change disables SSL for the NiFi Registry. Change this:
"nifi-registry-ambari-ssl-config": {
"nifi.registry.ssl.isenabled": "true",
"nifi.registry.toolkit.dn.prefix": "CN=",
"nifi.registry.toolkit.dn.suffix": ", OU=NIFI"
}
to this:
"nifi-registry-ambari-ssl-config": {
"nifi.registry.ssl.isenabled": "false",
"nifi.registry.toolkit.dn.prefix": "CN=",
"nifi.registry.toolkit.dn.suffix": ", OU=NIFI"
}
Under host_groups and Services we need to remove the NIFI_CA entry. This change removes the NiFi Certificate Authority. Change this:
"host_groups": [
{
"name": "Services",
"components": [
{
"name": "NIFI_CA"
}, {
"name": "NIFI_REGISTRY_MASTER"
},
to this:
"host_groups": [
{
"name": "Services",
"components": [
{
"name": "NIFI_REGISTRY_MASTER"
},
The complete blueprint looks like this:
{
"Blueprints": {
"blueprint_name": "hdf-nifi-no-kerberos",
"stack_name": "HDF",
"stack_version": "3.1"
},
"configurations": [
{
"nifi-ambari-config": {
"nifi.security.encrypt.configuration.password": "changemeplease",
"nifi.max_mem": "1g"
}
},
{
"nifi-properties": {
"nifi.sensitive.props.key": "changemeplease",
"nifi.security.identity.mapping.pattern.kerb": "^(.*?)@(.*?)$",
"nifi.security.identity.mapping.value.kerb": "$1",
"nifi.security.user.login.identity.provider": ""
}
},
{
"nifi-ambari-ssl-config": {
"nifi.toolkit.tls.token": "changemeplease",
"nifi.node.ssl.isenabled": "false",
"nifi.toolkit.dn.prefix": "CN=",
"nifi.toolkit.dn.suffix": ", OU=NIFI"
}
},
{
"nifi-registry-ambari-config": {
"nifi.registry.security.encrypt.configuration.password": "changemeplease"
}
},
{
"nifi-registry-properties": {
"nifi.registry.sensitive.props.key": "changemeplease",
"nifi.registry.security.identity.mapping.pattern.kerb": "^(.*?)@(.*?)$",
"nifi.registry.security.identity.mapping.value.kerb": "$1"
}
},
{
"nifi-registry-ambari-ssl-config": {
"nifi.registry.ssl.isenabled": "false",
"nifi.registry.toolkit.dn.prefix": "CN=",
"nifi.registry.toolkit.dn.suffix": ", OU=NIFI"
}
}
],
"host_groups": [
{
"name": "Services",
"components": [
{
"name": "NIFI_REGISTRY_MASTER"
},
{
"name": "METRICS_COLLECTOR"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "METRICS_GRAFANA"
},
{
"name": "ZOOKEEPER_CLIENT"
}
],
"cardinality": "1"
},
{
"name": "NiFi",
"components": [
{
"name": "NIFI_MASTER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "ZOOKEEPER_CLIENT"
}
],
"cardinality": "1+"
},
{
"name": "ZooKeeper",
"components": [
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "ZOOKEEPER_CLIENT"
}
],
"cardinality": "3+"
}
]
}
Save the updated blueprint to a file. Click on the CREATE BLUEPRINT button. You should see the Create Blueprint screen.
Enter the name of the new blueprint, something helpful such as hdf-nifi-no-kerberos . Click on the Upload JSON File button and upload the blueprint you just saved. You should see the new blueprint you created.
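Before uploading, it can be useful to confirm the edited file is still well-formed JSON, for example with jq. The filename below is simply whatever you saved the blueprint as:
jq . hdf-nifi-no-kerberos.json > /dev/null && echo "valid JSON"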
2. Create Cluster
In the left menu, click on Clusters . Cloudbreak will display configured clusters. Click the CREATE CLUSTER button. Cloudbreak will display the Create Cluster wizard.
a. General Configuration
By default, the General Configuration screen is displayed using the BASIC view. The ADVANCED view gives you more control of AWS and cluster settings, to include features such as tags. You can change your view to ADVANCED manually or you can change your Cloudbreak preferences to show ADVANCED view by default. We will use the BASIC view.
Credential: Select the AWS credential you created. Most users will only have 1 credential per platform which will be selected automatically.
Cluster Name: Enter a name for your cluster. The name must be between 5 and 40 characters, must start with a letter, and must only include lowercase letters, numbers, and hyphens.
Region: Select the region in which you would like to launch your cluster.
Platform Version: Cloudbreak currently defaults to HDP 2.6. Select the dropdown arrow and select HDF 3.1 .
Cluster Type: As mentioned previously, there are two supported cluster types. Make sure to select the blueprint you just created.
Click the green NEXT button.
b. Hardware and Storage
Cloudbreak will display the Hardware and Storage screen. On this screen, you have the ability to change the instance types, attached storage and where the Ambari server will be installed. As you can see, we will deploy 1 NiFi and 1 Zookeeper node. In a production environment you would typically have at least 3 Zookeeper nodes. We will use the defaults.
Click the green NEXT button.
c. Gateway Configuration
Cloudbreak will display the Gateway Configuration screen. On this screen, you have the ability to enable a protected gateway. This gateway uses Knox to provide a secure access point for the cluster. Cloudbreak 2.7 does not currently support configuring Knox for HDF. We will leave this option disabled.
Click the green NEXT button.
d. Network
Cloudbreak will display the Network screen. On this screen, you have the ability to specify the Network , Subnet , and Security Groups . Cloudbreak defaults to creating new configurations. For production use cases, we highly recommend creating and refining your own definitions within the cloud platform. You can tell Cloudbreak to use those via the drop down menus. We will use the default options to create new configurations.
Because we are using a custom blueprint which disables SSL, we need to update the security groups with correct ports for the NiFi and NiFi Registry UIs. In the SERVICES security group, add the port 61080 with TCP . Click the + button to add the rule. In the NIFI security group, add the port 9090 with TCP . Click the + button to add the rule.
You should see something similar to the following:
Click the green NEXT button.
e. Security
Cloudbreak will display the Security screen. On this screen, you have the ability to specify the Ambari admin username and password. You can create a new SSH key or select an existing one. Finally, you have the ability to enable Kerberos on the cluster. We will use admin for the username and BadPass#1 for the password. Select an existing SSH key from the drop down list. This should be a key you have already created and for which you have access to the corresponding private key. We will NOT be enabling Kerberos, so uncheck the Enable Kerberos Security checkbox.
You have the ability to display a JSON version of the blueprint. You also have the ability to display a JSON version of the cluster definition. Both of these can be used with the Cloudbreak CLI to programmatically automate these operations.
Click the green CREATE CLUSTER button.
f. Cluster Summary
Cloudbreak will display the Cluster Summary page. It will generally take between 10-15 minutes for the cluster to be fully deployed. As you can see, this screen looks similar to an HDP cluster. The big difference is the Blueprint and HDF Version .
Click on the Ambari URL to open the Ambari UI.
g. Ambari
You will likely see a browser warning when you first open the Ambari UI. That is because we are using self-signed certificates.
Click on the ADVANCED button. Then click the link to Proceed .
You will be presented with the Ambari login page. You will login using the username and password you specified when you created the cluster. That should have been admin and BadPass#1 . Click the green Sign In button.
You should see the cluster summary screen. As you can see, we have a cluster with Zookeeper, NiFi, and the NiFi Registry.
Click on the NiFi service in the left hand menu. Now you can access the Quick Links menu for a shortcut to the NiFi UI.
You should see the NiFi UI.
Back in the Ambari UI, click on the NiFi Registry service in the left hand menu. Now you can access the Quick Links menu for a shortcut to the NiFi Registry UI.
You should see the NiFi Registry UI.
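If you prefer to check the UIs from a terminal, a quick curl against the ports we opened earlier should return an HTTP response. The hostnames below are placeholders for the public IPs of the NiFi and Services nodes:
curl -I http://<nifi-node-public-ip>:9090/nifi/
curl -I http://<services-node-public-ip>:61080/nifi-registry/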
Review
If you have successfully followed along with this tutorial, you should have created a Flow Management (NiFi) cluster on AWS using a custom blueprint. This cluster has SSL and LDAP configurations disabled for rapid prototyping.
07-04-2018
03:49 PM
1 Kudo
@mmolnar This is great feedback. I've updated the article to include a link for downloading the files. Thank you!
05-30-2018
01:20 PM
9 Kudos
Objectives
This tutorial is designed to walk you through the process of using Vagrant and Virtualbox to create a local instance of Cloudbreak 2.4.1. This approach allows you to start your local Cloudbreak deployer instance when you want to spin up an HDP cluster in a cloud environment without incurring the costs associated with hosting your Cloudbreak deployer instance itself on the cloud.
This tutorial is an update to the original one located here:
HCC Article. However, this version of the tutorial includes more automation for installing Cloudbreak and is based on Cloudbreak 2.4.x instead of 1.14.x.
Note: This tutorial has also been tested with Cloudbreak 2.7.0, 2.7.1, 2.7.2, and 2.8.0 TP.
Prerequisites
You should already have installed VirtualBox 5.x. Read more here: VirtualBox
You should already have installed Vagrant 2.x. Read more here: Vagrant
You should already have installed the vagrant-vbguest plugin. This plugin will keep the VirtualBox Guest Additions software current as you upgrade your kernel and/or VirtualBox versions. Read more here: vagrant-vbguest
You should already have installed the vagrant-hostmanager plugin. This plugin will automatically manage the /etc/hosts file on your local computer and in your virtual machines. Read more here: vagrant-hostmanager
Scope
This tutorial was tested in the following environment:
macOS High Sierra (version 10.13.4)
VirtualBox 5.2.6
Vagrant 2.1.1
vagrant-vbguest plugin 0.15.2
vagrant-hostmanager plugin 1.8.9
Cloudbreak 2.4.1
Steps
Setup Vagrant
Create Vagrant project directory
Before we get started, determine where you want to keep your Vagrant project files. Each Vagrant project should have its own directory. I keep my Vagrant projects in my ~/Development/Vagrant directory. You should also use a helpful name for each Vagrant project directory you create.
$ cd ~/Development/Vagrant
$ mkdir centos7-cloudbreak
$ cd centos7-cloudbreak
We will be using a CentOS 7.4 Vagrant box, so I include centos7 in the Vagrant project name to differentiate it from a CentOS 6 project. The project is for cloudbreak, so I include that in the name.
Create Vagrantfile
You need to create a file named Vagrantfile . The Vagrantfile tells Vagrant how to configure your virtual machines. You can copy/paste my Vagrantfile below:
# -*- mode: ruby -*-
# vi: set ft=ruby :
# Using yaml to load external configuration files
require 'yaml'
Vagrant.configure("2") do |config|
# Using the hostmanager vagrant plugin to update the host files
config.hostmanager.enabled = true
config.hostmanager.manage_host = true
config.hostmanager.manage_guest = true
config.hostmanager.ignore_private_ip = false
# Run install script
config.vm.provision "shell", path: "install.sh"
# Loading in the VM configuration information
servers = YAML.load_file('servers.yaml')
servers.each do |servers|
config.vm.define servers["name"] do |srv|
srv.ssh.username = "vagrant"
srv.ssh.password = "vagrant"
srv.vm.box = servers["box"] # Speciy the name of the Vagrant box file to use
srv.vm.hostname = servers["name"] # Set the hostname of the VM
srv.vm.network "private_network", ip: servers["ip"], :adapater=>2 # Add a second adapater with a specified IP
srv.vm.provision :shell, inline: "sed -i'' '/^127.0.0.1\t#{srv.vm.hostname}\t#{srv.vm.hostname}$/d' /etc/hosts" # Remove the extraneous first entry in /etc/hosts
srv.vm.provider :virtualbox do |vb|
vb.name = servers["name"] # Name of the VM in VirtualBox
vb.cpus = servers["cpus"] # How many CPUs to allocate to the VM
vb.memory = servers["ram"] # How much memory to allocate to the VM
end
end
end
end
Create a servers.yaml file
You need to create a file named servers.yaml . The servers.yaml file contains the configuration information for our VMs. Here is the content from my file:
---
- name: cloudbreak
box: bento/centos-7.4
cpus: 2
ram: 4096
ip: 192.168.56.100
NOTE: You may need to modify the IP address to avoid conflicts with your local network.
Create install.sh file
You need to create a file called install.sh . The install.sh file is a script that will run on your VM the first time it is provisioned. The line in the Vagrantfile that runs this is:
config.vm.provision "shell", path: "install.sh"
This allows us to automate configuration tasks that would otherwise be tedious and/or repetitive. Here is the content from my file:
#!/bin/bash
# Install prerequisites
sudo yum -y update
sudo yum -y install net-tools ntp wget lsof unzip tar iptables-services
# Enable NTP
sudo systemctl enable ntpd && sudo systemctl start ntpd
# Disable Firewall
sudo systemctl disable firewalld && sudo systemctl stop firewalld
sudo iptables --flush INPUT && sudo iptables --flush FORWARD && sudo service iptables save
# Disable SELINUX
sudo sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/g' /etc/sysconfig/selinux
# Create Docker repo
cat > /etc/yum.repos.d/docker.repo <<EOF
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF
# Install Docker, enable and start service
yum install -y docker-engine docker-engine-selinux
systemctl start docker
systemctl enable docker
# Install Cloudbreak application
mkdir /opt/cloudbreak-deployment
cd /opt/cloudbreak-deployment
curl -Ls public-repo-1.hortonworks.com/HDP/cloudbreak/cloudbreak-deployer_2.4.1_$(uname)_x86_64.tgz | sudo tar -xz -C /bin cbd
This installation script performs the prerequisite package installations and configurations. This script also automates most of the Cloudbreak installation tasks.
Note: The last line is a curl call to download a specific version of Cloudbreak. If you want to install 2.7.2 or 2.8.0 TP, then update the version in the URL of the curl command.
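For example, to install Cloudbreak 2.7.2 instead, the download line would look like this (same URL pattern, only the version changes):
curl -Ls public-repo-1.hortonworks.com/HDP/cloudbreak/cloudbreak-deployer_2.7.2_$(uname)_x86_64.tgz | sudo tar -xz -C /bin cbd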
Start Virtual Machine
Once you have created the 3 files in your Vagrant project directory, you are ready to start your instance. Creating the instance for the first time and starting it every time after that uses the same
vagrant up command.
$ vagrant up
You should notice Vagrant automatically updating the packages and installing additional packages on the first start of the VM.
Once the process is complete you should have 1 vm running. You can verify by looking at the VirtualBox UI where you should see the
cloudbreak VM running. You should see something similar to this:
Connect to Your Virtual Machine
You should be able to login to your VM using the
vagrant ssh command. You should see something similar to the following:
$ vagrant ssh
[vagrant@cloudbreak ~]$
Configure Cloudbreak
The installation of Cloudbreak is covered well in the docs:
Cloudbreak Install Docs. However, we've automated most of the tasks using the install.sh script. You can skip down to the Install Cloudbreak on Your Own VM section, step 3.
We need to be root for this, so we'll use
sudo .
$ sudo -i
Create Profile file
Now you need to setup the Profile file. This file contains environment variables that determines how Cloudbreak runs. Edit
Profile using your editor of choice.
You need to include at least 4 settings.
export UAA_DEFAULT_SECRET='[SECRET]'
export UAA_DEFAULT_USER_EMAIL='<myemail>'
export UAA_DEFAULT_USER_PW='<mypassword>'
export PUBLIC_IP=192.168.56.100
You should set the
UAA_DEFAULT_USER_EMAIL variable to the email address you want to use. This is the account you will use to log in to Cloudbreak. You should set the UAA_DEFAULT_USER_PW variable to the password you want to use. This is the password you will use to log in to Cloudbreak. You may need to change the value of PUBLIC_IP to avoid conflicts on your network.
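If you prefer to create the file in one step rather than opening an editor, a heredoc works as well. This assumes the default deployment directory from the install script, and the email and password below are placeholders for your own values:
cat > /opt/cloudbreak-deployment/Profile <<'EOF'
export UAA_DEFAULT_SECRET='[SECRET]'
export UAA_DEFAULT_USER_EMAIL='admin@example.com'
export UAA_DEFAULT_USER_PW='MySecretPassword'
export PUBLIC_IP=192.168.56.100
EOF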
Verify Cloudbreak Version
You should check the version of Cloudbreak to make sure the correct version is installed.
[root@cloudbreak cloudbreak-deployment]# cbd --version
You should see something similar to this:
[root@cloudbreak cloudbreak-deployment]# cbd --version
Cloudbreak Deployer: 2.4.1
NOTE: We are installing version 2.4.1, which is the latest GA version as of May 2018.
Initialize Cloudbreak Configuration
Now that you have a profile, you can initialize your Cloudbreak configuration files. First you need to run the
cbd generate command. You should see something similar to the following:
[root@cloudbreak cloudbreak-deployment]# cbd generate
* Dependency required, installing sed latest ...
* Dependency required, installing jq latest ...
* Dependency required, installing docker-compose 1.13.0 ...
* Dependency required, installing aws latest ...
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
ff3a5c916c92: Pulling fs layer
ff3a5c916c92: Verifying Checksum
ff3a5c916c92: Download complete
ff3a5c916c92: Pull complete
Digest: sha256:7df6db5aa61ae9480f52f0b3a06a140ab98d427f86d8d5de0bedab9b8df6b1c0
Status: Downloaded newer image for alpine:latest
Generating Cloudbreak client certificate and private key in /opt/cloudbreak-deployment/certs with 192.168.56.100 into /opt/cloudbreak-deployment/certs/traefik.
generating docker-compose.yml
generating uaa.yml
The second step is to pull down the Docker images used by Cloudbreak using the
cbd pull-parallel command. You should see something similar to the following:
[root@cloudbreak cloudbreak-deployment]# cbd pull-parallel
Pulling haveged (hortonworks/haveged:1.1.0)...
1.1.0: Pulling from hortonworks/haveged
Digest: sha256:31c6151ebd88ac65322969c7a71969c0d95d98a9eafd4eaab56e11c62c48c42b
Status: Downloaded newer image for hortonworks/haveged:1.1.0
Pulling uluwatu (hortonworks/hdc-web:2.4.1)...
2.4.1: Pulling from hortonworks/hdc-web
...
Start Cloudbreak
Once you have generated the configuration files and pulled down the Docker images, you can start Cloudbreak. You start Cloudbreak using the
cbd start command. You should see something similar to the following:
[root@cloudbreak cloudbreak-deployment]# cbd start
generating docker-compose.yml
generating uaa.yml
Pulling haveged (hortonworks/haveged:1.1.0)...
1.1.0: Pulling from hortonworks/haveged
ca26f34d4b27: Pull complete
bf22b160fa79: Pull complete
d30591ea011f: Pull complete
22615e74c8e4: Pull complete
ceb5854e0233: Pull complete
Digest: sha256:09f8cf4f89b59fe2b391747181469965ad27cd751dad0efa0ad1c89450455626
Status: Downloaded newer image for hortonworks/haveged:1.1.0
Pulling uluwatu (hortonworks/cloudbreak-web:1.14.0)...
1.14.0: Pulling from hortonworks/cloudbreak-web
16e32a1a6529: Pull complete
8e153fce9343: Pull complete
6af1e6403bfe: Pull complete
075e3418c7e0: Pull complete
9d8191b4be57: Pull complete
38e38dfe826c: Pull complete
d5d08e4bc6be: Pull complete
955b472e3e42: Pull complete
02e1b573b380: Pull complete
Digest: sha256:06ceb74789aa8a78b9dfe92872c45e045d7638cdc274ed9b0cdf00b74d118fa2
...
Creating cbreak_periscope_1
Creating cbreak_logsink_1
Creating cbreak_identity_1
Creating cbreak_uluwatu_1
Creating cbreak_haveged_1
Creating cbreak_consul_1
Creating cbreak_mail_1
Creating cbreak_pcdb_1
Creating cbreak_uaadb_1
Creating cbreak_cbdb_1
Creating cbreak_sultans_1
Creating cbreak_registrator_1
Creating cbreak_logspout_1
Creating cbreak_cloudbreak_1
Creating cbreak_traefik_1
Uluwatu (Cloudbreak UI) url:
https://192.168.56.100
login email:
myoung@hortonworks.com
password:
****
creating config file for hdc cli: /root/.hdc/config
The start command will output the IP address and the login username, which are based on what we set up in the Profile.
Check Cloudbreak Logs
You can always look at the Cloudbreak logs in /opt/cloudbreak-deployment/cbreak.log. You can also use the cbd logs cloudbreak command to view logs in real time. Cloudbreak is ready to use when you see a message similar to Started CloudbreakApplication in 64.156 seconds (JVM running for 72.52) .
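For example, assuming the deployment directory created by the install script, you can follow the log file directly:
tail -f /opt/cloudbreak-deployment/cbreak.log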
Login to Cloudbreak
Cloudbreak should now be running. We can log in to the UI using the IP address specified in the Profile. In our case that is
https://192.168.56.100 . Notice Cloudbreak uses https .
Your browser may display a warning similar to the following:
This is because of the self-signed certificate used by Cloudbreak. You should accept the certificate and trust the site. Then you should see a login screen similar to the following:
At this point you should be able to see the Cloudbreak UI screen where you can manage your credentials, blueprints, etc. This tutorial doesn't cover setting up credentials or deploying a cluster. Before you can deploy a cluster you need to setup
credentials . See this link for setting up your credentials:
Managing Cloudbreak AWS Credentials
Stopping Cloudbreak
When you are ready to shutdown Cloudbreak, the process is simple. First you need to stop Cloudbreak using the
cbd kill command. You should see something similar to this:
[root@cloudbreak cloudbreak-deployment]# cbd kill
Stopping cbreak_traefik_1 ... done
Stopping cbreak_cloudbreak_1 ... done
Stopping cbreak_logspout_1 ... done
Stopping cbreak_registrator_1 ... done
Stopping cbreak_sultans_1 ... done
Stopping cbreak_uaadb_1 ... done
Stopping cbreak_cbdb_1 ... done
Stopping cbreak_pcdb_1 ... done
Stopping cbreak_mail_1 ... done
Stopping cbreak_haveged_1 ... done
Stopping cbreak_consul_1 ... done
Stopping cbreak_uluwatu_1 ... done
Stopping cbreak_identity_1 ... done
Stopping cbreak_logsink_1 ... done
Stopping cbreak_periscope_1 ... done
Going to remove cbreak_traefik_1, cbreak_cloudbreak_1, cbreak_logspout_1, cbreak_registrator_1, cbreak_sultans_1, cbreak_uaadb_1, cbreak_cbdb_1, cbreak_pcdb_1, cbreak_mail_1, cbreak_haveged_1, cbreak_consul_1, cbreak_uluwatu_1, cbreak_identity_1, cbreak_logsink_1, cbreak_periscope_1
Removing cbreak_traefik_1 ... done
Removing cbreak_cloudbreak_1 ... done
Removing cbreak_logspout_1 ... done
Removing cbreak_registrator_1 ... done
Removing cbreak_sultans_1 ... done
Removing cbreak_uaadb_1 ... done
Removing cbreak_cbdb_1 ... done
Removing cbreak_pcdb_1 ... done
Removing cbreak_mail_1 ... done
Removing cbreak_haveged_1 ... done
Removing cbreak_consul_1 ... done
Removing cbreak_uluwatu_1 ... done
Removing cbreak_identity_1 ... done
Removing cbreak_logsink_1 ... done
Removing cbreak_periscope_1 ... done
[root@cloudbreak cloudbreak-deployment]#
Now exit the Vagrant box:
[root@cloudbreak cloudbreak-deployment]# exit
logout
[vagrant@cloudbreak ~]$ exit
logout
Connection to 127.0.0.1 closed.
Now we can shut down the Vagrant box:
$ vagrant halt
==> cloudbreak: Attempting graceful shutdown of VM...
Starting Cloudbreak
To startup Cloudbreak, the process is the opposite of stopping it. First you need to start the Vagrant box:
$ vagrant up
Once the Vagrant box is up, you need to ssh in to the box:
$ vagrant ssh
You need to be root:
$ sudo -i
Before starting Cloudbreak, make sure you are in the application directory:
$ cd /opt/cloudbreak-deployment
Now start Cloudbreak using the
cbd start command. You should see something similar to this:
[root@cloudbreak cloudbreak-deployment]# cbd start
generating docker-compose.yml
generating uaa.yml
Creating cbreak_consul_1
Creating cbreak_periscope_1
Creating cbreak_sultans_1
Creating cbreak_uluwatu_1
Creating cbreak_identity_1
Creating cbreak_uaadb_1
Creating cbreak_pcdb_1
Creating cbreak_mail_1
Creating cbreak_haveged_1
Creating cbreak_logsink_1
Creating cbreak_cbdb_1
Creating cbreak_logspout_1
Creating cbreak_registrator_1
Creating cbreak_cloudbreak_1
Creating cbreak_traefik_1
Uluwatu (Cloudbreak UI) url:
https://192.168.56.100
login email:
myoung@hortonworks.com
password:
****
creating config file for hdc cli: /root/.hdc/config
[root@cloudbreak cloudbreak-deployment]#
It takes a minute or two for the Cloudbreak application to fully start up. Now you can log in to the Cloudbreak UI.
Review
If you have successfully followed along with this tutorial, you should now have a Vagrant box you can spin up via
vagrant up , start up Cloudbreak via cbd start , and then create your clusters on the cloud.
You can download a copy of my Vagrantfile, servers.yaml, and install.sh files here:
https://github.com/Jaraxal/vagrant-virtualbox-cloudbreak
10-18-2017
09:19 PM
13 Kudos
Objectives
Every day, Hortonworks customers take advantage of the flexibility and elasticity that cloud platforms provide. For many of these customers, Cloudbreak is used to manage their HDP clusters and to provide autoscaling capability.
Cloudbreak's autoscaling features are tied to Ambari Alerts. Ambari ships with a set of alerts out of the box. However, you may want to enable an autoscaling policy based on an alert that Ambari doesn't provide out of the box. The good news is Ambari supports creating custom alerts. Custom alerts created in Ambari are visible to Cloudbreak and usable with Cloudbreak autoscaling policies.
A common desire with autoscaling is to scale the cluster based on memory used, cores used, or perhaps the number of running applications. You can use the YARN ResourceManager JMX data to determine these values. For example, you may have a typical cluster with 5 Node Managers. You also know that sometimes your cluster usage will spike and you want to increase the number of Node Managers by 3, but you don't want to run 8 Node Managers all the time to save costs. You can create an alert based on the JMX data from YARN ResourceManager to scale the cluster based on usage. Then Cloudbreak can scale the cluster when the alert is triggered.
This tutorial will walk you through the process of creating a custom Ambari Alert for use by Cloudbreak autoscaling policies.
Prerequisites
You should have a properly running instance of Cloudbreak with credentials for your cloud provider of choice.
You should have an Ambari 2.5/HDP 2.6 cluster already deployed with Cloudbreak.
Scope
This tutorial was tested in the following environment:
Cloudbreak 1.16.4
AWS EC2
Ambari 2.5
HDP 2.6
Steps
Log into Ambari
As mentioned in the prerequisites, you should already have a cluster built using Cloudbreak. Click on the cluster summary box in the Cloudbreak UI to display the cluster details. Now click on the link to your Ambari cluster. You may see something similar to this:
Your screen may vary depending on your browser of choice. I'm using Chrome. This warning is because Cloudbreak uses self-signed certificates which are not trusted. Click on the Advanced link. You should see something similar to this:
Click on the Proceed link to open the Ambari login screen. You should be able to log in to Ambari using the default username and password of admin , unless you changed it.
Once you have logged into Ambari, you should see something similar to this:
NOTE: Your specific cluster may look different.
Log into YARN ResourceManager
YARN is the central component used to manage resource availability on an HDP cluster. In Ambari, you can see a high-level summary of resources available to YARN by clicking on the YARN link in the service list on the Ambari dashboard. You should see something similar to this:
If you take a look at the upper right corner, you can see a summary of containers, applications and cluster memory. For this tutorial, I would like Cloudbreak to autoscale my cluster when the number of pending applications is greater than 2. To do this, I'm going to create a custom Ambari Alert based on that value. To get that value, I need to look at the YARN ResourceManager JMX data.
View YARN ResourceManager JMX Data
You can view available JMX data for the YARN ResourceManager via the Ambari Quick Links. You should already have the YARN ResourceManager dashboard visible from the last step. Click on the Quick Links drop down menu in the top middle of the screen. You should see something similar to this:
As you can see, ResourceManager JMX is available in the list. If you click that link you will see something similar to this:
You should see a fairly large JSON output. If you search for q0=root , you should see something similar to this:
This is a list of YARN related metrics associated with the root queue. If you look in the list of values, you should see AppsPending . This is the metric I want to use for my Ambari Alert.
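If you want to retrieve just the queue metrics rather than searching the full JMX page, the ResourceManager JMX servlet also accepts a qry parameter. The host below is a placeholder, and 8088 is the default ResourceManager web UI port:
curl "http://<resourcemanager-host>:8088/jmx?qry=Hadoop:service=ResourceManager,name=QueueMetrics,q0=root"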
Review existing Alert definitions
You can view the definition for any Ambari provided Alerts.
To get a list of all alerts on the system, you make a call to the Ambari API:
curl -u admin:admin -i -k -H 'X-Requested-By:ambari' https://#.#.#.#/ambari/api/v1/clusters/autoscaling/alert_definitions/
You should see something similar to this:
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 18 Oct 2017 17:15:21 GMT
Content-Type: text/plain
Content-Length: 21595
Connection: keep-alive
Vary: Accept-Encoding
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Cache-Control: no-store
Pragma: no-cache
Set-Cookie: AMBARISESSIONID=1gprc4wefyoiqmb1kj6plu95j;Path=/;HttpOnly
Expires: Thu, 01 Jan 1970 00:00:00 GMT
User: admin
Vary: Accept-Encoding, User-Agent
{
"href" : "http://#.#.#.#/api/v1/clusters/autoscaling/alert_definitions/",
"items" : [
{
"href" : "http://#.#.#.#/api/v1/clusters/autoscaling/alert_definitions/1",
"AlertDefinition" : {
"cluster_name" : "autoscaling",
"id" : 1,
"label" : "HBase Master Process",
"name" : "hbase_master_process"
}
},
...
NOTE: Your username and password may be different. You need to update the curl call to use your IP address for the Ambari server and your cluster name. In this example, my cluster name is autoscaling . Also notice the use of https for Cloudbreak clusters and the need for the -k flag.
As you can see, each alert is assigned a unique id. To view the configuration of a specific alert, you make a curl call to the href link with the alert id provided in the output.
To see the definition of Alert id 1 , make the following curl call:
curl -u admin:admin -i -k -H 'X-Requested-By:ambari' https://#.#.#.#/ambari/api/v1/clusters/autoscaling/alert_definitions/1
NOTE: With Cloudbreak, Ambari is using HTTPS and is proxied so change http to https and /api to /ambari/api .
You should see something similar to this:
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 18 Oct 2017 17:24:00 GMT
Content-Type: text/plain
Content-Length: 1156
Connection: keep-alive
Vary: Accept-Encoding
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Cache-Control: no-store
Pragma: no-cache
Set-Cookie: AMBARISESSIONID=bcdh6wmyxpnd1ioufen9hikva;Path=/;HttpOnly
Expires: Thu, 01 Jan 1970 00:00:00 GMT
User: admin
Vary: Accept-Encoding, User-Agent
{
"href" : "http://#.#.#.#/api/v1/clusters/autoscaling/alert_definitions/1",
"AlertDefinition" : {
"cluster_name" : "autoscaling",
"component_name" : "HBASE_MASTER",
"description" : "This alert is triggered if the HBase master processes cannot be confirmed to be up and listening on the network for the configured critical threshold, given in seconds.",
"enabled" : true,
"help_url" : null,
"id" : 1,
"ignore_host" : false,
"interval" : 1,
"label" : "HBase Master Process",
"name" : "hbase_master_process",
"repeat_tolerance" : 1,
"repeat_tolerance_enabled" : false,
"scope" : "ANY",
"service_name" : "HBASE",
"source" : {
"default_port" : 60000.0,
"reporting" : {
"ok" : {
"text" : "TCP OK - {0:.3f}s response on port {1}"
},
"warning" : {
"text" : "TCP OK - {0:.3f}s response on port {1}",
"value" : 1.5
},
"critical" : {
"text" : "Connection failed: {0} to {1}:{2}",
"value" : 5.0
}
},
"type" : "PORT",
"uri" : "{{hbase-site/hbase.master.port}}"
}
}
The alert definitions will vary depending on the component. My advice is to look for existing alert definitions around the component for which you are interested and use that as a base for your custom alerts.
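One way to narrow the search is to use the Ambari API's field and predicate syntax to list only alerts for a given service, for example YARN. Adjust the IP, cluster name, and credentials to match your environment:
curl -u admin:admin -k -H 'X-Requested-By:ambari' "https://#.#.#.#/ambari/api/v1/clusters/autoscaling/alert_definitions?fields=AlertDefinition/label,AlertDefinition/name&AlertDefinition/service_name=YARN"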
Create Custom Alert JSON file
To submit a custom alert to Ambari, we can define the alert in a JSON file which we upload via the Ambari API. You can copy and paste the following alert definition to your alert file:
{
"AlertDefinition" : {
"cluster_name" : "autoscaling",
"component_name" : "RESOURCEMANAGER",
"description" : "This queue-level alert is triggered if the number of root queue pending applications exceeds 1.",
"enabled" : true,
"help_url" : null,
"ignore_host" : false,
"interval" : 5,
"label" : "[CUSTOM] ResourceManager Pending Applications",
"name" : "queue_pending_applications",
"repeat_tolerance" : 1,
"repeat_tolerance_enabled" : false,
"scope" : "ANY",
"service_name" : "YARN",
"source" : {
"jmx" : {
"property_list" : ["Hadoop:service=ResourceManager,name=QueueMetrics,q0=root/AppsPending"],
"value" : "{0}"
},
"reporting" : {
"ok" : {
"text" : "YARN Pending Applications: {0}"
},
"warning" : {
"text" : "YARN Pending Applications: {0}",
"value" : 2
},
"critical" : {
"text" : "YARN Pending Applications: {0}",
"value" : 3
},
"units" : "Applications"
},
"type" : "METRIC",
"uri" : {
"http" : "{{yarn-site/yarn.resourcemanager.webapp.address}}",
"https" : "{{yarn-site/yarn.resourcemanager.webapp.https.address}}",
"https_property" : "{{yarn-site/yarn.http.policy}}",
"https_property_value" : "HTTPS_ONLY",
"kerberos_keytab" : "{{yarn-site/yarn.resourcemanager.webapp.spnego-keytab-file}}",
"kerberos_principal" : "{{yarn-site/yarn.resourcemanager.webapp.spnego-principal}}",
"default_port" : 0.0,
"connection_timeout" : 5.0,
"high_availability" : {
"alias_key" : "{{yarn-site/yarn.resourcemanager.ha.rm-ids}}",
"http_pattern" : "{{yarn-site/yarn.resourcemanager.webapp.address.{{alias}}}}",
"https_pattern" : "{{yarn-site/yarn.resourcemanager.webapp.https.address.{{alias}}}}"
}
}
}
}
}
You will need to change the value of cluster_name to match the name of your cluster. The label and name values can be customized by you, but they should be unique from other alerts in the system. The label is what will be displayed in the Ambari UI. I like to prepend [CUSTOM] on custom alerts to make it clear. Once you make the appropriate changes, save the file as alert.json (or any filename you like).
This alert, as defined, will throw a WARNING alert when the number of pending applications is 2 and a CRITICAL alert when the number of pending applications is 3.
Upload Custom Alert JSON file
Now that we have the custom alert file, we can submit it to the Ambari API to create the new alert. You submit the alert by using the following curl call:
curl -u admin:admin -i -k -H 'X-Requested-By:ambari' -X POST -d @alert.json https://#.#.#.#/ambari/api/v1/clusters/autoscaling/alert_definitions
You should see something similar to the following:
HTTP/1.1 100 Continue
HTTP/1.1 201 Created
Server: nginx
Date: Wed, 18 Oct 2017 17:52:47 GMT
Content-Type: text/plain
Content-Length: 0
Connection: keep-alive
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Cache-Control: no-store
Pragma: no-cache
Set-Cookie: AMBARISESSIONID=18utggom97x7z33z3d2x9h1mf;Path=/;HttpOnly
Expires: Thu, 01 Jan 1970 00:00:00 GMT
User: admin
Verify Custom Alert Exists
You can verify the alert exists using the API call we used before:
curl -u admin:admin -i -k -H 'X-Requested-By:ambari' https://#.#.#.#/ambari/api/v1/clusters/autoscaling/alert_definitions
You should see the new alert at the bottom of the list:
{
"href" : "http://#.#.#.#/api/v1/clusters/autoscaling/alert_definitions/75",
"AlertDefinition" : {
"cluster_name" : "autoscaling",
"id" : 75,
"label" : "[CUSTOM] ResourceManager Pending Applications",
"name" : "queue_pending_applications"
}
}
]
You can also verify via the Ambari Alerts page. In the upper right-hand menu of Ambari, click on Alerts . You should see something similar to this:
Now filter for CUSTOM and you should see something similar to this:
As you can see, the alert exists in Ambari. After a few minutes, the status should change from NONE to OK .
Create Cloudbreak Autoscaling Policy
Now that our custom alert exists in Ambari, we can create a Cloudbreak autoscaling policy based on that alert. In the Cloudbreak UI, show the details for the cluster you have running. You should see something similar to this:
Click on the autoscaling SLA policies link to the right of details . You should see something similar to this:
By default, the policies should be disabled. You can click on the enable button to enable autoscaling. You should see something similar to this:
Before creating the policy, you have to define the Ambari Alert on which you want to trigger. Click the create alert button. You should see something similar to this:
You have the option to choose between metric-based and time-based alerts. Time-based alerts allow you to define a cron-based time period during which autoscaling events will happen. For this tutorial, I'm going to use metric-based alerts.
The Alert Name and Description are up to you. I recommend using something informative. The Metric - Desired State is a drop down where you select from the list of available Ambari Alerts and you determine which Alert state you want to trigger. The Period is how long, in minutes, the alert should exist before an autoscaling event is triggered. You should use a value that is reasonable; you don't want the scaling events happening too quickly as that can cause a lot of churn.
You can see what I've used as an example:
When you have everything entered, click on the create alert button. Now we can define the scaling policy itself. Click on the create policy button. You should see something similar to this:
The Policy Name is up to you. Again, I recommend using something informative. The Scaling Adjustment is how many nodes to add to the cluster. The dropdown to the right specifies the node metric. You can specify a specific node count, a percentage of nodes based on the cluster size, or a total cluster node count. The Host Group defines which kind of nodes should be added. This will go back to your Blueprint used to build the cluster. You may have compute or data only nodes that you want to add. The Alert is the Cloudbreak Alert we created in the previous step.
You can see what I used as an example:
When you have everything entered, click on the create policy button. You should now have an Alert and Scaling Policy defined. You should see something similar to this:
Run Jobs On The Cluster
To trigger the alert, I'm going to run some jobs on my cluster. A simple test would be to run a couple of copies of TeraGen. Because of the size of my cluster, I shouldn't have the capacity to run more than 1 of those at a time. This should create pending applications which will trigger the alert.
To do this I'm going to log into one of the nodes in my cluster using ssh. You can do this using something similar to this:
ssh -i cloudbreak cloudbreak@#.#.#.#
NOTE: Your keyname and ip will be different.
You should see something similar to this:
The authenticity of host '#.#.#.# (#.#.#.#)' can't be established.
ECDSA key fingerprint is SHA256:C10UDnRxnTTaxkWqv5cPgw/FItKWvEdyWmeS2BKVUU8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '#.#.#.#' (ECDSA) to the list of known hosts.
__| __|_ )
_| ( / Amazon Linux AMI
___|\___|___|
https://aws.amazon.com/amazon-linux-ami/2017.03-release-notes/
27 package(s) needed for security, out of 61 available
Run "sudo yum update" to apply all updates.
Amazon Linux version 2017.09 is available.
I'm going to need 4 sessions because I want to have 4 submitted jobs at the same time. In each session I'm going to run the following command:
sudo -u hdfs hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teragen 1000000000 /tmp/terasort1-input
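If you would rather launch everything from one session instead of four separate terminals, a simple loop like this works as well (a convenience sketch, not part of the original steps; it backgrounds the jobs and only varies the output directory):
for i in 1 2 3 4; do
  sudo -u hdfs hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teragen 1000000000 /tmp/terasort${i}-input &
done
wait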
For each session you need to specify a unique output directory. In my case I used terasort1-input , terasort2-input , etc. We need enough jobs running for the alert to trigger and be present for at least 5 minutes, which is the time period we specified in Cloudbreak. In Ambari, click on YARN to see the summary dashboard. You should see something similar to this:
If you click on the red 1 alert you can get more details. You should see something similar to this:
As you can see, this has been CRIT for 3 minutes. Cloudbreak won't trigger an autoscale event until it has been 5 minutes. After 5 minutes has passed and the alert is still present, Cloudbreak should start autoscaling. If you look at the HISTORY section on the Cluster autoscaling page, you should see something similar to this:
As you can see, Cloudbreak has started the autoscaling process. It will add 1 node to the cluster based on our policy. You can also see this on the cluster details page in the Event History . You should see something similar to this:
After a couple of minutes, you should notice Ambari showing the addition of another node in the list of operations. You should see something similar to this:
Once the new node is added you should notice that one of the other jobs is picked up and the Alert changes from CRITICAL to WARN . You should see something similar to this:
Next Steps
The autoscaling policy we set up only addresses the addition of new nodes. You should consider multiple policies that adjust the cluster both up and down. For example, you could have a policy that sets the cluster size to a specific total node count when an alert is OK .
Cloudbreak also allows you to adjust the scaling configuration to allow for a cool down time with min and max cluster size. This helps you to control the amount of cluster churn created by autoscaling events. Combined with adjusting the period for the Cloudbreak alert, you have a reasonable amount of control over autoscaling on the cluster.
Review
If you have successfully followed along with this tutorial, you should have created a custom Ambari Alert, created a Cloudbreak autoscaling policy based on that alert, and then seen Cloudbreak autoscaling trigger based on running multiple TeraGen jobs.
05-24-2017
06:06 PM
3 Kudos
This tutorial will walk you through the process of using Cloudbreak recipes to install TensorFlow for Anaconda Python on an HDP 2.6 cluster during cluster provisioning. We'll then update Zeppelin to use the newly installed version of Anaconda and run a quick TensorFlow test.
Prerequisites
You should already have a Cloudbreak v1.14.4 environment running. You can follow this article to create a Cloudbreak instance using Vagrant and Virtualbox: HCC Article
You should already have created a blueprint that deploys HDP 2.6 with Spark 2.1. You can follow this article to get the blueprint setup. Do not create the cluster yet, as we will do that in this tutorial: HCC Article
You should already have credentials created in Cloudbreak for deploying on AWS (or Azure). This tutorial does not cover creating credentials.
Scope
This tutorial was tested in the following environment:
Cloudbreak 1.14.4
AWS EC2
HDP 2.6
Spark 2.1
Anaconda 2.7.13
TensorFlow 1.1.0
Steps
Create Recipe
Before you can use a recipe during a cluster deployment, you have to create the recipe. In the Cloudbreak UI, look for the manage recipes section. It should look similar to this:
If this is your first time creating a recipe, you will have 0 recipes instead of the 2 recipes shown in my interface.
Now click on the arrow next to manage recipes to display available recipes. You should see something similar to this:
Now click on the green create recipe button. You should see something similar to this:
Now we can enter the information for our recipe. I'm calling this recipe tensorflow . I'm giving it the description of Install TensorFlow Python . You can choose to run the script as either pre-install or post-install . I'm choosing to do the install post-install . This means the script will be run after the Ambari installation process has started. So choose the Execution Type of POST . The script is fairly basic. We are going to download the Anaconda install script, then run it in silent mode. Then we'll use the Anaconda version of pip to install TensorFlow. Here is the script:
#!/bin/bash
wget https://repo.continuum.io/archive/Anaconda2-4.3.1-Linux-x86_64.sh
bash ./Anaconda2-4.3.1-Linux-x86_64.sh -b -p /opt/anaconda
/opt/anaconda/bin/pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp27-none-linux_x86_64.whl
You can read more about installing TensorFlow on Anaconda here: TensorFlow Docs.
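Once the recipe has run, a quick way to confirm the module is importable (an optional check, not part of the recipe itself) is to call the Anaconda Python directly on a node:
/opt/anaconda/bin/python -c "import tensorflow as tf; print(tf.__version__)"
This should print 1.1.0 , matching the wheel installed above.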
When you have finished entering all of the information, you should see something similar to this:
If everything looks good, click on the green create recipe button.
You should be able to see the recipe in your list of recipes:
NOTE: You will most likely have a different list of recipes.
Create a Cluster using a Recipe
Now that our recipe has been created, we can create a cluster that uses the recipe. Go through the process of creating a cluster up to the Choose Blueprint step. This step is where you select the recipe you want to use. The recipes are not selected by default; you have to select the recipes you wish to use. You can specify recipes for 1 or more host groups. This allows you to run different recipes across different host groups (masters, slaves, etc). You can also select multiple recipes.
We want to use the hdp26-spark21-cluster blueprint. This will create an HDP 2.6 cluster with Spark 2.1 and Zeppelin. You should have created this blueprint when you followed the prerequisite tutorial. You should see something similar to this:
In our case, we are going to run the tensorflow recipe on every host group. If you intend to use something like TensorFlow across the cluster, you should install it on at least the slave nodes and the client nodes.
After you have selected the recipe for the host groups, click the Review & Launch button, then launch the cluster. As the cluster is building, you should see a message in the Cloudbreak UI that indicates the recipe is running. When that happens, you will see something similar to this:
If you click on the building cluster, you can see more detailed information. You should see something similar to this:
Once the cluster has finished building, you should see something similar to this:
Cloudbreak will create logs for each recipe that runs on each host. These logs are located at /var/log/recipes and are named after the recipe and whether it is a pre- or post-install script. For example, our recipe log is called post-tensorflow.log . You can tail this log file to follow the execution of the script.
NOTE: Post-install scripts won't be executed until the Ambari server is installed and the cluster is building. You can always monitor the /var/log/recipes directory on a node to see when the script is being executed. The time it takes to run the script will vary depending on the cloud environment and how long it takes to spin up the cluster.
On your cluster, you should be able to see the post-install log:
$ ls /var/log/recipes
post-tensorflow.log post-hdfs-home.log
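To watch the script while it is still running, you can tail the log directly (path as listed above):
tail -f /var/log/recipes/post-tensorflow.log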
Verify Anaconda Install
Once the install process is complete, you should be able to verify that Anaconda is installed. You need to ssh into one of the cloud instances. You can get the public IP address from the Cloudbreak UI. You will log in as the cloudbreak user, using the private key that corresponds to the public key you entered when you created the Cloudbreak credential. You should see something similar to this:
$ ssh -i ~/Downloads/keys/cloudbreak_id_rsa cloudbreak@#.#.#.#
The authenticity of host '#.#.#.# (#.#.#.#)' can't be established.
ECDSA key fingerprint is SHA256:By1MJ2sYGB/ymA8jKBIfam1eRkDS5+DX1THA+gs8sdU.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '#.#.#.#' (ECDSA) to the list of known hosts.
Last login: Sat May 13 00:47:41 2017 from 192.175.27.2
__| __|_ )
_| ( / Amazon Linux AMI
___|\___|___|
https://aws.amazon.com/amazon-linux-ami/2016.09-release-notes/
25 package(s) needed for security, out of 61 available
Run "sudo yum update" to apply all updates.
Amazon Linux version 2017.03 is available.
Once you are on the server, you can check the version of Python:
$ /opt/anaconda/bin/python --version
Python 2.7.13 :: Anaconda 4.3.1 (64-bit)
Update Zeppelin Interpreter
We need to update the default spark2 interpreter configuration in Zeppelin. We need to access the Zeppelin UI from Ambari. You can login to Ambari for the new cluster from the Cloudbreak UI cluster details page. Once you login to Ambari, you can access the Zeppelin UI from the Ambari Quicklink. You should see something similar to this:
After you access the Zeppelin UI, click the blue login button in the upper right corner of the interface. You can login using the default username and password of admin . After you login to Zeppelin, click the admin button in the upper right corner of the interface. This will expose the options menu. You should see something similar to this:
Click on the Interpreter link in the menu. This will display all of the configured interpreters. Find the spark2 interpreter. You can see the default setting for zeppelin.pyspark.python is set to python . This will use whichever Python is found in the path. You should see something similar to this:
We will need to change this to /opt/anaconda/bin/python which is where we have Anaconda Python installed. Click on the edit button and change zeppelin.pyspark.python to /opt/anaconda/bin/python . You should see something similar to this:
Now we can click the blue save button at the bottom. The configuration changes are now saved, but we need to restart the interpreter for the changes to take effect. Click on the restart button to restart the interpreter.
Create Zeppelin Notebook
Now that our spark2 interpreter configuration has been updated, we can create a notebook to test Anaconda + TensorFlow. Click on the Notebook menu. You should see something similar to this:
Click on the Create new note link. You can give the notebook any descriptive name you like. Select spark2 as the default interpreter. You should see something similar to this:
Your notebook will start with a blank paragraph. For the first paragraph, let's test the version of Spark we are using. Enter the following in the first paragraph:
%spark2.pyspark
sc.version
Now click the run button for the paragraph. You should see something similar to this:
u'2.1.0.2.6.0.3-8'
As you can see, we are using Spark 2.1. Now in the second paragraph, we'll test the version of Python. We already know the command-line version is 2.7.13. Enter the following in the second paragraph:
%spark2.pyspark
import sys
print sys.version_info
Now click the run button for the paragraph. You should see something similar to this:
sys.version_info(major=2, minor=7, micro=13, releaselevel='final', serial=0)
As you can see, we are running Python version 2.7.13.
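If you want to be extra sure the paragraph is using the Anaconda binary rather than some other Python 2.7.13, you can also print the interpreter path (an optional check; sys.executable reports the Python binary running the paragraph):
%spark2.pyspark
import sys
print(sys.executable)
The output should point at /opt/anaconda/bin/python , the value we set on the spark2 interpreter.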
Now we can test TensorFlow. Enter the following in the third paragraph:
%spark2.pyspark
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
a = tf.constant(10)
b = tf.constant(32)
print(sess.run(a + b))
This simple code comes from the TensorFlow website: https://www.tensorflow.org/versions/r0.10/get_started/os_setup#anaconda_installation . Now click the run button for the paragraph. You may see some warning messages the first time you run it, but you should also see the following output:
Hello, TensorFlow!
42
As you can see, TensorFlow is working from Zeppelin, which is using Spark 2.1 and Anaconda. If everything works properly, your notebook should look something similar to this:
Admittedly this example is very basic, but it demonstrates the components are working together. For next steps, try running other TensorFlow code. Here are some examples you can work with: GitHub.
Review
If you have successfully followed along with this tutorial, you should have deployed an HDP 2.6 cluster in the cloud with Anaconda installed under /opt/anaconda and added the TensorFlow Python modules using a Cloudbreak recipe. You should have created a Zeppelin notebook which uses Anaconda Python, Spark 2.1 and TensorFlow.
05-23-2017
11:16 PM
1 Kudo
This tutorial is part two of a two-part series. In this tutorial, we'll verify Spark 2.1 functionality using Zeppelin on an HDP 2.6 cluster deployed using Cloudbreak. The first tutorial covers using Cloudbreak to deploy the cluster. You can find the first tutorial here: HCC Article
Prerequisites
You should already have completed part one of this tutorial series and have an HDP 2.6 cluster with Spark 2.1 running that was deployed with Cloudbreak.
Scope
This tutorial was tested in the following environment:
Cloudbreak 1.14.4
AWS EC2
HDP 2.6
Spark 2.1
Zeppelin 0.7
Steps
Login into Ambari
As mentioned in the prerequisites, you should already have a cluster built using Cloudbreak. Click on the cluster summary box in the Cloudbreak UI to display the cluster details. Now click on the link to your Ambari cluster. You may see something similar to this:
Your screen may vary depending on your browser of choice. I'm using Chrome. This warning is because we are using self-signed certificates which are not trusted. Click on the ADVANCED link. You should see something similar to this:
Click on the Proceed link to open the Ambari login screen. You should be able to login to Ambari using the username and password admin .
Login to Zeppelin
Now click on the Zeppelin component in the component status summary. You should see something similar to this:
Click on the Quicklinks link. You should see something similar to this:
Click on the Zeppelin UI link. This will load Zeppelin in a new browser tab. You should see something similar to this:
You should notice the blue Login button in the upper right corner of the Zeppelin UI. Click on this button. You should see something similar to this:
You should be able to login to Zeppelin using the username and password admin . Once you login, you should see something similar to this:
Load Getting Started Notebook
Now let's load the Apache Spark in 5 Minutes notebook by clicking on the Getting Started link. You should see something similar to this:
Click on the Apache Spark in 5 Minutes notebook. You should see something similar to this:
This is showing you the Zeppelin interpreters associated with this notebook. As you can see, the spark2 and livy2 interpreters are enabled. Click the blue Save button. You should see something similar to this:
This notebook defaults to using the Spark 2.x interpreter. You should be able to run the paragraphs without any changes. Scroll down to the notebook paragraph called Verify Spark Version . Click the play button on this paragraph. You should see something similar to this:
You should notice the Spark version is 2.1.0.2.6.0.3-8 . This confirms we are using Spark 2.1. It also confirms that Zeppelin is able to properly interact with Spark 2 on our HDP 2.6 cluster built with Cloudbreak. Try running the next two paragraphs. These paragraphs download a JSON file from GitHub and then move it to HDFS on our cluster. Now run the Load data into a Spark DataFrame paragraph. You should see something similar to this:
As you can see, the DataFrame should be properly loaded from the json file.
Next Steps
Try running the remaining paragraphs to ensure everything is working ok. For an extra challenge, try running some of the other Spark 2 notebooks that are included. You can also attempt to modify the Spark 1.6 notebooks to work with Spark 2.1.
Review
If you have successfully followed along with this tutorial, you should have been able to confirm Spark 2.1 works on our HDP 2.6 cluster deployed with Cloudbreak.
05-23-2017
09:41 PM
2 Kudos
This tutorial will walk you through the process of using Cloudbreak to deploy an HDP 2.6 cluster with Spark 2.1. We'll copy and edit the existing hdp-spark-cluster blueprint which deploys Spark 1.6 to create a new blueprint which installs Spark 2.1. This tutorial is part one of a two-part series. The second tutorial walks you through using Zeppelin to verify the Spark 2.1 installation. You can find that tutorial here: HCC Article
Prerequisites
You should already have a Cloudbreak v1.14.0 environment running. You can follow this article to create a Cloudbreak instance using Vagrant and Virtualbox: HCC Article
You should already have updated Cloudbreak to support deploying HDP 2.6 clusters. You can follow this article to enable that functionality: HCC Article
Scope
This tutorial was tested in the following environment:
Cloudbreak 1.14.4
AWS EC2
HDP 2.6
Spark 2.1
Steps
Create Blueprint
Before we can deploy a Spark 2.1 cluster using Cloudbreak, we need to create a blueprint that specifies Spark 2.1. Cloudbreak ships with 3 blueprints out of the box:
hdp-small-default: basic HDP cluster with Hive and HBase
hdp-spark-cluster: basic HDP cluster with Spark 1.6
hdp-streaming-cluster: basic HDP cluster with Kafka and Storm
We will use the hdp-spark-cluster as our base blueprint and edit it to deploy Spark 2.1 instead of Spark 1.6.
Click on the manage blueprints section of the UI. Click on the hdp-spark-cluster blueprint. You should see something similar to this:
Click on the blue copy & edit button. You should see something similar to this:
For the Name , enter hdp26-spark21-cluster . This tells us the blueprint is for an HDP 2.6 cluster using Spark 2.1. Enter the same information for the Description . You should see something similar to this:
Now, we need to edit the JSON portion of the blueprint. We need to change the Spark 1.6 components to Spark 2.1 components. We don't need to change where they are deployed. The following entries within the JSON are for Spark 1.6:
"name": "SPARK_CLIENT"
"name": "SPARK_JOBHISTORYSERVER"
"name": "SPARK_CLIENT"
We will replace SPARK with SPARK2 . These entries should look as follows:
"name": "SPARK2_CLIENT"
"name": "SPARK2_JOBHISTORYSERVER"
"name": "SPARK2_CLIENT"
NOTE: There are two entries for SPARK_CLIENT. Make sure you change both.
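If you saved the blueprint JSON to a local file, you can make all of the substitutions in one pass (a convenience sketch, assuming the file is named blueprint.json and GNU sed; on macOS use sed -i ''). The UI editor works just as well:
sed -i 's/"SPARK_/"SPARK2_/g' blueprint.json
This renames both SPARK_CLIENT entries and the SPARK_JOBHISTORYSERVER entry.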
We are going to add an entry for the LIVY2_SERVER component on the same node as the SPARK2_JOBHISTORYSERVER . We are also going to add an entry for the SPARK2_THRIFTSERVER component on that same node. Let's add those two entries just below SPARK2_CLIENT in the host_group_master_2 section.
Change the following:
{
"name": "SPARK2_JOBHISTORYSERVER"
},
{
"name": "SPARK2_CLIENT"
},
to this:
{
"name": "SPARK2_JOBHISTORYSERVER"
},
{
"name": "SPARK2_CLIENT"
},
{
"name": "SPARK2_THRIFTSERVER"
},
{
"name": "LIVY2_SERVER"
},
We also need to update the blueprint_name to hdp26-spark21-cluster and the stack_version to 2.6 . You should have something similar to this:
"Blueprints": {
"blueprint_name": "hdp26-spark21-cluster",
"stack_name": "HDP",
"stack_version": "2.6"
}
If you prefer, you can copy and paste the following blueprint JSON:
{
"host_groups": [
{
"name": "host_group_client_1",
"configurations": [],
"components": [
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "PIG"
},
{
"name": "OOZIE_CLIENT"
},
{
"name": "HBASE_CLIENT"
},
{
"name": "HCAT"
},
{
"name": "KNOX_GATEWAY"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "FALCON_CLIENT"
},
{
"name": "TEZ_CLIENT"
},
{
"name": "SPARK2_CLIENT"
},
{
"name": "SLIDER"
},
{
"name": "SQOOP"
},
{
"name": "HDFS_CLIENT"
},
{
"name": "HIVE_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "METRICS_COLLECTOR"
},
{
"name": "MAPREDUCE2_CLIENT"
}
],
"cardinality": "1"
},
{
"name": "host_group_master_3",
"configurations": [],
"components": [
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "APP_TIMELINE_SERVER"
},
{
"name": "TEZ_CLIENT"
},
{
"name": "HBASE_MASTER"
},
{
"name": "HBASE_CLIENT"
},
{
"name": "HDFS_CLIENT"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "SECONDARY_NAMENODE"
}
],
"cardinality": "1"
},
{
"name": "host_group_slave_1",
"configurations": [],
"components": [
{
"name": "HBASE_REGIONSERVER"
},
{
"name": "NODEMANAGER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "DATANODE"
}
],
"cardinality": "6"
},
{
"name": "host_group_master_2",
"configurations": [],
"components": [
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "PIG"
},
{
"name": "MYSQL_SERVER"
},
{
"name": "HIVE_SERVER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "SPARK2_JOBHISTORYSERVER"
},
{
"name": "SPARK2_CLIENT"
},
{
"name": "SPARK2_THRIFTSERVER"
},
{
"name": "LIVY2_SERVER"
},
{
"name": "TEZ_CLIENT"
},
{
"name": "HBASE_CLIENT"
},
{
"name": "HIVE_METASTORE"
},
{
"name": "ZEPPELIN_MASTER"
},
{
"name": "HDFS_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "MAPREDUCE2_CLIENT"
},
{
"name": "RESOURCEMANAGER"
},
{
"name": "WEBHCAT_SERVER"
}
],
"cardinality": "1"
},
{
"name": "host_group_master_1",
"configurations": [],
"components": [
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "HISTORYSERVER"
},
{
"name": "OOZIE_CLIENT"
},
{
"name": "NAMENODE"
},
{
"name": "OOZIE_SERVER"
},
{
"name": "HDFS_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "FALCON_SERVER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "MAPREDUCE2_CLIENT"
}
],
"cardinality": "1"
}
],
"Blueprints": {
"blueprint_name": "hdp26-spark21-cluster",
"stack_name": "HDP",
"stack_version": "2.6"
}
}
Once you have all of the changes in place, click the green create blueprint button.
Create Security Group
We need to create a new security group to use with our cluster. By default, the existing security groups only allow ports 22, 443, and 9443. As part of this tutorial, we will use Zeppelin to test Spark 2.1. We'll create a new security group that opens all ports to our IP address.
Click on the manage security groups section of the UI. You should see something similar to this:
Click on the green create security group button. You should see something similar to this:
First you need to select the appropriate cloud platform. I'm using AWS, so that is what I selected. We need to provide a unique name for our security group. I used all-ports-my-ip . You should use something descriptive. Provide a helpful description as well. Now we need to enter our personal IP address CIDR. I am using #.#.#.#/32 ; your IP address will obviously be different. You need to enter the port range. There is a known issue in Cloudbreak that prevents you from using 0-65535 , so we'll use 1-65535 . For the protocol, use tcp . Once you have everything entered, you should see something similar to this:
Click the green Add Rule button to add this rule to our security group. You can add multiple rules, but we have everything covered with our single rule. You should see something similar to this:
If everything looks good, click the green create security group button. This will create our new security group. You should see something like this:
Create Cluster
Now that our blueprint has been created and we have a new security group, we can begin building the cluster. Ensure you have selected the appropriate credential for your cloud environment. Then click the green create cluster button. You should see something similar to this:
Give your cluster a descriptive name. I used spark21test , but you can use whatever you like. Select an appropriate cloud region. I'm using AWS and selected US East (N. Virginia) , but you can use whatever you like. You should see something similar to this:
Click on the Setup Network and Security button. You should see something similar to this:
We are going to keep the default options here. Click on the Choose Blueprint button. You should see something similar to this:
Expand the blueprint dropdown menu. You should see the blueprint we created before, hdp26-spark21-cluster . Select the blueprint. You should see something similar to this:
You should notice the new security group is already selected. Cloudbreak did not automatically figure this out; the instance templates and security groups are selected alphabetically by default.
Now we need to select a node on which to deploy Ambari. I typically deploy Ambari on the master1 server. Check the Ambari check box on one of the master servers. If everything looks good, click on the green create cluster button. You should see something similar to this:
Once the cluster has finished building, you can click on the arrow for the cluster we created to get expanded details. You should see something similar to this:
Verify Versions
Once the cluster is fully deployed, we can verify the versions of the components. Click on the Ambari link on the cluster details page. Once you login to Ambari, you should see something similar to this:
You should notice that Spark2 is shown in the component list. Click on Spark2 in the list. You should see something similar to this:
You should notice that both the Spark2 Thrift Server and the Livy2 Server have been installed. Now let's check the overall cluster versions. Click on the Admin link in the Ambari menu and select Stacks and Versions . Then click on the Versions tab. You should see something similar to this:
As you can see, HDP 2.6.0.3 was deployed.
Review
If you have successfully followed along with this tutorial, you should know how to create a new security group and blueprint. The blueprint allows you to deploy HDP 2.6 with Spark 2.1. The security group allows you to access all ports on the cluster from your IP address. Follow along in part 2 of the tutorial series to use Zeppelin to test Spark 2.1.
05-18-2017
03:00 PM
6 Kudos
Prerequisites
You should already have a Cloudbreak v1.14.4 environment running. You can follow this article to create a Cloudbreak instance using Vagrant and Virtualbox: HCC Article
You should already have credentials created in Cloudbreak for deploying on AWS (or Azure).
Scope
This tutorial was tested in the following environment:
macOS Sierra (version 10.12.4)
Cloudbreak 1.14.4
AWS EC2
NOTE: Cloudbreak 1.14.0 (TP) had a bug which caused HDP 2.6 clusters installs to fail. You should upgrade your Cloudbreak deployer instance to 1.14.4.
Steps
Create application.yml file
UPDATE 05/24/2017: The creation of a custom application.yml file is not required with Cloudbreak 1.14.4. This version of Cloudbreak includes support for HDP 2.5 and HDP 2.6. This step remains for educational purposes for future HDP updates.
You need to create an application.yml file in the etc directory within your Cloudbreak deployment directory. This file will contain the repo information for HDP 2.6. If you followed my tutorial linked above, then your Cloudbreak deployment directory should be /opt/cloudbreak-deployment . If you are using a Cloudbreak instance on AWS or Azure, then your Cloudbreak deployment directory is likely /var/lib/cloudbreak-deployment/ .
Edit your <cloudbreak-deployment>/etc/application.yml file using your favorite editor. Copy and paste the following in the file:
cb:
  ambari:
    repo:
      version: 2.5.0.3-7
      baseurl: http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.5.0.3
      gpgkey: http://public-repo-1.hortonworks.com/ambari/centos6/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
    database:
      vendor: embedded
      host: localhost
      port: 5432
      name: postgres
      username: ambari
      password: bigdata

  hdp:
    entries:
      2.5:
        version: 2.5.0.1-210
        repoid: HDP-2.5
        repo:
          stack:
            repoid: HDP-2.5
            redhat6: http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.5.5.0
            redhat7: http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.5.0
          util:
            repoid: HDP-UTILS-1.1.0.21
            redhat6: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos6
            redhat7: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos7
      2.6:
        version: 2.6.0.0-598
        repoid: HDP-2.6
        repo:
          stack:
            repoid: HDP-2.6
            redhat6: http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.6.0.3
            redhat7: http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.6.0.3
          util:
            repoid: HDP-UTILS-1.1.0.21
            redhat6: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos6
            redhat7: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos7
Start Cloudbreak
Once you have created your application.yml file, you can start Cloudbreak.
$ cbd start
NOTE: It may take a couple of minutes before Cloudbreak is fully running.
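If you want to sanity check that the deployer came back up cleanly, the cbd tooling includes some basic diagnostics (an optional step; run these from your Cloudbreak deployment directory):
cbd doctor
cbd logs cloudbreak
The first command runs basic environment checks; the second tails the Cloudbreak application logs.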
Create HDP 2.6 Blueprint
To create an HDP 2.6 cluster, we need to update our blueprint to specify HDP 2.6. On the main Cloudbreak UI, click on manage blueprints . You should see something similar to this:
You should see 3 default blueprints. We are going to use the hdp-small-default blueprint as our base. Click on the hdp-small-default blueprint name. You should see something similar to this:
Now click on the blue copy & edit button. You should see something similar to this:
For the Name , you should enter something unique and descriptive. I suggest hdp26-small-default . For the Description , you can enter the same information. You should see something similar to this:
Now we need to edit the JSON portion of the blueprint. Scroll down to the bottom of the JSON. You should see something similar to this:
Now edit the blueprint_name value to be hdp26-small-default and edit the stack_version to be 2.6 . You should see something similar to this:
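For reference, the Blueprints section at the bottom of the JSON should end up looking like this (the same pattern used by the other blueprints in this series):
"Blueprints": {
"blueprint_name": "hdp26-small-default",
"stack_name": "HDP",
"stack_version": "2.6"
}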
Now click on the green create blueprint button. You should see the new blueprint visible in the list of blueprints.
Create HDP 2.6 Small Default Cluster
Now that our blueprint has been created, we can create a cluster and select this blueprint to install HDP 2.6. Select the appropriate credential for your Cloud environment. Click on the create cluster button. You should see something similar to this:
Provide a unique, but descriptive Cluster Name . Ensure you select an appropriate Region . I chose hdp26test as my cluster name and I'm using the US East region:
Now advance to the next step by clicking on Setup Network and Security . You should see something similar to this:
We don't need to make any changes here, so click on the Choose Blueprint button. You should see something similar to this:
In the Blueprint dropdown, you should see the blueprint we created. Select the hdp26-small-default blueprint. You should see something similar to this:
You need to select which node Ambari will run on. I typically select the master1 node. You should see something similar to this:
Now you can click on the Review and Launch button. You should see something similar to this:
Verify the information presented. If everything looks good, click on the create and start cluster button . Once the cluster build process has started, you should see something similar to this:
Verify HDP Version
Once the cluster has finished building, you can click on the cluster in the Cloudbreak UI. You should see something similar to this:
Click on the Ambari link to load Ambari. Login using the default username and password of admin . Now click on the Admin link in the menu. You should see something similar to this:
Click on the Stacks and Versions link. You should see something similar to this:
You should notice that HDP 2.6.0.3 has been deployed.
Review
If you have successfully followed along with this tutorial, you should know how to create/update <cloudbreak-deployment>/etc/application.yml to add specific Ambari and HDP repositories. You should have successfully created an updated blueprint and deployed HDP 2.6 on your cloud of choice.
05-13-2017
01:57 AM
5 Kudos
Objectives
This tutorial will walk you through the process of using Cloudbreak recipes to install Anaconda on an your HDP cluster during cluster provisioning. This process can be used to automate many tasks on the cluster both pre-install and post-install.
Prerequisites
You should already have a Cloudbreak v1.14.0 environment running. You can follow this article to create a Cloudbreak instance using Vagrant and Virtualbox: HCC Article
You should already have credentials created in Cloudbreak for deploying on AWS (or Azure).
Scope
This tutorial was tested in the following environment:
macOS Sierra (version 10.12.4)
Cloudbreak 1.14.0 TP
AWS EC2
Anaconda 2.7.13
Steps
Create Recipe
Before you can use a recipe during a cluster deployment, you have to create the recipe. In the Cloudbreak UI, look for the manage recipes section. It should look similar to this:
If this is your first time creating a recipe, you will have 0 recipes instead of the 2 recipes shown in my interface.
Now click on the arrow next to manage recipes to display available recipes. You should see something similar to this:
Now click on the green create recipe button. You should see something similar to this:
Now we can enter the information for our recipe. I'm calling this recipe anaconda . I'm giving it the description of Install Anaconda . You can choose to install Anaconda as either pre-install or post-install. I'm choosing to do the install post-install. This means the script will be run after the Ambari installation process has started. So choose the Execution Type of POST . Choose Script so we can copy and paste the shell script. You can also specify a file to upload or a URL (gist for example). Our script is very basic. We are going to download the Anaconda install script, then run it in silent mode. Here is the script:
#!/bin/bash
wget https://repo.continuum.io/archive/Anaconda2-4.3.1-Linux-x86_64.sh
bash ./Anaconda2-4.3.1-Linux-x86_64.sh -b -p /opt/anaconda
When you have finished entering all of the information, you should see something similar to this:
If everything looks good, click on the green create recipe button.
After the recipe has been created, you should see something similar to this:
Create a Cluster using a Recipe
Now that our recipe has been created, we can create a cluster that uses the recipe. Go through the process of creating a cluster up to the Choose Blueprint step. This step is when you select the recipe you want to use. The recipes are not selected by default; you have to select the recipes you wish to use. You specify recipes for 1 or more host groups. This allows you to run different recipes across different host groups (masters, slaves, etc). You can also select multiple recipes.
We want to use the hdp-small-default blueprint. This will create a basic HDP cluster.
If you select the anaconda recipe, you should see something similar to this:
In our case, we are going to run the recipe on every host group. If you intend to use something like Anaconda across the cluster, you should install it on at least the slave nodes and the client nodes.
After you have selected the recipe for the host groups, click the Review & Launch button, then launch the cluster. As the cluster is building, you should see a message in the Cloudbreak UI that indicates the recipe is running. When that happens, you will see something similar to this:
Cloudbreak will create logs for each recipe that runs on each host. These logs are located at /var/log/recipes and are named after the recipe and whether it is a pre- or post-install script. For example, our recipe log is called post-anaconda.log . You can tail this log file to follow the execution of the script.
NOTE: Post-install scripts won't be executed until the Ambari server is installed and the cluster is building. You can always monitor the /var/log/recipes directory on a node to see when the script is being executed. The time it takes to run the script will vary depending on the cloud environment and how long it takes to spin up the cluster.
On your cluster, you should be able to see the post-install log:
$ ls /var/log/recipes
post-anaconda.log post-hdfs-home.log
Once the install process is complete, you should be able to verify that Anaconda is installed. You need to ssh into one of the cloud instances. You can get the public IP address from the Cloudbreak UI. You will log in as the cloudbreak user, using the private key that corresponds to the public key you entered when you created the Cloudbreak credential. You should see something similar to this:
$ ssh -i ~/Downloads/keys/cloudbreak_id_rsa cloudbreak@#.#.#.#
The authenticity of host '#.#.#.# (#.#.#.#)' can't be established.
ECDSA key fingerprint is SHA256:By1MJ2sYGB/ymA8jKBIfam1eRkDS5+DX1THA+gs8sdU.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '#.#.#.#' (ECDSA) to the list of known hosts.
Last login: Sat May 13 00:47:41 2017 from 192.175.27.2
__| __|_ )
_| ( / Amazon Linux AMI
___|\___|___|
https://aws.amazon.com/amazon-linux-ami/2016.09-release-notes/
25 package(s) needed for security, out of 61 available
Run "sudo yum update" to apply all updates.
Amazon Linux version 2017.03 is available.
Once you are on the server, you can check the version of Python:
$ /opt/anaconda/bin/python --version
Python 2.7.13 :: Anaconda 4.3.1 (64-bit)
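You can also ask the bundled conda tool what shipped with the distribution (an optional check, using the same install prefix):
/opt/anaconda/bin/conda list | head
This lists the first few packages included with Anaconda 4.3.1.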
Review
If you have successfully followed along with this tutorial, you should know how to create pre and post install scripts. You should have successfully deployed a cluster on either AWS or Azure with Anaconda installed under /opt/anaconda on the nodes you specified.