Member since
01-14-2017
75
Posts
35
Kudos Received
6
Solutions
My Accepted Solutions
Views | Posted
---|---
1746 | 02-15-2017 10:39 PM
2215 | 02-13-2017 07:28 PM
1985 | 02-01-2017 10:46 PM
2054 | 01-31-2017 02:29 PM
1300 | 01-27-2017 02:04 AM
02-01-2019
03:52 PM
It's not a simple process, but it can be done by anyone with a little database admin experience. You add a second user by restarting cbd with a new user and password in the Profile, then go into the database and copy the encrypted password from the new user to the old one. This changes the password while leaving your original user with access to the clusters and resources you've already built.
Overview
1. Edit the Profile and change both UAA_DEFAULT_USER_PW to the desired new password and UAA_DEFAULT_USER_EMAIL to a different address.
2. cbd restart — adds a second user with the new password to the database.
3. docker exec -ti cbreak_commondb_1 bash — starts a bash shell in the database container.
4. pg_dump -Fc -U postgres uaadb > uaadb.dump — makes a backup of the user database.
5. psql -U postgres — starts a postgres shell.
6. postgres=# \c uaadb; — connects to the user database in the postgres shell.
7. uaadb=# select * from users; — shows the two accounts and their encrypted passwords.
8. update users set password='$2a$10$nTd3OV33zfM/lfQTIPKN7OrxL4uCQqRotJXXERqDhzeVB9Dlfmlum' where email = 'admin@example.com'; — sets the original user's password to the new user's password, which you copy from the select output.
Log in with the new password and you'll see everything is still in place.
Walk-through
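Before the console session, here is roughly what the two Profile edits from step 1 look like (a sketch — the values are examples and your Profile will contain other settings):
export UAA_DEFAULT_USER_PW='MyNewSecretPassword'
export UAA_DEFAULT_USER_EMAIL='admin2@example.com'
After saving these, cbd restart picks them up and creates the second account.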
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-------------+----------+----------+------------+------------+-----------------------
cbdb | postgres | UTF8 | en_US.utf8 | en_US.utf8 |
periscopedb | postgres | UTF8 | en_US.utf8 | en_US.utf8 |
postgres | postgres | UTF8 | en_US.utf8 | en_US.utf8 |
template0 | postgres | UTF8 | en_US.utf8 | en_US.utf8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.utf8 | en_US.utf8 | =c/postgres +
| | | | | postgres=CTc/postgres
uaadb | postgres | UTF8 | en_US.utf8 | en_US.utf8 |
(6 rows)
postgres=# \c uaadb;
You are now connected to database "uaadb" as user "postgres".
uaadb=# \d
List of relations
Schema | Name | Type | Owner
--------+------------------------+-------+----------
public | authz_approvals | table | postgres
public | authz_approvals_old | table | postgres
public | expiring_code_store | table | postgres
public | external_group_mapping | table | postgres
public | group_membership | table | postgres
public | groups | table | postgres
public | identity_provider | table | postgres
public | identity_zone | table | postgres
public | oauth_client_details | table | postgres
public | oauth_code | table | postgres
public | revocable_tokens | table | postgres
public | schema_version | table | postgres
public | sec_audit | table | postgres
public | service_provider | table | postgres
public | users | table | postgres
(15 rows)
uaadb=# select * from users;
id | created | lastmodified | version | username | password | email | givenname | familyname | active | phonenumber | authorities | verified | origin | external_id | identity_zone_id | salt | passwd_lastmodified | legacy_verification_behavior
--------------------------------------+-------------------------+-------------------------+---------+----------------------------+--------------------------------------------------------------+----------------------------+-----------+------------+--------+-------------+-------------+----------+--------+-------------+------------------+------+---------------------+------------------------------
eb52fb6c-b588-4401-8ad4-97b0e04ffc23 | 2018-06-28 19:55:02.066 | 2018-06-28 19:55:02.066 | 0 | admin@example.com | $2a$10$TFGoKcaWNs7XWsO4AqvmlOHVe9yBSUcmtvo9tdLsf3AhL2oNUYOHW | admin@example.com | Joe | Admin | t | | uaa.user | t | uaa | | uaa | | 2018-06-28 19:55:02 | f
2731b250-7de0-4f88-ae34-0fbd33206c42 | 2018-07-13 16:33:52.737 | 2018-07-13 16:33:52.737 | 0 | admin2@example.com | $2a$10$nTd3OV33zfM/lfQTIPKN7OrxL4uCQqRotJXXERqDhzeVB9Dlfmlum | admin2@example.com | Joe | Admin | t | | uaa.user | t | uaa | | uaa | | 2018-07-13 16:33:52 | f
(2 rows)
uaadb=# update users set password='$2a$10$nTd3OV33zfM/lfQTIPKN7OrxL4uCQqRotJXXERqDhzeVB9Dlfmlum' where email = 'admin@example.com';
UPDATE 1
uaadb=# select * from users;
id | created | lastmodified | version | username | password | email | givenname | familyname | active | phonenumber | authorities | verified | origin | external_id | identity_zone_id | salt | passwd_lastmodified | legacy_verification_behavior
--------------------------------------+-------------------------+-------------------------+---------+----------------------------+--------------------------------------------------------------+----------------------------+-----------+------------+--------+-------------+-------------+----------+--------+-------------+------------------+------+---------------------+------------------------------
2731b250-7de0-4f88-ae34-0fbd33206c42 | 2018-07-13 16:33:52.737 | 2018-07-13 16:33:52.737 | 0 | admin2@example.com | $2a$10$nTd3OV33zfM/lfQTIPKN7OrxL4uCQqRotJXXERqDhzeVB9Dlfmlum | admin2@example.com | Joe | Admin | t | | uaa.user | t | uaa | | uaa | | 2018-07-13 16:33:52 | f
eb52fb6c-b588-4401-8ad4-97b0e04ffc23 | 2018-06-28 19:55:02.066 | 2018-06-28 19:55:02.066 | 0 | admin@example.com | $2a$10$nTd3OV33zfM/lfQTIPKN7OrxL4uCQqRotJXXERqDhzeVB9Dlfmlum | admin@example.com | Joe | Admin | t | | uaa.user | t | uaa | | uaa | | 2018-06-28 19:55:02 | f
(2 rows)
uaadb=# \q
bash-4.3# exit
[root@jwcbd cloudbreak-deployment]#
09-21-2018
10:36 PM
1 Kudo
The quickest way to launch a cluster with Cloudbreak is through the CLI. This lets you specify all of the required settings and submit the request to Cloudbreak, and the cluster shows up in Cloudbreak just as it does when launched via the wizard. The quickest way to generate the JSON template is to set up a cluster through the wizard.
Generating the JSON Template from the Cloudbreak GUI
The first way to create the template is from the Create Cluster wizard. In the Cloudbreak GUI, walk through all of the steps to configure a cluster. At the end, before you click "Create Cluster", you can click "Show CLI Command". The second way is to go to a running cluster in the Cloudbreak GUI, click the "Actions" button, and then click "Show CLI Command". The CLI command dialog comes up with the JSON template in the top text box and the Cloudbreak CLI command in the lower text box. Click the "Copy the JSON" button and then paste it into a text file. Copy the CLI command to a note for later use.
Preparing the JSON template for creating a cluster
The JSON template does not include the sensitive values that were used in creating the cluster, so they have to be filled in — most notably the password in the ambari section. In the copied JSON it will be blank, but you must set it before a cluster can be created with the JSON file.
"password": "SuperSecretL0ngPassword!!one",
You may also want to change input values for variables in a dynamic blueprint. The HA blueprint discussed in "Cloudbreak blueprint for high-availability Hadoop and Hive" is a dynamic blueprint, allowing you to change the value of dfs.nameservices for each cluster you create. Set this in the inputs section. In this example, it is set to "hdp301".
"inputs": {
"dfs.nameservices": "hdp301"
},
Many of the HA configurations in blueprints do not pass Ambari's blueprint validation, and you will see an error when you try to create a cluster from the CLI. This could be for HIVE_SERVER, NAMENODE or other components:
ERROR: status code: 500, message: Incorrect number of 'HIVE_SERVER' components are in '[master_nn1, master_nn2]' hostgroups: count: 2, min: 1 max: 1
If you get such an error, add "validateBlueprint": false, in the ambari section where the blueprint name is specified:
"ambari": {
"blueprintName": "hdp30-ha",
"validateBlueprint": false,
"userName": "admin",
Otherwise you may get a validation error when you create the cluster.
Creating the cluster with the CLI
Once you have the JSON template ready, you can create a cluster just by running the 'cb cluster create' command. It returns quickly with a successful return code, and you will see the cluster in the Cloudbreak GUI. The cluster is built just as it is when you click "Create Cluster" in the GUI wizard.
$ ../bin/cb cluster create --cli-input-json ../clusters/cb-hdp301-ha.json --name hdp301-ha
$ echo $?
0
Changing settings in the JSON file
Now that you have a JSON template, you can see where to change configurations such as the instanceType or nodeCount for each host group, the HDP version, and the repo paths to customize each new cluster you create.
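You can also keep an eye on the build from the terminal. A sketch, assuming the Cloudbreak 2.x CLI (verify the subcommands with 'cb cluster --help' on your version):
$ ../bin/cb cluster describe --name hdp301-ha
$ ../bin/cb cluster list
The describe output is the cluster's JSON description, including its current status.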
09-20-2018
07:40 PM
3 Kudos
Adding HBase HA to a blueprint that already has high availability is a very straightforward process. The blueprint you start from has to include a ZooKeeper quorum, like the blueprint providing HA namenodes and resourcemanagers discussed in "Cloudbreak blueprint for high-availability Hadoop and Hive". We will add HBase Master services to two master host groups, HBase RegionServers to the datanode host group, and an hbase-site configuration section.
Required Zookeeper Quorum
To properly specify the zookeepers to be used, you need at least three separate host groups with a cardinality of 1 to provide them. In the blueprint provided in the previous article, those hostgroups are master_mgmt, master_nn1 and master_nn2. In that blueprint, master_mgmt contains the Ambari, metrics and other management services, while master_nn1 and master_nn2 contain the redundant namenodes and resourcemanagers. In this example, we will add redundant HBase masters to those host groups as well.
Adding HBase Services to Host Groups
The following blueprint diagram shows the additional HBase services in the Cloudbreak blueprint list view. Addition to the components section of the master_mgmt host group:
{
"name": "HBASE_CLIENT"
},
Addition to the components section of master_nn1 and master_nn2 host groups: {
"name": "HBASE_MASTER"
},
{
"name": "HBASE_CLIENT"
},
Addition to the components section of the datanode host group: {
"name": "HBASE_CLIENT"
},
{
"name": "HBASE_REGIONSERVER"
},
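For orientation, here is roughly how these additions sit inside a host group entry once merged — a trimmed sketch of master_nn1; the real host group carries many more components:
{
  "name": "master_nn1",
  "components": [
    { "name": "NAMENODE" },
    { "name": "ZOOKEEPER_SERVER" },
    { "name": "HBASE_MASTER" },
    { "name": "HBASE_CLIENT" }
  ],
  "cardinality": "1"
}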
Adding hbase-site configuration section
A new section is required in the "configurations" section of the blueprint to define the ZooKeeper settings for HBase. It defines the ZooKeeper servers to be used, which we looked at in the previous section, and configures a couple of other ZooKeeper settings for HBase.
{
"hbase-site": {
"properties": {
"hbase.zookeeper.property.clientPort": "2181",
"hbase.zookeeper.quorum": "%HOSTGROUP::master_nn1%,%HOSTGROUP::master_nn2%,%HOSTGROUP::master_mgmt%",
"hbase.zookeeper.useMulti": "true"
}
}
},
That's all that's needed to add HBase HA to a blueprint that already has a Zookeeper quorum. Attached is a blueprint (cb-hdp26-hahbase-blueprint.json) that provides Namenode HA, Resourcemanager HA, Hive Metastore HA and Hive Server HA.
03-29-2018
02:45 PM
5 Kudos
As discussed in my previous article Using Pre-defined Security Groups with Cloudbreak, the preferred method for managing security for hostgroups in Cloudbreak is to use the native interface of your cloud provider. For OpenStack, one of the easiest ways to do that is the openstack CLI. My OpenStack environment uses a self-signed certificate and is only accessible through a VPN, so I have to use the --insecure flag to ignore the certificate errors.
Before you can use the CLI commands, you have to know your OpenStack login parameters. These include the username and password, but also the URL of the identity endpoint, the project name, and so on. I have a script that puts these values in environment variables, but they can also be specified on the openstack command line if desired.
export OS_USERNAME=john.whitmore
export OS_PASSWORD='xxxxxxxxxxxxxx'
export OS_SYSTEM_SCOPE=Project
export OS_USER_DOMAIN_NAME=Default
export OS_AUTH_URL=http://###.###.###.###:5000/v3
export OS_IDENTITY_API_VERSION=3
export OS_PROJECT_NAME=Tenant1
export OS_PROJECT_DOMAIN_NAME=Default
The first step is to create the security group.
openstack --insecure security group create hdp-sec-mgmt
+-----------------+---------------------------------------------------------------------------------+
| Field | Value |
+-----------------+---------------------------------------------------------------------------------+
| created_at | None |
| description | hdp-sec-mgmt |
| id | 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 |
| name | hdp-sec-mgmt |
| project_id | ad120b0e0e3c41f5b621f7149c876390 |
| revision_number | None |
| rules | direction='egress', ethertype='IPv4', id='64659d92-e5f8-4689-981f-391217d64674' |
| | direction='egress', ethertype='IPv6', id='d8273584-66d0-4575-a8c6-a883e4112cb7' |
| updated_at | None |
+-----------------+---------------------------------------------------------------------------------+
This creates a new security group with default outbound access rules. OpenStack will create the group with the requested name even if one by that name already exists, so we will create the ingress rules using the id that was returned, because it is unique. For the same reason, Cloudbreak shows the id when you use the group, so you can be sure you are using the one you expect even if there are duplicate names.
Next you add your ingress rules. The defaults for the rule create subcommand are ingress and TCP, so I don't have to specify those for each line.
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 60200
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 39915
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 6188
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 3888
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 8080
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 8886
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 22
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 8440
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 5432
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 1080
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 8441
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 4505
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 4506
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 443
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 61181
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 61310
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 8670
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 32768
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 8480
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 32769
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 9443
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 36677
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 2181
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 8485
openstack --insecure security group rule create 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 --dst-port 18886
Each line outputs information about the rule that was created
+-------------------+--------------------------------------+
| Field | Value |
+-------------------+--------------------------------------+
| created_at | None |
| description | |
| direction | ingress |
| ether_type | IPv4 |
| id | fbe378e6-5adc-44dd-b695-3769d27d228e |
| name | None |
| port_range_max | 8080 |
| port_range_min | 8080 |
| project_id | ad120b0e0e3c41f5b621f7149c876390 |
| protocol | tcp |
| remote_group_id | None |
| remote_ip_prefix | 0.0.0.0/0 |
| revision_number | None |
| security_group_id | 0b9f6f3f-f5fd-431d-af31-a5c4efb53158 |
| updated_at | None |
+-------------------+--------------------------------------+
You can list all of the rules in the security group to verify it.
openstack --insecure security group rule list 0b9f6f3f-f5fd-431d-af31-a5c4efb53158
The resulting rule listing looks like this:
+--------------------------------------+-------------+-----------+-------------+-----------------------+
| ID | IP Protocol | IP Range | Port Range | Remote Security Group |
+--------------------------------------+-------------+-----------+-------------+-----------------------+
| 067cd1a9-d744-4fdc-aaaa-b2c2e148b525 | None | None | | None |
| 07c68cbe-0acf-4a20-baef-e0d8940ea94c | None | None | | None |
| 0b502d4f-596f-4e27-8451-9ca8d99dc4b7 | tcp | 0.0.0.0/0 | 6188:6188 | None |
| 4053d474-6332-48bf-bf87-45f34355c6cd | tcp | 0.0.0.0/0 | 3888:3888 | None |
| 49df8547-1723-4380-b548-7f74f32e2b71 | tcp | 0.0.0.0/0 | 8440:8440 | None |
| 4cde8bc0-c52c-46c6-a506-b8e22654d3be | tcp | 0.0.0.0/0 | 32768:32768 | None |
| 4e4f5e4c-ef9b-472b-9901-37a24c8d7571 | tcp | 0.0.0.0/0 | 8485:8485 | None |
| 5cbe51a4-b82f-4828-bac7-2399d600ecae | tcp | 0.0.0.0/0 | 4505:4505 | None |
| 60e0a5f3-6826-4274-b87d-2fa614cc504e | tcp | 0.0.0.0/0 | 60200:60200 | None |
| 63803572-419a-472b-ad09-c6568f7f3981 | tcp | 0.0.0.0/0 | 39915:39915 | None |
| fbe378e6-5adc-44dd-b695-3769d27d228e | tcp | 0.0.0.0/0 | 8080:8080 | None |
| 8bca6668-47f4-4089-a028-a1a95620cfe4 | tcp | 0.0.0.0/0 | 9443:9443 | None |
| 96caddc4-6a99-4be1-995d-282c7d6e2173 | tcp | 0.0.0.0/0 | 61181:61181 | None |
| 9fa5764a-4bab-4d7b-8ebb-239f80d3ceb1 | tcp | 0.0.0.0/0 | 22:22 | None |
| a1eca812-e485-4cae-8bef-a1cad525f86b | tcp | 0.0.0.0/0 | 4506:4506 | None |
| a580c721-bd45-480d-8413-ae15442b5557 | tcp | 0.0.0.0/0 | 443:443 | None |
| a6f74c6e-fc96-4314-a18f-60af8c5d9bde | tcp | 0.0.0.0/0 | 5432:5432 | None |
| c072ebef-19ec-403f-9505-547cff4f2b05 | tcp | 0.0.0.0/0 | 2181:2181 | None |
| caff450a-1c7c-405b-bc8e-49d2d815566d | tcp | 0.0.0.0/0 | 32769:32769 | None |
| cd0bf21c-f46c-44bb-bf9f-2b0f119177fa | tcp | 0.0.0.0/0 | 18886:18886 | None |
| cf3e99fe-758f-44c2-800b-cddeb1607183 | tcp | 0.0.0.0/0 | 8441:8441 | None |
| d5191190-b3f9-4dde-b3aa-cc615afb78e3 | tcp | 0.0.0.0/0 | 1080:1080 | None |
| d733e203-5b41-492b-ba79-997be1094e41 | tcp | 0.0.0.0/0 | 61310:61310 | None |
| d9fbcefa-223c-4f3d-a4d1-d6d990ddabf5 | tcp | 0.0.0.0/0 | 8670:8670 | None |
| f414bfb0-fc43-43d6-96e6-a70dd60351c9 | tcp | 0.0.0.0/0 | 8886:8886 | None |
| f823d654-d04c-4d5b-96c0-ee3e12bf57a7 | tcp | 0.0.0.0/0 | 36677:36677 | None |
| fa86a862-7223-43d4-8b49-ed6365ab1c91 | tcp | 0.0.0.0/0 | 8480:8480 | None |
+--------------------------------------+-------------+-----------+-------------+-----------------------+
If you want to limit the access instead of taking the default 0.0.0.0/0, you can add --remote-ip <ip-address CIDR> to each line.
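Rather than typing the rule-creation command two dozen times, you can loop over the ports in a small shell snippet (same group id and port list as above):
SG=0b9f6f3f-f5fd-431d-af31-a5c4efb53158
for port in 60200 39915 6188 3888 8080 8886 22 8440 5432 1080 8441 4505 4506 443 61181 61310 8670 32768 8480 32769 9443 36677 2181 8485 18886; do
  openstack --insecure security group rule create "$SG" --dst-port "$port"
done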
I think you will find this much quicker than going through the GUI for every added port.
03-08-2018
07:51 PM
4 Kudos
The preferred method to manage network security with Cloudbreak is to use existing security groups in the target cloud. You manage the security groups in the target cloud's native tools and Cloudbreak adds the nodes to those existing groups.
This allows the access rules in the security group to be centrally managed, eliminating the need to propagate changes to existing clusters. Each cluster that gets created will have the same standard access rules, which can be very granular so as to provide only the access that is required. This also allows separation of the security-management duties from the cluster-administration duties if an organization requires it.
Network security groups - Microsoft Azure
Each cloud has its own interface for managing security groups. In Azure, for instance, the access rules are in network security groups and each group is available in an individual location. Here we see four network security groups that I've created in Azure's South Central US location for use with Cloudbreak.
I've created these to follow the hostgroups in the blueprint that I'm using. The services running on the management master include Ambari, Ambari Metrics and other cluster management tools. They have different access requirements than the namenodes, and the security groups reflect those differences.
Configuring Existing Security Groups - Hortonworks Cloudbreak
Cloudbreak reads the available security groups from the cloud at the time that the cluster is being configured. In this example, you can see that the same groups created in the Azure dashboard are shown in the existing security groups dropdown list below.
All that is required is to select the proper group for each hostgroup, and Cloudbreak will put the VMs it creates into those security groups. Note that in this example I've decided to have one security group for all of the access needed by any service on the master nodes. This allows services to be moved between master nodes without having to change the groups.
Once the cluster is built, you can see the VMs that were created.
Clicking on the VM name takes you to the cloud's interface, allowing you to see the settings that were applied. Clicking on Networking shows you the security group being applied.
VM Networking - Microsoft Azure
In the networking section of the VM, you see the Network Security Group that was requested was applied by Cloudbreak.
Managing network security groups - Microsoft Azure
Creating a network security group in the native cloud tool is pretty straightforward. Here's what it looks like in Azure:
Once the Network security group is created, you can define the ports and IP address ranges for access. Changes made here will be effective for any VMs that have already been provisioned as well as any new VMs provisioned to use the same rules.
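The same can be scripted with the Azure CLI if you prefer. A sketch, assuming the az CLI is installed and logged in; the resource group, NSG name, rule name and port here are illustrative:
az network nsg create --resource-group myRG --location southcentralus --name hdp-sec-mgmt
az network nsg rule create --resource-group myRG --nsg-name hdp-sec-mgmt --name allow-ambari --priority 100 --direction Inbound --access Allow --protocol Tcp --destination-port-ranges 8080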
02-22-2018
08:26 PM
Hi @Ramu Valleti, I have been using Cloudbreak, which takes care of creating the hosts in the cloud and then mapping them to the hostgroups in the blueprint. I may try this blueprint with Ambari sometime next week and will comment with my results. Thanks, John
02-22-2018
08:35 AM
5 Kudos
This is a basic Ambari blueprint for clusters that implement high availability for HDFS, YARN and Hive on HDP 2.6. This includes Namenode HA, Resourcemanager HA, Hive Metastore HA and Hive Server HA. It does not implement a high-availability database for the Hive metastore, though, as that is handled within the relational database backend itself. It has been developed and tested using Cloudbreak, but should work with Ambari as well.
Hostgroup Layout
Raw Blueprint JSON
{
"Blueprints": {
"blueprint_name": "cb24-hdp26-ha",
"stack_name": "HDP",
"stack_version": "2.6"
},
"settings": [
{
"recovery_settings": []
},
{
"service_settings": [
{
"name": "HIVE",
"credential_store_enabled": "false"
}
]
},
{
"component_settings": []
}
],
"host_groups": [
{
"name": "master_mgmt",
"components": [
{
"name": "METRICS_COLLECTOR"
},
{
"name": "METRICS_GRAFANA"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "JOURNALNODE"
},
{
"name": "INFRA_SOLR"
},
{
"name": "INFRA_SOLR_CLIENT"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "HDFS_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "OOZIE_CLIENT"
},
{
"name": "MAPREDUCE2_CLIENT"
},
{
"name": "HIVE_CLIENT"
},
{
"name": "TEZ_CLIENT"
}
],
"cardinality": "1"
},
{
"name": "master_nn1",
"components": [
{
"name": "NAMENODE"
},
{
"name": "ZKFC"
},
{
"name": "RESOURCEMANAGER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "APP_TIMELINE_SERVER"
},
{
"name": "HIVE_METASTORE"
},
{
"name": "HIVE_SERVER"
},
{
"name": "HCAT"
},
{
"name": "WEBHCAT_SERVER"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "JOURNALNODE"
},
{
"name": "HIVE_CLIENT"
},
{
"name": "HDFS_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "OOZIE_CLIENT"
},
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "LIVY2_SERVER"
},
{
"name": "SPARK2_CLIENT"
},
{
"name": "MAPREDUCE2_CLIENT"
},
{
"name": "TEZ_CLIENT"
}
],
"cardinality": "1"
},
{
"name": "master_nn2",
"components": [
{
"name": "NAMENODE"
},
{
"name": "ZKFC"
},
{
"name": "RESOURCEMANAGER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "HISTORYSERVER"
},
{
"name": "HIVE_METASTORE"
},
{
"name": "HIVE_SERVER"
},
{
"name": "SLIDER"
},
{
"name": "PIG"
},
{
"name": "OOZIE_SERVER"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "JOURNALNODE"
},
{
"name": "HIVE_CLIENT"
},
{
"name": "HDFS_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "OOZIE_CLIENT"
},
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "SPARK2_JOBHISTORYSERVER"
},
{
"name": "SPARK2_CLIENT"
},
{
"name": "MAPREDUCE2_CLIENT"
},
{
"name": "TEZ_CLIENT"
}
],
"cardinality": "1"
},
{
"name": "datanode",
"components": [
{
"name": "HIVE_CLIENT"
},
{
"name": "TEZ_CLIENT"
},
{
"name": "SPARK2_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "OOZIE_CLIENT"
},
{
"name": "DATANODE"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "NODEMANAGER"
}
],
"cardinality": "3+"
}
],
"configurations": [
{
"core-site": {
"properties": {
"fs.trash.interval": "4320",
"fs.defaultFS": "hdfs://mycluster",
"hadoop.proxyuser.yarn.hosts": "%HOSTGROUP::master_nn1%,%HOSTGROUP::master_nn2%",
"hadoop.proxyuser.hive.hosts": "%HOSTGROUP::master_nn1%,%HOSTGROUP::master_nn2%",
"ha.zookeeper.quorum": "%HOSTGROUP::master_nn1%:2181,%HOSTGROUP::master_nn2%:2181,%HOSTGROUP::master_mgmt%:2181"
}
}
},
{
"hdfs-site": {
"properties": {
"dfs.namenode.safemode.threshold-pct": "0.99",
"dfs.client.failover.proxy.provider.mycluster": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
"dfs.ha.automatic-failover.enabled": "true",
"dfs.ha.fencing.methods": "shell(/bin/true)",
"dfs.ha.namenodes.mycluster": "nn1,nn2",
"dfs.namenode.http-address": "%HOSTGROUP::master_nn1%:50070",
"dfs.namenode.http-address.mycluster.nn1": "%HOSTGROUP::master_nn1%:50070",
"dfs.namenode.http-address.mycluster.nn2": "%HOSTGROUP::master_nn2%:50070",
"dfs.namenode.https-address": "%HOSTGROUP::master_nn1%:50470",
"dfs.namenode.https-address.mycluster.nn1": "%HOSTGROUP::master_nn1%:50470",
"dfs.namenode.https-address.mycluster.nn2": "%HOSTGROUP::master_nn2%:50470",
"dfs.namenode.rpc-address.mycluster.nn1": "%HOSTGROUP::master_nn1%:8020",
"dfs.namenode.rpc-address.mycluster.nn2": "%HOSTGROUP::master_nn2%:8020",
"dfs.namenode.shared.edits.dir": "qjournal://%HOSTGROUP::master_nn1%:8485;%HOSTGROUP::master_nn2%:8485;%HOSTGROUP::master_mgmt%:8485/mycluster",
"dfs.nameservices": "mycluster"
}
}
},
{
"hive-site": {
"properties": {
"hive.metastore.uris": "thrift://%HOSTGROUP::master_nn1%:9083,thrift://%HOSTGROUP::master_nn2%:9083",
"hive.exec.compress.output": "true",
"hive.merge.mapfiles": "true",
"hive.server2.tez.initialize.default.sessions": "true",
"hive.server2.transport.mode": "http"
}
}
},
{
"webhcat-site": {
"properties_attributes": {},
"properties": {
"templeton.hive.properties": "hive.metastore.local=false,hive.metastore.uris=thrift://%HOSTGROUP::master_nn1%:9083\,thrift://%HOSTGROUP::master_nn2%:9083,hive.metastore.sasl.enabled=false"
}
}
},
{
"mapred-site": {
"properties": {
"mapreduce.job.reduce.slowstart.completedmaps": "0.7",
"mapreduce.map.output.compress": "true",
"mapreduce.output.fileoutputformat.compress": "true"
}
}
},
{
"yarn-site": {
"properties": {
"hadoop.registry.rm.enabled": "true",
"hadoop.registry.zk.quorum": "%HOSTGROUP::master_nn1%:2181,%HOSTGROUP::master_nn2%:2181,%HOSTGROUP::master_mgmt%:2181",
"yarn.log.server.url": "http://%HOSTGROUP::master_nn2%:19888/jobhistory/logs",
"yarn.resourcemanager.address": "%HOSTGROUP::master_nn1%:8050",
"yarn.resourcemanager.admin.address": "%HOSTGROUP::master_nn1%:8141",
"yarn.resourcemanager.cluster-id": "yarn-cluster",
"yarn.resourcemanager.ha.automatic-failover.zk-base-path": "/yarn-leader-election",
"yarn.resourcemanager.ha.enabled": "true",
"yarn.resourcemanager.ha.rm-ids": "rm1,rm2",
"yarn.resourcemanager.hostname": "%HOSTGROUP::master_nn1%",
"yarn.resourcemanager.hostname.rm1": "%HOSTGROUP::master_nn1%",
"yarn.resourcemanager.hostname.rm2": "%HOSTGROUP::master_nn2%",
"yarn.resourcemanager.recovery.enabled": "true",
"yarn.resourcemanager.resource-tracker.address": "%HOSTGROUP::master_nn1%:8025",
"yarn.resourcemanager.scheduler.address": "%HOSTGROUP::master_nn1%:8030",
"yarn.resourcemanager.store.class": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
"yarn.resourcemanager.webapp.address": "%HOSTGROUP::master_nn1%:8088",
"yarn.resourcemanager.webapp.address.rm1": "%HOSTGROUP::master_nn1%:8088",
"yarn.resourcemanager.webapp.address.rm2": "%HOSTGROUP::master_nn2%:8088",
"yarn.resourcemanager.webapp.https.address": "%HOSTGROUP::master_nn1%:8090",
"yarn.resourcemanager.webapp.https.address.rm1": "%HOSTGROUP::master_nn1%:8090",
"yarn.resourcemanager.webapp.https.address.rm2": "%HOSTGROUP::master_nn2%:8090",
"yarn.timeline-service.address": "%HOSTGROUP::master_nn1%:10200",
"yarn.timeline-service.webapp.address": "%HOSTGROUP::master_nn1%:8188",
"yarn.timeline-service.webapp.https.address": "%HOSTGROUP::master_nn1%:8190"
}
}
}
]
}
The blueprint file is also attached: cb24-hdp26-ha.txt
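If you work from the Cloudbreak CLI, the blueprint can be registered from that file as well. A sketch, assuming the Cloudbreak 2.x CLI (check 'cb blueprint create --help' for the exact form on your version):
$ cb blueprint create from-file --name cb24-hdp26-ha --file cb24-hdp26-ha.json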
02-22-2018
07:49 AM
2 Kudos
This is a detailed walk-through of configuring a service account in Google Cloud Platform and a cloud credential in Cloudbreak. Once these are done, Cloudbreak can spin up your clusters in GCP easily and quickly.
High Level Steps
1. Enable the Compute Engine API in the GCP project
2. Create a service account with the required roles in the GCP project
3. Create a credential for the GCP service account in Cloudbreak
You will need a GCP account with full rights to administer service accounts, and a Cloudbreak instance. Cloudbreak can be running anywhere, as long as it has network access to GCP. My example is running on an internal OpenStack cluster.
Google Cloud Platform Console
Log into the GCP Console at https://console.cloud.google.com. On the main dashboard page, you will find the Project ID. You will need this to define your credential in Cloudbreak in a later step.
GCP - APIs Dashboard
Go to the APIs dashboard by (1) clicking the menu in the top left, (2) hovering over APIs and Services, and (3) clicking on Dashboard.
GCP - APIs & Services Dashboard
Verify that the Google Compute Engine API is listed and enabled. If it is not, click the "Enable APIs" button to search for and enable it.
GCP - Service Accounts
Go to the Service Accounts screen by (1) clicking the menu in the top left, (2) hovering over IAM & Admin, and (3) clicking on Service Accounts.
GCP - Create Service Account - Step 1
Click "Create Service Account"
GCP - Create Service Account - Step 2
1. Give the service account a name.
2. Check the "Furnish a new key" box. This will download a key to your computer when you finish creating the account. If you are using Cloudbreak 2.7 or later, select the JSON format key; Google has deprecated the P12 format, and it will eventually be unsupported. If you are using a Cloudbreak release before 2.7, I strongly recommend moving to 2.7 for its many excellent new features and using JSON. In Cloudbreak 2.4, P12 is the only format supported.
3. Click the "Select a Role" dropdown.
4. Select the required Compute Engine roles.
5. Select the Storage Admin role under Storage.
6. Click outside of the role selection dropdown to reveal the "Create" button.
All five of the roles shown are required for the service account.
GCP - Create Service Account - Step 3
Click "Create" GCP - Service Account Created The new private key will be downloaded and the password for the key will be displayed. You will not use the password for Cloudbreak.
GCP - Service Accounts List
You will need to supply the Service Account ID in the Cloudbreak Credential form in a later step.
Cloudbreak - Creating GCP credential
Log into your Cloudbreak instance.
1. Click Credentials in the navigation bar
2. Click "Create Credential"
Cloudbreak - Select cloud platform
1. Click "Select your cloud provider" to pull down the list
2. Click Google
Cloudbreak - Create Credential
1. Give this credential a name. This will be used in Cloudbreak to identify which cloud you will use to provision a new cluster.
2. Paste in the Service Account ID from the earlier step.
3. Paste in the Project ID from the earlier step.
4. Upload the key file that was downloaded in the earlier step.
5. Click "Create"
Cloudbreak - Verifying Credential - Step 1
To see that the credential is working, start to create a new cluster:
1. Click Clusters in the left-side menu
2. Click Create Cluster
Cloudbreak - Verifying Credential - Step 2
Once you select your new credential, the Region and Availability Zone fields should be populated. If they are blank or say "select region", that is an indication that your credential does not have the proper roles or that the Compute Engine API is not set up.
Finished
Once you've verified that your credential can talk to the GCP API, you can finish the cluster creation wizard to build your first cluster.
05-10-2017
07:39 PM
Hi @John Cleveland, can you provide the link to the specific tutorial?
03-01-2017
08:45 PM
1 Kudo
@Prasanna G, when you ssh to port 2222, you are inside the sandbox container. Is the file there? If you run ls -l tmp/, what do you get? (Note that this is different from ls -l /tmp, given the pscp command you ran.) You should be able to pscp directly into the container by going to port 2222:
pscp -P 2222 <file_from_local> root@<sandbox-host>:/tmp
Then, in your shell, you should be able to see it:
[root@sandbox ~]# ls -l /tmp
total 548
-rw-r--r-- 1 root root 7540 Feb 27 10:00 file_from_local
Then I think you will want to copy it from the Linux filesystem to HDFS using the hadoop command:
[root@sandbox ~]# hadoop fs -put /tmp/file_from_local /tmp
[root@sandbox ~]# hadoop fs -ls /tmp
-rw-r--r-- 1 root hdfs 0 2017-03-01 20:43 /tmp/file_from_local
Enjoy! John