This tutorial walks you through using Cloudbreak to deploy an HDP 2.6 cluster with Spark 2.1. We'll copy and edit the existing hdp-spark-cluster blueprint, which deploys Spark 1.6, to create a new blueprint that installs Spark 2.1. This tutorial is part one of a two-part series. The second tutorial walks you through using Zeppelin to verify the Spark 2.1 installation. You can find that tutorial here: HCC Article
This tutorial was tested in the following environment:
Before we can deploy a Spark 2.1 cluster using Cloudbreak, we need to create a blueprint that specifies Spark 2.1. Cloudbreak ships with three blueprints out of the box.
We will use the hdp-spark-cluster as our base blueprint and edit it to deploy Spark 2.1 instead of Spark 1.6.
Click on the manage blueprints section of the UI. Click on the hdp-spark-cluster blueprint. You should see something similar to this:
Click on the blue copy & edit button. You should see something similar to this:
For the Name, enter hdp26-spark21-cluster. This tells us the blueprint is for an HDP 2.6 cluster using Spark 2.1. Enter the same information for the Description. You should see something similar to this:
Now we need to edit the JSON portion of the blueprint to change the Spark 1.6 components to Spark 2.1 components. We don't need to change where they are deployed. The following entries within the JSON are for Spark 1.6:
"name": "SPARK_CLIENT""name": "SPARK_JOBHISTORYSERVER""name": "SPARK_CLIENT"We will replace SPARK with SPARK2. These entries should look as follows:
"name": "SPARK2_CLIENT""name": "SPARK2_JOBHISTORYSERVER""name": "SPARK2_CLIENT"NOTE: There are two entries for SPARK_CLIENT. Make sure you change both.
We are going to add an entry for the LIVY2_SERVER component and an entry for the SPARK2_THRIFTSERVER component. We will add both to the same node as SPARK2_JOBHISTORYSERVER. Let's add those two entries just below SPARK2_CLIENT in the host_group_master_2 section.
Change the following:
                {
                    "name": "SPARK2_JOBHISTORYSERVER"
                },
                {
                    "name": "SPARK2_CLIENT"
                },
to this:
                {
                    "name": "SPARK2_JOBHISTORYSERVER"
                },
                {
                    "name": "SPARK2_CLIENT"
                },
                {
                    "name": "SPARK2_THRIFTSERVER"
                },
                {
                    "name": "LIVY2_SERVER"
                },
We also need to update the blueprint_name to hdp26-spark21-cluster and the stack_version to 2.6. You should have something similar to this:
    "Blueprints": {
        "blueprint_name": "hdp26-spark21-cluster",
        "stack_name": "HDP",
        "stack_version": "2.6"
    }
If you prefer, you can copy and paste the following blueprint JSON:
{
    "host_groups": [
        {
            "name": "host_group_client_1",
            "configurations": [],
            "components": [
                {
                    "name": "ZOOKEEPER_CLIENT"
                },
                {
                    "name": "PIG"
                },
                {
                    "name": "OOZIE_CLIENT"
                },
                {
                    "name": "HBASE_CLIENT"
                },
                {
                    "name": "HCAT"
                },
                {
                    "name": "KNOX_GATEWAY"
                },
                {
                    "name": "METRICS_MONITOR"
                },
                {
                    "name": "FALCON_CLIENT"
                },
                {
                    "name": "TEZ_CLIENT"
                },
                {
                    "name": "SPARK2_CLIENT"
                },
                {
                    "name": "SLIDER"
                },
                {
                    "name": "SQOOP"
                },
                {
                    "name": "HDFS_CLIENT"
                },
                {
                    "name": "HIVE_CLIENT"
                },
                {
                    "name": "YARN_CLIENT"
                },
                {
                    "name": "METRICS_COLLECTOR"
                },
                {
                    "name": "MAPREDUCE2_CLIENT"
                }
            ],
            "cardinality": "1"
        },
        {
            "name": "host_group_master_3",
            "configurations": [],
            "components": [
                {
                    "name": "ZOOKEEPER_SERVER"
                },
                {
                    "name": "APP_TIMELINE_SERVER"
                },
                {
                    "name": "TEZ_CLIENT"
                },
                {
                    "name": "HBASE_MASTER"
                },
                {
                    "name": "HBASE_CLIENT"
                },
                {
                    "name": "HDFS_CLIENT"
                },
                {
                    "name": "METRICS_MONITOR"
                },
                {
                    "name": "SECONDARY_NAMENODE"
                }
            ],
            "cardinality": "1"
        },
        {
            "name": "host_group_slave_1",
            "configurations": [],
            "components": [
                {
                    "name": "HBASE_REGIONSERVER"
                },
                {
                    "name": "NODEMANAGER"
                },
                {
                    "name": "METRICS_MONITOR"
                },
                {
                    "name": "DATANODE"
                }
            ],
            "cardinality": "6"
        },
        {
            "name": "host_group_master_2",
            "configurations": [],
            "components": [
                {
                    "name": "ZOOKEEPER_SERVER"
                },
                {
                    "name": "ZOOKEEPER_CLIENT"
                },
                {
                    "name": "PIG"
                },
                {
                    "name": "MYSQL_SERVER"
                },
                {
                    "name": "HIVE_SERVER"
                },
                {
                    "name": "METRICS_MONITOR"
                },
                {
                    "name": "SPARK2_JOBHISTORYSERVER"
                },
                {
                    "name": "SPARK2_CLIENT"
                },
                {
                    "name": "SPARK2_THRIFTSERVER"
                },
                {
                    "name": "LIVY2_SERVER"
                },
                {
                    "name": "TEZ_CLIENT"
                },
                {
                    "name": "HBASE_CLIENT"
                },
                {
                    "name": "HIVE_METASTORE"
                },
                {
                    "name": "ZEPPELIN_MASTER"
                },
                {
                    "name": "HDFS_CLIENT"
                },
                {
                    "name": "YARN_CLIENT"
                },
                {
                    "name": "MAPREDUCE2_CLIENT"
                },
                {
                    "name": "RESOURCEMANAGER"
                },
                {
                    "name": "WEBHCAT_SERVER"
                }
            ],
            "cardinality": "1"
        },
        {
            "name": "host_group_master_1",
            "configurations": [],
            "components": [
                {
                    "name": "ZOOKEEPER_SERVER"
                },
                {
                    "name": "HISTORYSERVER"
                },
                {
                    "name": "OOZIE_CLIENT"
                },
                {
                    "name": "NAMENODE"
                },
                {
                    "name": "OOZIE_SERVER"
                },
                {
                    "name": "HDFS_CLIENT"
                },
                {
                    "name": "YARN_CLIENT"
                },
                {
                    "name": "FALCON_SERVER"
                },
                {
                    "name": "METRICS_MONITOR"
                },
                {
                    "name": "MAPREDUCE2_CLIENT"
                }
            ],
            "cardinality": "1"
        }
    ],
    "Blueprints": {
        "blueprint_name": "hdp26-spark21-cluster",
        "stack_name": "HDP",
        "stack_version": "2.6"
    }
}
Once you have all of the changes in place, click the green create blueprint button.
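As an optional sanity check before creating the blueprint, you can validate the edited JSON with a short script. This is a minimal sketch under the same assumption as before, that the blueprint is saved locally as blueprint.json:

    # Confirm the JSON parses, the Spark 1.6 components are gone, and the
    # Spark 2.1 components and blueprint metadata are in place.
    import json

    with open("blueprint.json") as f:
        blueprint = json.load(f)

    names = [
        component["name"]
        for host_group in blueprint["host_groups"]
        for component in host_group["components"]
    ]

    assert blueprint["Blueprints"]["blueprint_name"] == "hdp26-spark21-cluster"
    assert blueprint["Blueprints"]["stack_version"] == "2.6"
    assert not any(name.startswith("SPARK_") for name in names), "Spark 1.6 components remain"
    for required in ("SPARK2_JOBHISTORYSERVER", "SPARK2_THRIFTSERVER", "LIVY2_SERVER"):
        assert required in names, "missing " + required
    print("blueprint looks good")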
We need to create a new security group to use with our cluster. By default, the existing security groups only allow ports 22, 443, and 9443. As part of this tutorial, we will use Zeppelin to test Spark 2.1. We'll create a new security group that opens all ports to our IP address.
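You will need your public IP address in CIDR notation for the rule we create below. If you don't know it off-hand, this minimal Python sketch looks it up; it assumes outbound HTTPS access to checkip.amazonaws.com, a service that echoes your public IP.

    # Look up your public IP and print the /32 CIDR to enter in the rule.
    import ipaddress
    import urllib.request

    with urllib.request.urlopen("https://checkip.amazonaws.com") as response:
        my_ip = response.read().decode().strip()

    # ipaddress validates that we got back a well-formed address.
    print(str(ipaddress.ip_address(my_ip)) + "/32")  # e.g. #.#.#.#/32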
Click on the manage security groups section of the UI. You should see something similar to this:
Click on the green create security group button. You should see something similar to this:
First, select the appropriate cloud platform. I'm using AWS, so that is what I selected. We need to provide a unique name for our security group; I used all-ports-my-ip, but anything descriptive will do. Provide a helpful description as well. Next, enter your personal IP address in CIDR notation. I am using #.#.#.#/32; your IP address will obviously be different. Finally, enter the port range. There is a known issue in Cloudbreak that prevents you from using 0-65535, so we'll use 1-65535. For the protocol, use tcp. Once you have everything entered, you should see something similar to this:
Click the green Add Rule button to add this rule to our security group. You can add multiple rules, but we have everything covered with our single rule. You should see something similar to this:
If everything looks good, click the green create security group button. This will create our new security group. You should see something like this:
Now that our blueprint has been created and we have a new security group, we can begin building the cluster. Ensure you have selected the appropriate credential for your cloud environment, then click the green create cluster button. You should see something similar to this:
Give your cluster a descriptive name. I used spark21test, but you can use whatever you like. Select an appropriate cloud region. I'm using AWS and selected US East (N. Virginia), but you can use whatever you like. You should see something similar to this:
Click on the Setup Network and Security button. You should see something similar to this:
We are going to keep the default options here. Click on the Choose Blueprint button. You should see something similar to this:
Expand the blueprint dropdown menu. You should see the blueprint we created before, hdp26-spark21-cluster. Select the blueprint. You should see something similar to this:
You should notice the new security group is already selected. Cloudbreak did not automatically figure this out; the instance templates and security groups are selected alphabetically by default, so verify that the correct security group is chosen.
Now we need to select a node on which to deploy Ambari. I typically deploy Ambari on the master1 server. Check the Ambari checkbox on one of the master servers. If everything looks good, click on the green create cluster button. You should see something similar to this:
Once the cluster has finished building, you can click on the arrow for the cluster we created to get expanded details. You should see something similar to this:
Once the cluster is fully deployed, we can verify the versions of the components. Click on the Ambari link on the cluster details page. Once you login to Ambari, you should see something similar to this:
You should notice that Spark2 is shown in the component list. Click on Spark2 in the list. You should see something similar to this:
You should notice that both the Spark2 Thrift Server and the Livy2 Server have been installed. Now let's check the overall cluster versions. Click on the Admin link in the Ambari menu and select Stacks and Versions, then click on the Versions tab. You should see something similar to this:
As you can see, HDP 2.6.0.3 was deployed.
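If you'd rather verify from the command line, you can also query the Ambari REST API. This is a minimal sketch; the host name, port 8080, cluster name, and admin/admin credentials are assumptions based on Ambari defaults, so substitute the values from your deployment.

    # Ask Ambari which services are installed and confirm SPARK2 is among them.
    # ambari-host, the port, and the credentials below are assumed defaults.
    import base64
    import json
    import urllib.request

    AMBARI = "http://ambari-host:8080"   # hypothetical host name
    CLUSTER = "spark21test"              # the cluster name used in this tutorial
    AUTH = base64.b64encode(b"admin:admin").decode()

    request = urllib.request.Request(
        AMBARI + "/api/v1/clusters/" + CLUSTER + "/services",
        headers={"Authorization": "Basic " + AUTH},
    )
    with urllib.request.urlopen(request) as response:
        services = json.load(response)

    names = [item["ServiceInfo"]["service_name"] for item in services["items"]]
    print("SPARK2 installed:", "SPARK2" in names)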
If you have successfully followed along with this tutorial, you should know how to create a new security group and blueprint. The blueprint allows you to deploy HDP 2.6 with Spark 2.1. The security group allows you to access all ports on the cluster from your IP address. Follow along in part 2 of the tutorial series to use Zeppelin to test Spark 2.1.
