Created on 03-02-2017 11:54 AM
Download the HDC CLI from the cloud controller
Although the CLI can be downloaded from the cloud controller UI, we're going to use curl, since we're about to automate the process. All we need is the address of the cloud controller. Using $(uname) in the URL ensures that we download the appropriate binary for our OS. Note that all instance references in this tutorial are examples; there is no real running AWS instance behind them:
curl -kL https://ec2-34-251-140-175.eu-west-1.compute.amazonaws.com/hdc-cli_$(uname)_x86_64.tgz | tar -xz
Once it's downloaded, we can verify its version:
./hdc --version
hdc version 1.13.0-2017-02-09T08:56:06
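For automation, the download URL can be assembled from the controller address instead of being hard-coded. The following is a minimal sketch that assumes the archive naming shown above (hdc-cli_<OS>_x86_64.tgz); the function name hdc_cli_url and its arguments are hypothetical and not part of the CLI.

```shell
#!/bin/sh
# Hypothetical helper: build the CLI download URL for a cloud controller.
# $1 = controller host name
# $2 = OS name (optional, defaults to the output of uname)
hdc_cli_url() {
    controller="$1"
    os="${2:-$(uname)}"
    printf 'https://%s/hdc-cli_%s_x86_64.tgz\n' "$controller" "$os"
}

# Example (not executed here): download and unpack the CLI.
# curl -kL "$(hdc_cli_url ec2-34-251-140-175.eu-west-1.compute.amazonaws.com)" | tar -xz
```

Passing the OS name explicitly is useful when the script prepares the CLI for a machine other than the one it runs on.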
Configure the CLI to use our cloud controller
The CLI can connect to multiple cloud controllers; each one is identified by its address and credentials. Every command accepts parameters for these values, and they can also be configured with environment variables.
For convenience, we could save them into a file so we wouldn't have to provide these parameters for each command:
./hdc configure --server https://ec2-34-251-140-175.eu-west-1.compute.amazonaws.com --username admin@hortonworks.com --password 'adminPassword123!'
However, this is not a good idea from an automation perspective, because in the logs we want to see each command exactly as it was executed.
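One way to keep every command explicit without retyping the connection parameters is a small wrapper function. This is only a sketch: CONTROLLER_URL, CONTROLLER_USER, CONTROLLER_PASSWORD, HDC_BIN and run_hdc are hypothetical names for this example (they are ordinary shell variables, not settings read by the hdc CLI itself). The wrapper appends the --server, --username and --password parameters shown above and logs the command line with the password masked.

```shell
#!/bin/sh
# Hypothetical shell variables holding the connection details.
CONTROLLER_URL="${CONTROLLER_URL:-https://ec2-34-251-140-175.eu-west-1.compute.amazonaws.com}"
CONTROLLER_USER="${CONTROLLER_USER:-admin@hortonworks.com}"
CONTROLLER_PASSWORD="${CONTROLLER_PASSWORD:-adminPassword123!}"
# Path to the CLI binary; overridable so a stub can be used for dry runs.
HDC_BIN="${HDC_BIN:-./hdc}"

# Run an hdc subcommand with the connection parameters appended.
# The command line is written to stderr first, with the password masked.
run_hdc() {
    echo "+ $HDC_BIN $* --server $CONTROLLER_URL --username $CONTROLLER_USER --password ****" >&2
    "$HDC_BIN" "$@" --server "$CONTROLLER_URL" --username "$CONTROLLER_USER" --password "$CONTROLLER_PASSWORD"
}

# Example (not executed here):
# run_hdc describe-cluster instances --cluster-name tutorial-cluster
```

Masking the password is a design choice for shared logs; remove the mask if the logs must contain the literal command.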
Create a cluster using a JSON skeleton
Let's generate our base JSON skeleton:
./hdc create-cluster generate-cli-skeleton
{
  "ClusterName": "",
  "HDPVersion": "2.5",
  "ClusterType": "EDW-ETL: Apache Hive 1.2.1, Apache Spark 1.6",
  "Master": {
    "InstanceType": "m4.4xlarge",
    "VolumeType": "gp2",
    "VolumeSize": 32,
    "VolumeCount": 1,
    "InstanceCount": 1,
    "Recipes": []
  },
  "Worker": {
    "InstanceType": "m3.xlarge",
    "VolumeType": "ephemeral",
    "VolumeSize": 40,
    "VolumeCount": 2,
    "InstanceCount": 3,
    "Recipes": [],
    "RecoveryMode": "AUTO"
  },
  "Compute": {
    "InstanceType": "m3.xlarge",
    "VolumeType": "ephemeral",
    "VolumeSize": 40,
    "VolumeCount": 1,
    "InstanceCount": 0,
    "Recipes": [],
    "RecoveryMode": "AUTO",
    "SpotPrice": "0"
  },
  "SSHKeyName": "",
  "RemoteAccess": "",
  "WebAccess": true,
  "HiveJDBCAccess": true,
  "ClusterComponentAccess": false,
  "ClusterAndAmbariUser": "",
  "ClusterAndAmbariPassword": "",
  "InstanceRole": "CREATE",
  "Network": {
    "VpcId": "",
    "SubnetId": ""
  },
  "Tags": {},
  "HiveMetastore": {
    "Name": "",
    "Username": "",
    "Password": "",
    "URL": "",
    "DatabaseType": ""
  },
  "Configurations": []
}
As you can see, some properties have default values, while a few others are empty and must be provided. To manipulate the JSON we're going to use a really handy tool called jq. We could also change instance types, volumes, etc., but for demonstration purposes let's just set the missing properties, adjust a few instance settings, and write the result to a file called cluster.json:
./hdc create-cluster generate-cli-skeleton | jq '.ClusterName = "tutorial-cluster" | .Worker.InstanceCount = 1 | .Compute.SpotPrice = "0.5" | .Compute.InstanceCount = 1 | .Compute.RecoveryMode = "MANUAL" | .SSHKeyName = "my-aws-key" | .RemoteAccess = "0.0.0.0/0" | .ClusterComponentAccess = true | .ClusterAndAmbariUser = "admin" | .ClusterAndAmbariPassword = "admin"' > cluster.json
The result should look something like this:
cat cluster.json
{
  "ClusterName": "tutorial-cluster",
  "HDPVersion": "2.5",
  "ClusterType": "EDW-ETL: Apache Hive 1.2.1, Apache Spark 1.6",
  "Master": {
    "InstanceType": "m4.4xlarge",
    "VolumeType": "gp2",
    "VolumeSize": 32,
    "VolumeCount": 1,
    "InstanceCount": 1,
    "Recipes": []
  },
  "Worker": {
    "InstanceType": "m3.xlarge",
    "VolumeType": "ephemeral",
    "VolumeSize": 40,
    "VolumeCount": 2,
    "InstanceCount": 1,
    "Recipes": [],
    "RecoveryMode": "AUTO"
  },
  "Compute": {
    "InstanceType": "m3.xlarge",
    "VolumeType": "ephemeral",
    "VolumeSize": 40,
    "VolumeCount": 1,
    "InstanceCount": 1,
    "Recipes": [],
    "RecoveryMode": "MANUAL",
    "SpotPrice": "0.5"
  },
  "SSHKeyName": "my-aws-key",
  "RemoteAccess": "0.0.0.0/0",
  "WebAccess": true,
  "HiveJDBCAccess": true,
  "ClusterComponentAccess": true,
  "ClusterAndAmbariUser": "admin",
  "ClusterAndAmbariPassword": "admin",
  "InstanceRole": "CREATE",
  "Network": {
    "VpcId": "",
    "SubnetId": ""
  },
  "Tags": {},
  "HiveMetastore": {
    "Name": "",
    "Username": "",
    "Password": "",
    "URL": "",
    "DatabaseType": ""
  },
  "Configurations": []
}
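The jq filter above hard-codes its values. In an automated setup they would more likely come from variables, and the result would be checked before it is submitted. Here is a minimal sketch using jq's --arg option and its -e exit-status flag; the function names fill_skeleton and validate_cluster_json are hypothetical, and the list of required fields is an assumption based on the properties the skeleton leaves empty.

```shell
#!/bin/sh
# Hypothetical helper: read a skeleton on stdin, fill the empty fields from
# the arguments, and print the resulting JSON on stdout.
# $1 = cluster name, $2 = SSH key name, $3 = CIDR allowed for remote access,
# $4 = cluster/Ambari user, $5 = cluster/Ambari password
fill_skeleton() {
    jq --arg name "$1" --arg key "$2" --arg cidr "$3" \
       --arg user "$4" --arg pass "$5" \
       '.ClusterName = $name
        | .SSHKeyName = $key
        | .RemoteAccess = $cidr
        | .ClusterAndAmbariUser = $user
        | .ClusterAndAmbariPassword = $pass'
}

# Hypothetical check: succeed only if none of the required fields is missing
# or empty. With -e, jq exits non-zero when the last result is false or null.
# $1 = path to the cluster JSON file
validate_cluster_json() {
    jq -e '[.ClusterName, .SSHKeyName, .RemoteAccess,
            .ClusterAndAmbariUser, .ClusterAndAmbariPassword]
           | map(. != null and . != "") | all' "$1" > /dev/null
}

# Example (not executed here):
# ./hdc create-cluster generate-cli-skeleton \
#     | fill_skeleton tutorial-cluster my-aws-key 10.0.0.0/16 admin admin > cluster.json
# validate_cluster_json cluster.json || echo "cluster.json is incomplete" >&2
```

Because --arg passes values as jq variables rather than splicing them into the filter text, names or passwords containing quotes do not break the filter.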
We used 0.0.0.0/0 for RemoteAccess, a CIDR range that matches every IPv4 address; this is strongly discouraged for production clusters. Let's create the cluster and wait until it finishes. We're going to use the --wait flag so we don't have to write custom functions to poll the cluster state:
./hdc create-cluster --cli-input-json cluster.json --server https://ec2-34-251-140-175.eu-west-1.compute.amazonaws.com --username admin@hortonworks.com --password 'adminPassword123!' --wait true
Once the command returns, we can check the instances:
./hdc describe-cluster instances --cluster-name tutorial-cluster --server https://ec2-34-251-140-175.eu-west-1.compute.amazonaws.com --username admin@hortonworks.com --password 'adminPassword123!'
[
  {
    "InstanceId": "i-xxxx",
    "Hostname": "ip-10-0-1-159.eu-west-1.compute.internal",
    "PublicIP": "x.x.x.x",
    "PrivateIP": "10.0.1.159",
    "InstanceStatus": "REGISTERED",
    "HostStatus": "HEALTHY",
    "Type": "master - ambari server"
  },
  {
    "InstanceId": "i-xxxx",
    "Hostname": "ip-10-0-1-185.eu-west-1.compute.internal",
    "PublicIP": "x.x.x.x",
    "PrivateIP": "10.0.1.185",
    "InstanceStatus": "REGISTERED",
    "HostStatus": "HEALTHY",
    "Type": "worker"
  },
  {
    "InstanceId": "i-xxxx",
    "Hostname": "ip-10-0-1-197.eu-west-1.compute.internal",
    "PublicIP": "x.x.x.x",
    "PrivateIP": "10.0.1.197",
    "InstanceStatus": "REGISTERED",
    "HostStatus": "HEALTHY",
    "Type": "compute"
  }
]
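Since the instance list is plain JSON, a script can pick out the addresses it needs with jq. The sketch below assumes the output shape shown above (an array of objects with Type and PublicIP fields); the function name instance_ips is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read the describe-cluster instance list on stdin and
# print the PublicIP of every instance whose Type starts with the given prefix.
# $1 = type prefix, for example "master", "worker" or "compute"
instance_ips() {
    jq -r --arg type "$1" '.[] | select(.Type | startswith($type)) | .PublicIP'
}

# Example (not executed here), with the connection parameters omitted for brevity:
# ./hdc describe-cluster instances --cluster-name tutorial-cluster ... | instance_ips master
```

A prefix match is used because the master's Type in the output above is "master - ambari server" rather than just "master".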
Terminate the cluster
Now that we have a cluster, we can execute jobs, queries, and so on, which this tutorial does not cover. Once we no longer need the cluster, we can simply terminate it:
./hdc terminate-cluster --cluster-name tutorial-cluster --server https://ec2-34-251-140-175.eu-west-1.compute.amazonaws.com --username admin@hortonworks.com --password 'adminPassword123!' --wait true
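Putting the steps together, an automated run typically creates the cluster, does its work, and terminates the cluster even if the work fails. The following is a sketch of that flow under stated assumptions: HDC_BIN, the CONTROLLER_* variables, hdc_auth, run_workload and cluster_lifecycle are hypothetical names for this example, and only the create-cluster and terminate-cluster invocations shown in this tutorial are used.

```shell
#!/bin/sh
# Path to the CLI binary; overridable so a stub can be used for dry runs.
HDC_BIN="${HDC_BIN:-./hdc}"

# Run an hdc subcommand with the connection parameters appended.
# CONTROLLER_URL, CONTROLLER_USER and CONTROLLER_PASSWORD are hypothetical
# shell variables that must be set by the caller.
hdc_auth() {
    "$HDC_BIN" "$@" --server "$CONTROLLER_URL" --username "$CONTROLLER_USER" --password "$CONTROLLER_PASSWORD"
}

# Placeholder for the jobs or queries to run against the cluster.
run_workload() {
    echo "running workload on $1"
}

# $1 = cluster name, $2 = path to the cluster JSON file
cluster_lifecycle() {
    # Stop early if the cluster could not be created.
    hdc_auth create-cluster --cli-input-json "$2" --wait true || return 1

    # Remember the workload result, but do not stop on failure.
    status=0
    run_workload "$1" || status=$?

    # Terminate the cluster whether or not the workload succeeded.
    hdc_auth terminate-cluster --cluster-name "$1" --wait true

    return "$status"
}

# Example (not executed here):
# cluster_lifecycle tutorial-cluster cluster.json
```

Returning the workload's exit status lets a calling CI job fail when the workload fails, while the cluster is still cleaned up.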