I am working at Hadapt, a Cloudera partner company, on a product that builds on top of Cloudera CDH. For testing, we would like to be able to automatically create clusters with different kinds of full Cloudera deployments, as created and configured by Cloudera Manager. For example, we would like to deploy CDH 4.3 and 4.4 using both RPMs and parcels into different hardware environments in a fully automated way.
We believe most users will be using the Express Wizard to install clusters. Is there any way to simply automate the equivalent of an Express Wizard install via CM without needing to interact with the GUI?
For example, I am able to install Cloudera Manager the way a customer would using:
cloudera-manager-installer.bin --i-agree-to-all-licenses --noprompt --noreadme --nooptions
You're on the right track.
The general flow of installation should go:
1) Install the CM binaries on the server host (you may also need to install dependencies such as Java or a database).
2) Install the CM agents on the cluster hosts (as you pointed out, this is only possible through the CM API starting with CM5).
3) Replicate through the API every configuration change and command performed by the CM wizard. Every config and command involved is exposed in the API (our internal test automation uses this). You can even go further and add things like enabling NameNode HA or JobTracker HA.
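As a rough illustration of step 2, here is a minimal sketch, using only the Python standard library, that builds (but does not send) an authenticated request to the CM5 host-install command. It assumes the endpoint is POST /cm/commands/hostInstall and that the body fields are named hostNames, sshPort, userName and privateKey; please check these against the API docs for your CM version. The base URL, host names and helper names are hypothetical.

```python
import base64
import json
import urllib.request


def cm_request(base_url, path, user, password, method="GET", body=None):
    """Build (but do not send) a CM API request with HTTP basic auth."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(base_url.rstrip("/") + path, data=data, method=method)
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def host_install_request(base_url, user, password, hosts, ssh_user, private_key):
    """Hypothetical helper: ask CM to install agents on `hosts` over SSH."""
    body = {
        # Field names assumed from the CM5 host-install arguments.
        "hostNames": list(hosts),
        "sshPort": 22,
        "userName": ssh_user,
        "privateKey": private_key,
    }
    return cm_request(base_url, "/cm/commands/hostInstall", user, password, "POST", body)
```

Sending it is then just `urllib.request.urlopen(req)`; CM should reply with a command object you can poll until the install finishes.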
Note that it is not possible to get CM recommendations via the API. You will need to determine all configuration manually. It may help to run the wizard through the UI, then capture what CM configures and repeat it in your API scripts. The "deployment" endpoint in the API can be very useful for this.
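One way to do that capture is to GET the deployment once the wizard has finished, save it, and pull out the role layout. The sketch below assumes the export nests roles under clusters → services → roles, with each role carrying a `hostRef.hostId`; that matches the exports we would expect, but treat it as an assumption. The function names are hypothetical. Saving the JSON with `sort_keys=True` also makes it easy to diff exports taken from different CM versions, which is one way to notice when the wizard's choices change.

```python
import base64
import json
import urllib.request


def fetch_deployment(base_url, user, password):
    """GET {base_url}/cm/deployment with HTTP basic auth and return parsed JSON."""
    req = urllib.request.Request(base_url.rstrip("/") + "/cm/deployment")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def save_deployment(deployment, path):
    """Write the export with stable key order so two exports diff cleanly."""
    with open(path, "w") as f:
        json.dump(deployment, f, indent=2, sort_keys=True)


def role_assignments(deployment):
    """Return sorted (cluster, service, role type, host id) tuples."""
    rows = []
    for cluster in deployment.get("clusters", []):
        for service in cluster.get("services", []):
            for role in service.get("roles", []):
                host = role.get("hostRef", {}).get("hostId")
                rows.append((cluster["name"], service["name"], role["type"], host))
    return sorted(rows)
```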
Also, don't forget to distribute and install the JDBC drivers on all cluster hosts if you are using MySQL or PostgreSQL.
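A minimal sketch of pushing a driver jar to every host with `scp`, assuming passwordless SSH as `root` is already set up. The destination path used in the example below (/usr/share/java/mysql-connector-java.jar) is an assumption about where your services will look for the driver; adjust it for your setup. By default the helper only returns the commands (a dry run) so you can inspect them first.

```python
import subprocess


def jdbc_copy_commands(hosts, local_jar, remote_path, ssh_user="root"):
    """Build one scp argv list per host; nothing is executed here."""
    return [["scp", local_jar, f"{ssh_user}@{host}:{remote_path}"] for host in hosts]


def distribute_jdbc_driver(hosts, local_jar, remote_path, ssh_user="root", dry_run=True):
    """Copy the driver to every host; with dry_run=True just return the commands."""
    commands = jdbc_copy_commands(hosts, local_jar, remote_path, ssh_user)
    if not dry_run:
        for argv in commands:
            # check=True stops at the first host where the copy fails.
            subprocess.run(argv, check=True)
    return commands
```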
Thanks for the quick reply.
Manually re-creating the behavior seemed somewhat error-prone, because the automated steps that the Express Wizard performs may change without us noticing. (We have even considered using Selenium or something similar to drive the wizard remotely, though that seems fraught with its own headaches.)
I had not noticed the deployment endpoint; thanks for the pointer. It looks like the deployment endpoint is not available in the Python API. From skimming the Java code, it looks functionally similar to this dump script that I threw together last week: https://gist.github.com/sit/7208850.
Do you have a script that automatically restores, via the API, all the settings found in a deployment?
You can PUT to the CM deployment API and it will create a cluster as specified in the JSON. See http://cloudera.github.io/cm_api/apidocs/v6/path__cm_deployment.html (available since v1 or v2 of the API, I forget).
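Since the Python bindings lack this call, here is a stdlib-only sketch that builds the PUT request by hand. It assumes the endpoint accepts a `deleteCurrentDeployment` query parameter controlling whether the existing deployment is wiped first; please confirm that against the v6 docs linked above before relying on it. The helper name and base URL are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def put_deployment_request(base_url, deployment, user, password, delete_current=False):
    """Build (but do not send) a PUT of a saved deployment JSON to /cm/deployment."""
    query = urllib.parse.urlencode(
        {"deleteCurrentDeployment": "true" if delete_current else "false"})
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/cm/deployment?{query}",
        data=json.dumps(deployment).encode("utf-8"),
        method="PUT",
    )
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Content-Type", "application/json")
    return req
```

Pair it with a JSON file saved from a wizard-built cluster and send it with `urllib.request.urlopen(req)`. As noted below, this only lays down roles and configuration; you still have to run the commands.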
Looks like deployment is missing from the Python bindings, as you pointed out. You'll have to access the URL manually, or enhance the Python bindings =). It's in the Java bindings.
Using deployment won't run the commands for you, but is one way of quickly creating a cluster with the desired role assignments and configuration.
While it's true that the steps change a little over time, CM generally adds steps and configs rather than removing them, and your old workflows will keep working as well as they used to because we try to maintain API compatibility. This means you can usually expect your scripts to keep working, and you only need to update them if you want to take advantage of a new feature, even when the CM version changes. One notable exception will be Impala, however, which will soon get a new mandatory role, even for CDH4. Keep an eye out for that when CM 4.8 comes out. We'll also make some minor incompatible changes in CM 5, such as deleting deprecated, refactored, or unused configs. Other partners are already using the CM API effectively to automate deployments.
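One practical way to lean on that compatibility is to pin your scripts to the API version they were written against and only check that the server is at least that new. The sketch assumes GET /api/version returns the highest version the server supports as a string like `v6`, possibly JSON-quoted; the helper names, port default and pinned version are illustrative.

```python
def parse_api_version(text):
    """Turn a reply such as 'v6' (optionally JSON-quoted) into the integer 6."""
    value = text.strip().strip('"')
    if len(value) < 2 or value[0] != "v" or not value[1:].isdigit():
        raise ValueError(f"unexpected API version string: {text!r}")
    return int(value[1:])


def pinned_base_url(host, server_version_reply, pinned=6, port=7180):
    """Keep talking the pinned version even when the server supports newer ones."""
    server = parse_api_version(server_version_reply)
    if server < pinned:
        raise RuntimeError(f"server only supports v{server}; script needs v{pinned}")
    return f"http://{host}:{port}/api/v{pinned}"
```

Pinning, rather than always using the newest version, is what lets an old script keep working unchanged after a CM upgrade.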
Initializing ZooKeeper and formatting HDFS are available via the API as commands, just like starting services, creating the HDFS /tmp directory, creating the Hive metastore tables, etc.
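To make that concrete, here is a sketch of a first-run command sequence for a small ZooKeeper/HDFS/Hive cluster, expressed as (method, path, body) tuples relative to the versioned API base URL, plus a generic poller. The command names (zooKeeperInit, hdfsFormat as a role command, hdfsCreateTmpDir, hiveCreateMetastoreDatabaseTables, start) and the `active`/`success`/`resultMessage` fields of a command are our reading of the API docs; the ordering, service names and helper names are assumptions. Note that role commands such as hdfsFormat may return a list of commands rather than a single one, so `wait_for_command` as written fits the service-level commands.

```python
import time
from urllib.parse import quote


def first_run_steps(cluster, zk, hdfs, hive, namenodes):
    """Hypothetical first-run ordering as (method, path, body) tuples."""
    svc = "/clusters/%s/services" % quote(cluster)
    return [
        ("POST", f"{svc}/{quote(zk)}/commands/zooKeeperInit", None),
        ("POST", f"{svc}/{quote(zk)}/commands/start", None),
        # hdfsFormat is a role command: the body lists the NameNode role names.
        ("POST", f"{svc}/{quote(hdfs)}/roleCommands/hdfsFormat",
         {"items": list(namenodes)}),
        ("POST", f"{svc}/{quote(hdfs)}/commands/start", None),
        # /tmp can only be created once HDFS is running.
        ("POST", f"{svc}/{quote(hdfs)}/commands/hdfsCreateTmpDir", None),
        ("POST", f"{svc}/{quote(hive)}/commands/hiveCreateMetastoreDatabaseTables", None),
        ("POST", f"{svc}/{quote(hive)}/commands/start", None),
    ]


def wait_for_command(get_json, command_id, poll_seconds=5, timeout=1800):
    """Poll GET /commands/{id} via `get_json` until the command is no longer active."""
    deadline = time.monotonic() + timeout
    while True:
        cmd = get_json(f"/commands/{command_id}")
        if not cmd.get("active", False):
            if not cmd.get("success", False):
                raise RuntimeError(cmd.get("resultMessage", "command failed"))
            return cmd
        if time.monotonic() > deadline:
            raise TimeoutError(f"command {command_id} is still running")
        time.sleep(poll_seconds)
```

`get_json` is any callable that performs an authenticated GET against the API base URL and returns parsed JSON, which keeps the poller independent of how you make HTTP calls.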
Deployment only handles configuration, not binary distribution, so it won't handle parcels for you. There are also API commands for that (I recently added parcel examples to the CM API docs at http://cloudera.github.io/cm_api/docs/python-client/).
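Parcel handling boils down to walking a parcel through download, distribution and activation. Below is a small sketch of that state machine. The stage names (AVAILABLE_REMOTELY, DOWNLOADING, DOWNLOADED, DISTRIBUTING, DISTRIBUTED, ACTIVATING, ACTIVATED) and the command path layout are assumptions based on the API docs, and the parcel version string in the usage example is illustrative only. A driver loop would GET the parcel, call `next_parcel_step` on its stage, POST the returned command (or sleep on "wait"), and stop when it returns None.

```python
from urllib.parse import quote

# Assumed parcel stages; check the API docs for your CM version.
_NEXT_COMMAND = {
    "AVAILABLE_REMOTELY": "startDownload",
    "DOWNLOADED": "startDistribution",
    "DISTRIBUTED": "activate",
}
_IN_PROGRESS = {"DOWNLOADING", "DISTRIBUTING", "ACTIVATING"}


def next_parcel_step(stage):
    """Return the next command name, 'wait' while CM is busy, or None when done."""
    if stage == "ACTIVATED":
        return None
    if stage in _IN_PROGRESS:
        return "wait"
    try:
        return _NEXT_COMMAND[stage]
    except KeyError:
        raise ValueError(f"unhandled parcel stage: {stage}") from None


def parcel_command_path(cluster, product, version, command):
    """Path of a parcel command, relative to the versioned API base URL."""
    return ("/clusters/%s/parcels/products/%s/versions/%s/commands/%s"
            % (quote(cluster), quote(product), quote(version), command))
```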
Unfortunately, I'm not aware of any sample code out there.