I'm testing deployments of CDH in Azure and ran into an issue terminating a failed deployment. I discovered that there was a syntax error in the bootstrap file where the publicIP parametered was defined in one of the instance templates. Intrestingly enough, the bootstrap file passed validation prior to the deployment attempt.
The deployment failed with:
java.lang.IllegalArgumentException: Configuration property not found: Public IP
... however there are remnants of the deployment left in Cloudera Director and any attempt to terminate the cluster (command line and via web) results in the error indicated above.
How can I clear out the cluster meta-data so that I can attempt to deploy again? Otherwise, subsequent deployment attempts results in an error due to deployment already existing.
Update: The director client log throws the following error when attempting to delete the failed deployment. The error makes sense considering the deployment did fail, but how do a "force delete" the failed deployment?
"localizedMessage" : "Deployment is not available for /Evil/Evil", "message" : "Deployment is not available for /Evil/Evil", "suppressed" : [ ] } at com.cloudera.director.client.common.ApiClient.invokeAPI(ApiClient.java:271) at com.cloudera.director.client.latest.api.DeploymentsApi.collectDiagnosticData(DeploymentsApi.java:83) at com.cloudera.launchpad.commands.TerminateRemoteCommand.collectDiagnosticData(TerminateRemoteCommand.java:284) at com.cloudera.launchpad.commands.TerminateRemoteCommand.run(TerminateRemoteCommand.java:116) at com.cloudera.launchpad.commands.RemoteCommand.run(RemoteCommand.java:204) at com.cloudera.launchpad.Application.run(Application.java:141) at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:801) at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:785) at org.springframework.boot.SpringApplication.afterRefresh(SpringApplication.java:772) at org.springframework.boot.SpringApplication.run(SpringApplication.java:317) at com.cloudera.launchpad.Application.start(Application.java:98) at com.cloudera.launchpad.Application.main(Application.java:48) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48) at org.springframework.boot.loader.Launcher.launch(Launcher.java:87) at org.springframework.boot.loader.Launcher.launch(Launcher.java:50) at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51)
You mention that the deployment failed, and that you are trying to terminate the cluster.
Have you tried terminating the deployment directly in the Cloudera Director UI?
Hello. Thank you for your response.
I've tried terminating the cluster via command line and via the Cloudera Director UI. As one would expect, the resulting errors in the logs are the same regardless of the method I use. I've since had to manually terminate the resources in Azure to avoid the excess charges, however I still have artifacts of the failed deployments / termination attempts left in Director.
There is a technical distinction in Cloudera Director between a deployment and a cluster. Each Cloudera Director instance can manage multiple environments (each corresponding to an account on a cloud provider), each of which can have multiple deployments (each corresponding to a single Cloudera Manager instance), each of which can have multiple clusters.
I am specifically asking if you have tried terminating the deployment, rather than the cluster. In your last answer you still mentioned terminating the cluster.
Let me elaborate. Typically between tests in AWS, I'll run cloudera-director terminate-remote bootstrap_filename, which terminates the cluster, manager, database. Sometimes the environment definition still exists in the Director UI, and I'll delete it as well before running another test. Each test creates a new environment, manger, DB, cluster.
I'm trying to do the same for my Azure testing, but termination failed during cluster deletion after 2 failed deployment attempts within the same environment. After discovering why the clusters were failing to deploy, I fixed my bootstrap script and deployed a new cluster and manager within the existing environment containing the 2 failed cluster deployments. I would like to delete those 2 failed clusters along with their manager. Granted, they don't actual exist in Azure - but Director still shows them in a "terminate failed" state.
In director UI, I do not see an option to "terminate deployment". I see options to terminate clusters, cloudera manager, database and delete environments.
Your prior information was clear. The mismatch was in my terminology. The term "deployment" is used in the APIs, while the UI says "Cloudera Manager". These are synonymous. Sorry for the confusion!
Please see if you can terminate one of the failed Cloudera Managers, rather than one of the corresponding clusters.
If that fails, I'm sorry to say that there is currently no safe way to force deletion of the deployment or cluster from Director's database. If this is purely a test setup, you could delete any successful deployments, stop the Director service, then start from scratch (on H2, by deleting the director database file from disk, or on MySQL, by creating a new empty database for Director with the appropriate permissions). When you start Director again, it will rebuild the database contents.
If this is not feasible, you can just ignore the failed deployments and clusters.
Unfortunately, attempts to terminate clouder manager (deployment) from Cloudera Director UI fail as well. The same Director instance is used to deploy production clusters, so wiping the DB isn't an option. We are currently deploying with no license, hence enterprise support is not an option.
It's worth mentioning that we've yet to run into this with our AWS deployments using Cloudera Director. Any failed manager/cluster deployments have alway terminated successfully, regardless of which stage the failure occurred. Thus far, I've only had this problem with Azure deployments using Director.
Thank you for your suggestions.