07-24-2018 11:00 AM
In this example, Cloudera Director fails to deploy all master instances in Azure. I've seen this happen occasionally, with subsequent redeploys completing successfully. However, this time I cannot terminate the cluster, because Cloudera Director appears to be confused by the fact that the master instances do not exist - at least, that's my guess. Also, about 1 minute into the termination attempt I see the following error:
ERROR [p-44256946bec7-DefaultTerminateClusterJob] 3e44d1dd-563d-4c30-ac83-f273e5277e44 DELETE /api/v11/environments/Evil/deployments/Evil-1/clusters/EvilCorp-1 com.cloudera.launchpad.cleanup.WaitForInstancesTermination - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed java.util.concurrent.TimeoutException: Not all instances terminated in 20 MINUTES as expected
Note the "20 MINUTES" in the message - again, this error appears only about one minute into the termination attempt.
We need a force delete option which will allow Director to continue to delete all assets that it can, skipping those that it cannot. I realize this could be dangerous - hence the "force" nomenclature. As it is, I'm ending up with a stack of failed termination attempts for failed deployments.
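A minimal sketch of the best-effort semantics requested above, in Python. This is purely illustrative: `force_delete` and the `delete` callback are hypothetical names, not part of Cloudera Director or its API, and the sketch assumes each resource can be deleted independently of the others.

```python
from typing import Callable, Iterable, List, Tuple


def force_delete(
    resources: Iterable[str],
    delete: Callable[[str], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Try to delete every resource; record failures instead of aborting.

    `delete` is a hypothetical callback that removes one resource and
    raises an exception if it cannot (for example, the instance is missing).
    """
    deleted: List[str] = []
    skipped: List[Tuple[str, str]] = []
    for name in resources:
        try:
            delete(name)
            deleted.append(name)
        except Exception as exc:  # deliberately broad: "force" means keep going
            # Remember what was skipped and why, then move on to the next one.
            skipped.append((name, str(exc)))
    return deleted, skipped
```

The point of returning the skipped list is that a forced termination would still report what it could not clean up, so any leftovers can be removed by hand.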
07-26-2018 11:47 AM
Are there currently orphaned resources (such as instances) in Azure? If so, can you try deleting them manually and then trying to terminate the deployment via the Cloudera Director UI?
07-30-2018 12:35 PM
You're thinking along the same lines I was. That was the first thing I tried after clearing out resources in Azure, but to no avail. Termination still fails. This operation was much more robust on the AWS side whenever deployments failed.