Created on 04-23-2018 02:26 PM - edited 09-16-2022 06:08 AM
Hi,
I have a running cluster with 3 data nodes and 1 master node in Azure.
I tried to add the first gateway node through the Cloudera Director web UI.
-> The node was created successfully in Azure
-> It was started and attached in Cloudera Manager
-> The StreamSets, Spark, and CDH parcels were distributed
-> The Cloudera Manager agent was installed
The last step failed with the error message "", and Director rolled back the gateway node and deleted it from Azure.
According to the log:
-> The edge node was created and attached to Cloudera Manager
-> The StreamSets, CDH, and Spark 2 parcels were distributed and activated
[2018-04-19 20:29:46.212 +0000] INFO [p-619efeab757f-DefaultUpdateClusterJob] a230bff5-b6be-4e08-b652-f72041edb03b PUT /api/v11/environments/HDMI_TEST/deployments/SANDBOX/clusters/sandboxcluster com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Parcel (STREAMSETS_DATACOLLECTOR, 3.1.0.0) stage is ACTIVATED as expected and stable
[2018-04-19 20:29:54.462 +0000] INFO [p-619efeab757f-DefaultUpdateClusterJob] a230bff5-b6be-4e08-b652-f72041edb03b PUT /api/v11/environments/HDMI_TEST/deployments/SANDBOX/clusters/sandboxcluster com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Parcel (SPARK2, 2.2.0.cloudera2-1.cdh5.12.0.p0.232957) stage is ACTIVATED as expected and stable
[2018-04-19 20:29:50.337 +0000] INFO [p-619efeab757f-DefaultUpdateClusterJob] a230bff5-b6be-4e08-b652-f72041edb03b PUT /api/v11/environments/HDMI_TEST/deployments/SANDBOX/clusters/sandboxcluster com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Parcel (CDH, 5.14.0-1.cdh5.14.0.p0.24) stage is ACTIVATED as expected and stable
After that, it fails with this error:
[2018-04-19 20:29:55.812 +0000] WARN [p-633ca3a297f0-ApplyHostTemplatesPerHostJob] a230bff5-b6be-4e08-b652-f72041edb03b PUT /api/v11/environments/HDMI_TEST/deployments/SANDBOX/clusters/sandboxcluster com.cloudera.launchpad.bootstrap.cluster.hostTemplate.CreateAndApplyHostTemplateJob$ApplyHostTemplatesJob - c.c.l.b.c.h.CreateAndApplyHostTemplateJob: Bad request exception when applying host template
com.cloudera.api.ext.ClouderaManagerException: API call to Cloudera Manager failed. Method=HostTemplatesResource.applyHostTemplate. Response Status Code: 400. Message: {
"message" : "Host must have a single version of CDH installed."
}. - Cause: javax.ws.rs.BadRequestException HTTP 400 Bad Request
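For anyone digging through a longer Director log for the same failure, here is a minimal sketch that pulls out only the WARN and ERROR entries. It assumes the line layout seen in the excerpts above (bracketed timestamp, level, bracketed thread name); the function names and the shortened sample lines are made up for illustration.

```python
import re

# Layout assumed from the excerpts above: [timestamp] LEVEL [thread] rest...
# The leading "[" is optional so that lines which lost it in a copy/paste still match.
LOG_LINE = re.compile(
    r"^\[?(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})\]\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+(?P<rest>.*)$"
)


def parse_line(line):
    """Return a dict of fields for a log line, or None for continuation lines."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def problem_entries(lines):
    """Keep only the parsed entries whose level is WARN or ERROR."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] in {"WARN", "ERROR"}]


# Shortened, illustrative versions of the lines quoted in this post.
SAMPLE_LINES = [
    "[2018-04-19 20:29:50.337 +0000] INFO [p-619efeab757f-DefaultUpdateClusterJob] Parcel (CDH) stage is ACTIVATED",
    "2018-04-19 20:29:54.462 +0000] INFO [p-619efeab757f-DefaultUpdateClusterJob] Parcel (SPARK2) stage is ACTIVATED",
    "[2018-04-19 20:29:55.812 +0000] WARN [p-633ca3a297f0-ApplyHostTemplatesPerHostJob] Bad request exception when applying host template",
    '"message" : "Host must have a single version of CDH installed."',
]

if __name__ == "__main__":
    for entry in problem_entries(SAMPLE_LINES):
        print(entry["timestamp"], entry["level"], entry["thread"], entry["rest"])
```

Continuation lines such as the JSON error body do not match the pattern and are skipped, so this only locates where the failure starts; the full message still has to be read from the log itself.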
I cannot spin up the edge node again, because Director no longer gives me the option to add nodes while the cluster is in the "update request failed" state.
However, the existing cluster is running fine, and the new edge node was automatically deleted by Director.
At this stage I cannot add the gateway node to the cluster through Director: the cluster status in Director is "update failed", and I do not get the modify cluster option that I would need to add the gateway node.
Created 04-24-2018 08:44 AM
Sameer, could you post the whole of your Director log? At the minimum, Director shouldn't be entering into UPDATE_FAILED because of this. I'd like to know exactly what went wrong.
It looks like the message "Host must have a single version of CDH installed." appears when CM can't determine the CDH version on the new host. What's the full version of CDH that's currently running on this cluster, including the patch version? e.g. 5.14.0.24. This should show up in the Director UI, and you should also be able to find it in the parcel section on the CM instance itself.
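If it is easier than clicking through the parcel page, something like the following could list which products have more than one version sitting on the hosts. This is only a sketch: it assumes you have saved the JSON returned by the CM API parcel listing for the cluster (I believe that is `GET /api/<version>/clusters/<clusterName>/parcels`, returning an `items` list with `product`, `version`, and `stage` fields), and the second CDH entry in the sample is invented purely for illustration.

```python
import json
from collections import defaultdict

# Stages in which a parcel is present on the hosts. ACTIVATED appears in the
# Director log above; DISTRIBUTED is assumed from the CM parcel lifecycle.
ON_HOST_STAGES = {"DISTRIBUTED", "ACTIVATED"}


def versions_on_hosts(parcels_json):
    """Map each product to the sorted versions that are distributed or activated."""
    found = defaultdict(set)
    for parcel in json.loads(parcels_json).get("items", []):
        if parcel.get("stage") in ON_HOST_STAGES:
            found[parcel["product"]].add(parcel["version"])
    return {product: sorted(versions) for product, versions in found.items()}


def products_with_multiple_versions(parcels_json):
    """Return only the products that have more than one version on the hosts."""
    return {
        product: versions
        for product, versions in versions_on_hosts(parcels_json).items()
        if len(versions) > 1
    }


# Sample payload. The first three entries mirror the parcels in the log above;
# the second CDH entry is hypothetical and only there to show a conflict.
SAMPLE = json.dumps({
    "items": [
        {"product": "CDH", "version": "5.14.0-1.cdh5.14.0.p0.24", "stage": "ACTIVATED"},
        {"product": "SPARK2", "version": "2.2.0.cloudera2-1.cdh5.12.0.p0.232957", "stage": "ACTIVATED"},
        {"product": "STREAMSETS_DATACOLLECTOR", "version": "3.1.0.0", "stage": "ACTIVATED"},
        {"product": "CDH", "version": "5.12.0-1.cdh5.12.0.p0.29", "stage": "DISTRIBUTED"},
    ]
})

if __name__ == "__main__":
    for product, versions in products_with_multiple_versions(SAMPLE).items():
        print(product, "has multiple versions on hosts:", ", ".join(versions))
```

If CDH shows up in that output, the extra parcel is a likely candidate for the "single version of CDH" complaint, though that would still need to be confirmed against the full log.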
Is this a production cluster? I see the name is "sandbox." Is it possible to create this cluster again?
Created 04-24-2018 01:09 PM
Created 04-25-2018 01:17 PM
I unfortunately do not see the application log -- is the link missing?
With the cluster in UPDATE_FAILED, Director is no longer able to operate on this cluster. Do you have an earlier DB backup that you would be able to restore to?
Getting out of UPDATE_FAILED is possible, but is currently only supported as a guided process through Cloudera Director Support due to how dangerous it can be. Do you have a support contract with Cloudera at the moment?
While you can add this through CM, Director will be unable to manage this host and it may make growing/shrinking difficult once the UPDATE_FAILED problem is fixed. If you add it through CM and then recover Director, Director will be able to grow and shrink the cluster, but will be otherwise unaware of that new node, and will not be able to modify it if desired.
The extra CDH parcel could be a source of grief, but I'd like to try to get by the UPDATE_FAILED problem first.