
cloudbreak cluster doesn't sync

I was trying to add more nodes to the existing cluster, but the operation fails with the message "The state of one or more instances couldn't be determined. Try syncing later."

Logs show:

```
payload: GetInstancesStateResult{cloudContext=CloudContext{id=242, name='prod-dataeng-mr-cluster', platform='StringType{value='AWS'}', owner='9a3eacb8-7ed3-4388-ad6e-f1aef8f2d5fd'}, statuses=[], exception=com.amazonaws.services.ec2.model.AmazonEC2Exception: The instance ID 'i-09475fd6719' does not exist (Service: AmazonEC2; Status Code: 400; Error Code: InvalidInstanceID.NotFound; Request ID: f4a058b0-37b2-4047-83ae-b74ab79b2dca)}
/cbreak_cloudbreak_1 | 2018-02-02 01:03:35,634 [reactorDispatcher-63] execute:73 INFO c.s.c.c.f.AbstractAction - [owner:9a3eacb8-7ed3-4388-ad6e-f1aef8f2d5fd] [type:STACK] [id:242] [name:prod-dataeng-mr-cluster] Stack: 242, flow state: SYNC_STATE, phase: service, execution time 5 sec
/cbreak_cloudbreak_1 | 2018-02-02 01:03:35,634 [reactorDispatcher-63] doExecute:102 ERROR c.s.c.c.f.s.s.StackSyncActions - [owner:9a3eacb8-7ed3-4388-ad6e-f1aef8f2d5fd] [type:STACK] [id:242] [name:prod-dataeng-mr-cluster] Error during Stack synchronization flow:
/cbreak_cloudbreak_1 | com.amazonaws.services.ec2.model.AmazonEC2Exception: The instance ID 'i-09475fd6719' does not exist (Service: AmazonEC2; Status Code: 400; Error Code: InvalidInstanceID.NotFound; Request ID: f4a058b0-37b2-4047-83ae-b74ab79b2dca)
/cbreak_cloudbreak_1 | at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1588)
/cbreak_cloudbreak_1 | at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1258)
/cbreak_cloudbreak_1 | at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1030)
```
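The key facts in this log are the instance ID and the AWS error code `InvalidInstanceID.NotFound`, meaning EC2 reports that the ID does not exist while Cloudbreak still asks for it during the stack sync. A minimal Python sketch for pulling those fields out of a pasted log; the helper name `parse_sync_error` and its regexes are hypothetical, written against the log format shown here rather than any Cloudbreak API:

```python
import re
from typing import Optional

# Hypothetical helper, written against the stack-sync log format shown above.
INSTANCE_RE = re.compile(r"The instance ID '(?P<id>i-[0-9a-f]+)' does not exist")
ERROR_CODE_RE = re.compile(r"Error Code: (?P<code>[\w.]+);")
STACK_RE = re.compile(r"\[id:(?P<stack_id>\d+)\] \[name:(?P<stack_name>[^\]]+)\]")


def parse_sync_error(log_text: str) -> Optional[dict]:
    """Return the missing instance ID, the AWS error code and the Cloudbreak
    stack id/name from a stack-sync error log, or None if the log contains
    no "instance ID ... does not exist" message."""
    instance = INSTANCE_RE.search(log_text)
    if instance is None:
        return None
    code = ERROR_CODE_RE.search(log_text)
    stack = STACK_RE.search(log_text)
    return {
        "instance_id": instance["id"],
        "error_code": code["code"] if code else None,
        "stack_id": int(stack["stack_id"]) if stack else None,
        "stack_name": stack["stack_name"] if stack else None,
    }
```

For the log above this yields `i-09475fd6719`, `InvalidInstanceID.NotFound` and stack `242` (`prod-dataeng-mr-cluster`); the ID can then be checked directly, for example with `aws ec2 describe-instances --instance-ids <id>` in the cluster's region.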

When this instance was deleted, we repaired the cluster to make sure the metadata was clean:

```
1/3/2018 2:09:44 PM prod-dataeng-mr-cluster - update in progress: Stack repair triggered.
1/3/2018 2:09:58 PM prod-dataeng-mr-cluster - available: Deleted host ip-10-*-*-*.us-west-2.compute.internal from Ambari as it is marked as terminated by the cloud provider.
1/3/2018 2:09:58 PM prod-dataeng-mr-cluster - available: Deleted instance i-09475fd6719 (ip-10-*-*-*.us-west-2.compute.internal) from Cloudbreak metadata because it couldn't be found on the cloud provider.
1/3/2018 2:09:58 PM prod-dataeng-mr-cluster - available: Synced instance states with the cloud provider.
1/3/2018 2:10:13 PM prod-dataeng-mr-cluster - available: The cluster state synchronized with Ambari: Services are installed and running.
1/3/2018 2:10:15 PM prod-dataeng-mr-cluster - available: Synced instance states with the cloud provider.
```
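These repair events show Cloudbreak removing `i-09475fd6719` from its metadata, yet the later sync still queries that same ID, which suggests a stale reference survives somewhere. A rough Python sketch for checking this against a pasted event log, assuming one event per line in the `timestamp stack - status: message` layout above; `parse_events`, `removed_from_metadata` and `is_stale_reference` are hypothetical helpers, not Cloudbreak functions:

```python
import re
from typing import Iterable, List, NamedTuple, Set

# Assumed layout: one Cloudbreak event per line, e.g.
# "1/3/2018 2:09:58 PM prod-dataeng-mr-cluster - available: Deleted instance ..."
EVENT_RE = re.compile(
    r"^(?P<ts>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) "
    r"(?P<stack>\S+) - (?P<status>[^:]+): (?P<message>.*)$"
)
DELETED_RE = re.compile(r"Deleted instance (?P<id>i-[0-9a-f]+)\b.*from Cloudbreak metadata")


class Event(NamedTuple):
    timestamp: str  # kept as text, so no month/day order is assumed
    stack: str
    status: str  # e.g. "update in progress" or "available"
    message: str


def parse_events(event_log: str) -> List[Event]:
    """Parse the event log; lines that do not match the layout are skipped."""
    events = []
    for line in event_log.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            events.append(Event(match["ts"], match["stack"], match["status"], match["message"]))
    return events


def removed_from_metadata(events: Iterable[Event]) -> Set[str]:
    """Instance IDs that Cloudbreak reported deleting from its own metadata."""
    removed = set()
    for event in events:
        match = DELETED_RE.search(event.message)
        if match:
            removed.add(match["id"])
    return removed


def is_stale_reference(instance_id: str, events: Iterable[Event]) -> bool:
    """True if an earlier event already removed this ID from Cloudbreak
    metadata, so any later lookup of it points at a stale reference."""
    return instance_id in removed_from_metadata(events)
```

Feeding in the events above together with the ID from the sync error returns `True`, which matches the symptom described in this thread.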

Now I am not able to add more nodes, because Cloudbreak still tries to access the deleted instance. Any idea how to get around this?

Thanks in advance

Re: cloudbreak cluster doesn't sync

@prarthana basgod

Could you add the following info:

  • which Cloudbreak version are you using?
  • are you trying to scale up manually or with autoscale?

Re: cloudbreak cluster doesn't sync

@pdarvasi We are using version 1.16.4, and we are trying to add more nodes through the Cloudbreak UI.

Re: cloudbreak cluster doesn't sync

@prarthana basgod

I could not reproduce your case. Could you describe it in more detail:

  • which blueprint do you use?
  • how many worker and compute nodes did you start your cluster with?
  • how many of these were terminated?

Re: cloudbreak cluster doesn't sync

@mmolnar

I have attached the blueprint: bp.txt

Event log: event.txt

We started with 30 nodes: 3 master, 1 client, 1 task, and 25 worker nodes. We lost 2 worker nodes along the way, and then added two nodes using the Cloudbreak UI. Now I cannot add more nodes, because Cloudbreak is looking for the state of a node that has already been terminated.

Thanks in advance

Re: cloudbreak cluster doesn't sync

@prarthana basgod

Thanks for the details. It seems to me that after the scale-up attempt, the new host went into a failed state and Cloudbreak could not proceed further.

Find the failed node in the node list, use the Terminate button to remove it, and then try scaling up again.
If that fails, please attach the cbd logs:

http://hortonworks.github.io/cloudbreak-docs/latest/operations/#accessing-logs

filtered for the cloudbreak service:

`cbd logs cloudbreak`
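When attaching those logs, it can help to trim them down to the failing stack. A minimal Python sketch, assuming the `cbd logs cloudbreak` output has been saved as text with the `/cbreak_cloudbreak_1 | ` line prefix and `YYYY-MM-DD HH:MM:SS,mmm` entry timestamps seen in the log from the original question; `error_entries_for_stack` is a hypothetical helper, not a cbd feature:

```python
import re
from typing import Iterable, List

# Assumed per-line prefix, e.g. "/cbreak_cloudbreak_1 | " (leading "/" optional).
PREFIX_RE = re.compile(r"^/?[\w.-]+\s+\|\s?")
# Assumed start of a new log entry, e.g. "2018-02-02 01:03:35,634 [reactorDispatcher-63] ..."
ENTRY_START_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")


def error_entries_for_stack(lines: Iterable[str], stack_id: int) -> List[str]:
    """Keep ERROR entries tagged [id:<stack_id>] plus their continuation lines
    (exception messages and stack-trace frames), with the prefix stripped."""
    tag = f"[id:{stack_id}]"
    kept: List[str] = []
    keeping = False
    for raw in lines:
        line = PREFIX_RE.sub("", raw.rstrip("\r\n"), count=1)
        if ENTRY_START_RE.match(line):
            # Each new entry decides whether it and its continuation lines are kept.
            keeping = " ERROR " in line and tag in line
        if keeping:
            kept.append(line)
    return kept
```

Passing an open file of the saved output and the stack ID (242 in this thread) keeps the `StackSyncActions` error and its AWS exception while dropping INFO lines and other stacks.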

Re: cloudbreak cluster doesn't sync

@mmolnar,

Apologies for the delay.

I don't see the deleted node among the nodes listed in the Cloudbreak UI. Any other pointers?