Cloudbreak stuck in a loop after attempting to repair a cluster

New Contributor

We had attempted to repair a cluster after one of our nodes went into a bad state due to an issue with AWS. I ran the following command:

cb cluster repair --name <cluster> --host-groups <host_groups>

What I'm seeing now is that Cloudbreak seems to be stuck in a loop, repeatedly logging the following:

cloudbreak_1   | 2019-05-21 13:30:21,119 [reactorDispatcher-68] pollWithTimeout:32 INFO  c.s.c.s.PollingService - [owner:6476a4d7-bab8-4bf9-bfcd-aa6a43aa1d5f] [type:CLUSTER] [id:2] [name:emea-hdp] [flow:438c526a-325b-40c8-b86a-cc15aad4728a] [tracking:669a1784-a361-4509-9cd2-c57847a15cbb] Polling attempt 16277.
cloudbreak_1   | 2019-05-21 13:30:21,134 [reactorDispatcher-68] checkStatus:48 INFO  c.s.c.s.c.f.AmbariOperationsStatusCheckerTask - [owner:6476a4d7-bab8-4bf9-bfcd-aa6a43aa1d5f] [type:CLUSTER] [id:2] [name:<cluster>] [flow:438c526a-325b-40c8-b86a-cc15aad4728a] [tracking:669a1784-a361-4509-9cd2-c57847a15cbb] Ambari operation: 'Stopping components on the decommissioned hosts', Progress: 0
uluwatu_1      | 2019-05-21T13:30:21.141Z INFO [owner: ] [email: ] /notification endpoint:  {"eventType":"STOP_SERVICES_AMBARI_PROGRESS_STATE","eventTimestamp":1558445421137,"eventMessage":"0","owner":null,"account":null,"userIdV3":"email@email.com","cloud":"AWS","region":"eu-central-1","availabilityZone":null,"blueprintId":null,"blueprintName":null,"clusterId":2,"clusterName":"<cluster>","stackId":2,"stackName":"<cluster>","stackStatus":"AVAILABLE","nodeCount":null,"instanceGroup":null,"clusterStatus":"UPDATE_IN_PROGRESS","workspaceId":1}
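For anyone hitting the same symptom: the two cloudbreak_1 lines repeat with an ever-increasing polling attempt number while the Ambari operation stays at Progress: 0. The Python below is a minimal sketch that pulls those two values out of saved log lines. summarize_polling and looks_stuck are hypothetical helper names (not part of Cloudbreak), the regexes assume the exact message format shown above, and the min_attempts threshold is an arbitrary assumption.

```python
import re

# Assumed message formats, copied from the Cloudbreak log lines above.
ATTEMPT_RE = re.compile(r"Polling attempt (\d+)\.")
PROGRESS_RE = re.compile(r"Ambari operation: '([^']*)', Progress: (\d+)")


def summarize_polling(lines):
    """Return (last polling attempt, {operation: last reported progress})."""
    last_attempt = None
    progress = {}
    for line in lines:
        m = ATTEMPT_RE.search(line)
        if m:
            last_attempt = int(m.group(1))
        m = PROGRESS_RE.search(line)
        if m:
            progress[m.group(1)] = int(m.group(2))
    return last_attempt, progress


def looks_stuck(lines, min_attempts=1000):
    """Heuristic only: many polling attempts while some operation still reports 0%."""
    attempt, progress = summarize_polling(lines)
    return attempt is not None and attempt >= min_attempts and 0 in progress.values()
```

This only flags the pattern; it does not tell you why Ambari never reports progress.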

cb cluster list shows the following:

[
  {
    "Name": "<cluster>",
    "Description": "",
    "CloudPlatform": "AWS",
    "StackStatus": "AVAILABLE",
    "ClusterStatus": "UPDATE_IN_PROGRESS"
  }
]
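The telling combination here is StackStatus AVAILABLE with ClusterStatus UPDATE_IN_PROGRESS. A minimal sketch for spotting it, assuming cb cluster list prints a JSON array with the field names shown above; stuck_clusters and list_stuck_clusters are hypothetical helpers, not part of the cb CLI.

```python
import json
import subprocess


def stuck_clusters(cb_list_json):
    """Names of clusters whose stack is AVAILABLE but whose cluster
    status is still UPDATE_IN_PROGRESS (field names as in the output above)."""
    clusters = json.loads(cb_list_json)
    return [
        c["Name"]
        for c in clusters
        if c.get("StackStatus") == "AVAILABLE"
        and c.get("ClusterStatus") == "UPDATE_IN_PROGRESS"
    ]


def list_stuck_clusters():
    """Run `cb cluster list` (assumed to be on PATH and to print JSON) and filter it."""
    out = subprocess.run(["cb", "cluster", "list"],
                         capture_output=True, text=True, check=True).stdout
    return stuck_clusters(out)
```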

At this point we'd just like to stop the repair and return the cluster to a normal state. Any advice would be great.

Re: Cloudbreak stuck in a loop after attempting to repair a cluster

Expert Contributor

Hi @Oliver Fox,

It looks like removing the node from Ambari is stuck. Could you check the Ambari UI/logs to see if it has any issues?

Re: Cloudbreak stuck in a loop after attempting to repair a cluster

New Contributor

The only thing I've found that looks like an error is in ambari-audit.log; there is nothing in the Ambari UI:

2019-05-22T13:14:03.946Z, User(null), RemoteIp(<IP>), Operation(User login), Roles(
), Status(Failed), Reason(Authentication required)
2019-05-22T13:14:03.947Z, User(cloudbreak), RemoteIp(<IP>), Operation(User login), Roles(
    Ambari: Ambari Administrator
), Status(Success)

The EC2 node that was causing problems is actually in a good state now, so it doesn't need to be removed.

Re: Cloudbreak stuck in a loop after attempting to repair a cluster

Expert Contributor

If you click the background operations icon (the gear in the upper-right corner) in Ambari, do you see a job called "Stop all components on host"?

This is what CB is waiting for according to the logs.

You can also try restarting CB; that may get it out of the loop.
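The same background-operations check can be scripted against Ambari instead of using the UI. This is a minimal sketch, assuming Ambari's REST API is reachable at a base URL such as http://<ambari-host>:8080 and that GET /api/v1/clusters/<cluster>/requests with the Requests/request_context and Requests/request_status fields returns an "items" list of "Requests" objects. The helper names and the set of statuses treated as "active" are assumptions, not a documented Cloudbreak or Ambari procedure.

```python
import base64
import json
import urllib.request

# Assumption: request statuses that mean "not finished yet".
ACTIVE_STATES = {"PENDING", "QUEUED", "IN_PROGRESS"}


def fetch_requests(base_url, cluster, user, password):
    """GET the cluster's request list from the Ambari REST API (assumed endpoint)."""
    url = (f"{base_url}/api/v1/clusters/{cluster}/requests"
           "?fields=Requests/request_context,Requests/request_status")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def _rows(payload):
    """Yield (id, context, status) tuples from an Ambari requests payload."""
    for item in payload.get("items", []):
        r = item.get("Requests", {})
        yield r.get("id"), r.get("request_context", ""), r.get("request_status")


def matching_requests(payload, context_substring):
    """Requests whose context contains the substring (case-insensitive)."""
    needle = context_substring.lower()
    return [row for row in _rows(payload) if needle in row[1].lower()]


def active_requests(payload):
    """Requests that Ambari still reports as pending or running."""
    return [row for row in _rows(payload) if row[2] in ACTIVE_STATES]
```

Matching is case-insensitive because the job name is written slightly differently in different places in this thread.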

Re: Cloudbreak stuck in a loop after attempting to repair a cluster

New Contributor

There are completed background operations, but none for "Stop All Components on hosts", and there are no pending operations.

We ended up restarting CB as suggested. The repair then completed: it removed the node and added a new one to the cluster.

Thanks for the help.