Best way to re-add a server to HDP cluster?

Super Collaborator

One of the servers in our cluster failed due to multiple disk drive failures.

The server was not a data node or master server. It was used as a JournalNode for HA, a ZooKeeper server, HST server, Apache Thrift server, and Grafana server.

Using Ambari, we put the server in maintenance mode, and then rebuilt it.

It is now ready to come back online. We have ambari-agent installed, as well as necessary repos, etc.

My question is: can I use the Ambari GUI and "delete" the server from the cluster, and then follow the steps to add the server back?

Should I expect any issues, since it has the same name, IP address, etc.?

Is there a better way to accomplish this?

We basically want to bring it online and put it back to work doing everything it was doing before.

1 ACCEPTED SOLUTION

Super Collaborator

If the server was erased and rebuilt with the same IP and hostname, you may have to do these steps -

1) Stop the Ambari agent on the rebuilt host.

2) Use the Ambari REST API to delete each service component that Ambari thinks this host has/had

3) Delete the host from Ambari

4) Restart ambari server

5) Add as a new host.
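
For steps 2 and 3, a minimal Python sketch of the REST calls is below. It only builds the DELETE requests and prints them, so nothing is sent until you uncomment the `urlopen` line. The server URL, cluster name, host name, credentials, and component list are all placeholders I made up for illustration; list the real components with a GET on the host's `host_components` resource first. Ambari typically refuses to delete a component that it still considers started, so you may need to stop it first.

```python
import base64
import urllib.request


def ambari_delete_request(base_url, path, user, password):
    """Build (but do not send) a DELETE request against the Ambari REST API."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/{path}",
        method="DELETE",
        headers={
            "Authorization": f"Basic {token}",
            # Ambari rejects modifying requests that lack this header.
            "X-Requested-By": "ambari",
        },
    )


def host_cleanup_requests(base_url, cluster, host, components, user, password):
    """Return one DELETE per host component, followed by the DELETE for the host."""
    host_path = f"clusters/{cluster}/hosts/{host}"
    requests = [
        ambari_delete_request(
            base_url, f"{host_path}/host_components/{name}", user, password
        )
        for name in components
    ]
    requests.append(ambari_delete_request(base_url, host_path, user, password))
    return requests


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    planned = host_cleanup_requests(
        base_url="http://ambari.example.com:8080",
        cluster="mycluster",
        host="node1.example.com",
        components=["JOURNALNODE", "ZOOKEEPER_SERVER"],
        user="admin",
        password="admin",
    )
    for req in planned:
        print(req.get_method(), req.full_url)
        # urllib.request.urlopen(req)  # uncomment to actually send the request
```

The agent stop in step 1 and the server restart in step 4 are done on the hosts themselves (`ambari-agent stop` on the rebuilt host, `ambari-server restart` on the Ambari server).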

Regards

Pranay Vyas

3 REPLIES

Super Collaborator

Thanks Pranay,

We followed these steps - though I just realized that we forgot to restart ambari-server.

We were able to successfully delete the server.

However, when we added the server, it now seems to be stuck on a modal dialog that says "Please wait while the hosts are being checked for potential problems..."

In the ambari-agent logs on the server (being added), we see this repeated over and over again:

INFO 2017-01-18 14:22:00,773 Controller.py:265 - Heartbeat response received (id = 109)
INFO 2017-01-18 14:22:00,773 RecoveryManager.py:260 - METRICS_COLLECTOR needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-01-18 14:22:00,773 RecoveryManager.py:795 - Recovery is paused, likely tasks waiting in pipeline for this host.
INFO 2017-01-18 14:22:10,674 Heartbeat.py:78 - Building Heartbeat: {responseId = 109, timestamp = 1484767330674, commandsInProgress = False, componentsMapped = False}
INFO 2017-01-18 14:22:10,716 Controller.py:265 - Heartbeat response received (id = 110)
INFO 2017-01-18 14:22:10,716 RecoveryManager.py:260 - METRICS_COLLECTOR needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-01-18 14:22:10,716 RecoveryManager.py:795 - Recovery is paused, likely tasks waiting in pipeline for this host.
INFO 2017-01-18 14:22:20,617 Heartbeat.py:78 - Building Heartbeat: {responseId = 110, timestamp = 1484767340617, commandsInProgress = False, componentsMapped = False}
INFO 2017-01-18 14:22:20,660 Controller.py:265 - Heartbeat response received (id = 111)
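
To see at a glance which components the agent keeps flagging, the repeated lines can be tallied with a small script. This is only a sketch that assumes the RecoveryManager message format shown above; the function name is my own, and the sample lines are copied from the log.

```python
import re
from collections import Counter

# Matches the "needs recovery" lines emitted by RecoveryManager.py in the log above.
RECOVERY_RE = re.compile(
    r"RecoveryManager\.py:\d+ - (?P<component>\w+) needs recovery, "
    r"desired = (?P<desired>\w+), and current = (?P<current>\w+)"
)


def summarize_recovery(lines):
    """Count how often each (component, desired, current) triple is reported."""
    counts = Counter()
    for line in lines:
        match = RECOVERY_RE.search(line)
        if match:
            counts[(match["component"], match["desired"], match["current"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "INFO 2017-01-18 14:22:00,773 Controller.py:265 - Heartbeat response received (id = 109)",
        "INFO 2017-01-18 14:22:00,773 RecoveryManager.py:260 - METRICS_COLLECTOR needs recovery, desired = STARTED, and current = INSTALLED.",
        "INFO 2017-01-18 14:22:10,716 RecoveryManager.py:260 - METRICS_COLLECTOR needs recovery, desired = STARTED, and current = INSTALLED.",
    ]
    for (component, desired, current), n in summarize_recovery(sample).items():
        print(f"{component}: desired={desired} current={current} seen {n} times")
```

To run it against the real agent log, pass the open file object instead of `sample`.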

Do you have any insights as to what may be the solution?

Super Collaborator

We were ultimately able to get everything back in shape, but it wasn't pretty.

Too many steps to detail here.