Created on 03-31-2014 01:10 PM - edited 09-16-2022 01:56 AM
64-node cluster
One node is bad; it no longer communicates.
I want to remove him from the cluster.
ch-8 | 10.71.0.108 | /default | CDH 5 | Cluster 1 | 2 Role(s) | Good Health | 12.38s ago |
ch-9 | 10.71.0.109 | /default | Unknown | Cluster 1 | 2 Role(s) | Bad Health | None |
Here he is in the Hosts list. He's a DataNode and a NodeManager (YARN).
When I try to delete, it tells me
Delete Hosts The following 1 host(s) cannot be deleted because they have role instances or are not completely decommissioned:
ch-9 | nodemanager (ch-9) and 1 other role(s). |
If I try "Remove Hosts From Cluster" instead, it doesn't work either:
Removing these hosts will stop and delete all roles running on them and then remove them from their clusters. The hosts will still be managed by Cloudera Manager and can be utilized after being added to new or existing clusters.
Role data directories will not be deleted.
Host | Role |
ch-9 | NodeManager, DataNode |
Hosts Decommission | Finished | Mar 31, 2014 1:09:28 PM PDT | Mar 31, 2014 1:09:28 PM PDT | Command 'DecommissionWithWait' failed for service 'yarn' |
Decommission (2) | YARN (MR2 Included) | Finished | Mar 31, 2014 1:09:28 PM PDT | Mar 31, 2014 1:09:28 PM PDT | Failed to perform decommission. |
Basically, if I can't talk to the node, I can't stop, decommission, or delete him. How should I do it?
Created 03-31-2014 02:14 PM
Created 03-31-2014 02:01 PM
Created 03-31-2014 02:11 PM
Thanks for the quick response.
I thought I had tried that, but while I was showing someone else the problem I told him to just try it, and he managed to delete the node by doing exactly what had failed for me. So maybe the node had reached the required state by then.
In any case, I can't try your recommendation immediately.
However, I expect to get nodes into a bad state again sometime soon, and I will try what you recommend then.
Thanks again.
(Sorry I can't confirm exactly right now, but my node is gone now, which is good.)
-kevin
Created 03-31-2014 02:14 PM
Created on 01-22-2019 03:33 AM - edited 01-22-2019 03:36 AM
Remove the dead host/decommissioned host from mammoth -c output or CM.
We have already deleted the host.
We are about to start the upgrade from 5.14.2 to 6.0, and as a prerequisite we ran ./mammoth -c. It still reports a host that is no longer part of the cluster. We are also thinking of removing it from the HOSTS table in the scm database. In MySQL, under the scm database, I can still see it:
mysql> select * from HOSTS;
+---------+-------------------------+--------------------------------------+-----------------------------+---------------+----------+--------+-
| HOST_ID | OPTIMISTIC_LOCK_VERSION | HOST_IDENTIFIER                      | NAME                        | IP_ADDRESS    | RACK_ID  | STATUS |
+---------+-------------------------+--------------------------------------+-----------------------------+---------------+----------+--------+-
|       1 |                     248 | 260772a1-a89a-42b8-af4c-0406ac0c21bd | bdk1n07.bnet.luxds.net      | 192.168.11.16 | /default | NA     |
|       2 |                     251 | 19103582-a94d-4961-aeb8-5a2023480fa5 | bdk1n09.bnet.luxds.net      | 192.168.11.18 | /default | NA     |
|       3 |                     254 | e57f3aa9-ab4f-4b3c-925d-2be272237928 | bdk1n08.bnet.luxds.net      | 192.168.11.17 | /default | NA     |
|       4 |                      89 | 0317c86d-b693-4280-ba25-0bbcc46e567c | xl11lsrv0428.bnet.luxds.net | 10.178.65.98  | /default | NA     |
+---------+-------------------------+--------------------------------------+-----------------------------+---------------+----------+--------+-
The host with hostId "0317c86d-b693-4280-ba25-0bbcc46e567c" (previously an edge node) has been removed from Cloudera. Is there any way to clean this node out of CM? On the Cloudera Hosts screen I can see only 3 nodes.
Is that server xl11lsrv0428.bnet.luxds.net | 10.178.65.98 still running separately?
Ans. It is running separately and has even been re-imaged.
Is the CM agent still running, or stopped, on the server xl11lsrv0428.bnet.luxds.net | 10.178.65.98?
Ans. No CM agent is running on it currently.
Is it showing in the CM portal?
Ans. No, there is no entry for xl11lsrv0428 in CM.
Created 01-22-2019 05:40 PM
Hi @pra_big,
Please do not add onto a solved thread from 5 years ago. It is very unlikely that the current issue you face is identical so it is best to start a new conversation.
Please outline what you are trying to do, what you expect to have happen, and what is actually occurring.
From your description, it appears you are running a script that may be an Oracle script (mammoth). That is not a Cloudera script, so please consult the vendor that supplied "mammoth" if you need assistance with it.
It is hard to tell what you are asking about with respect to the host in Cloudera Manager. If you want to delete a host in CM, go to the Hosts tab and select "All Hosts". Then find the host you wish to delete, check the box next to it, and choose "Delete" from the drop-down menu.
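For anyone who would rather script this than click through the UI, the same delete can be sketched against the Cloudera Manager REST API, which exposes DELETE /api/<version>/hosts/<hostId>. This is a minimal sketch, not an official tool. The base URL http://cm.example.com:7180, the API version "v19", the credentials, and all helper names are assumptions for illustration; check GET /api/version on your own CM for the version it supports. The code only builds the request; nothing is sent until you pass it to urllib.request.urlopen yourself.

```python
import base64
import urllib.request
from urllib.parse import quote


def find_host_id(hosts, hostname):
    """Return the hostId of the entry matching hostname, or None.

    `hosts` is the "items" list from GET /api/<version>/hosts,
    where each entry carries "hostId" and "hostname" fields.
    """
    for host in hosts:
        if host.get("hostname") == hostname:
            return host.get("hostId")
    return None


def host_delete_url(cm_base, api_version, host_id):
    """Build the URL for DELETE /api/<version>/hosts/<hostId>."""
    return f"{cm_base.rstrip('/')}/api/{api_version}/hosts/{quote(host_id, safe='')}"


def build_delete_request(cm_base, api_version, host_id, user, password):
    """Build (but do not send) an authenticated DELETE request."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        host_delete_url(cm_base, api_version, host_id),
        method="DELETE",
        headers={"Authorization": f"Basic {token}"},
    )


# Hypothetical values; replace with your own CM address and host entry.
items = [{"hostId": "0317c86d-b693-4280-ba25-0bbcc46e567c",
          "hostname": "xl11lsrv0428.bnet.luxds.net"}]
host_id = find_host_id(items, "xl11lsrv0428.bnet.luxds.net")
print(host_delete_url("http://cm.example.com:7180", "v19", host_id))
```

As with the UI, expect CM to refuse the delete while the host still has role instances on it, so the roles have to go first.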
Maybe you could show screenshots or explain more about what you are having trouble with.
NOTE: when the Cloudera Manager Agent heartbeats to CM, CM identifies the host by "uuid" not hostname. So, if you re-imaged and accidentally reused a UUID from another host, that could lead to some confusion.
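A quick way to check for that kind of UUID reuse is to group whatever (uuid, hostname) pairs you have and flag any UUID claimed by more than one hostname. The snippet below is only a sketch under that assumption: the function name and the sample pairs are made up for illustration, and how you collect the pairs (from the agents, the CM hosts list, or the HOSTS table) is up to you.

```python
from collections import defaultdict


def hosts_sharing_uuid(pairs):
    """Map each UUID that appears under more than one hostname
    to the sorted list of hostnames using it."""
    names_by_uuid = defaultdict(set)
    for uuid, hostname in pairs:
        names_by_uuid[uuid].add(hostname)
    return {
        uuid: sorted(names)
        for uuid, names in names_by_uuid.items()
        if len(names) > 1
    }


# Hypothetical inventory: node-c was re-imaged and reused node-a's UUID.
pairs = [
    ("uuid-1", "node-a.example.com"),
    ("uuid-2", "node-b.example.com"),
    ("uuid-1", "node-c.example.com"),
]
print(hosts_sharing_uuid(pairs))
```

An empty result means every UUID maps to a single hostname, which rules out this particular source of confusion.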
We need to clearly understand what problem you are seeing to provide the best help.