New Contributor
Posts: 7
Registered: ‎02-13-2014
Accepted Solution

can't delete bad node from the cluster

I have a 64-node cluster.

One node is bad: it no longer communicates.

I want to remove it from the cluster.

 

   
ch-8 | 10.71.0.108 | /default | CDH 5 | Cluster 1 | 2 Role(s) | Good Health | 12.38s ago
ch-9 | 10.71.0.109 | /default | Unknown | Cluster 1 | 2 Role(s) | Bad Health | None

 

 

 

Here it is in the Hosts list. It's a DataNode and a NodeManager (YARN).

 

When I try to delete, it tells me

Cloudera Employee
Posts: 508
Registered: ‎07-30-2013

Re: can't delete bad node from the cluster

Try stopping all roles on that host (via CM), then removing it. The stop commands will fail, but they'll mark the state as stopped so it'll let you remove it from the cluster.
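
For anyone who would rather script this than click through CM, here is a minimal sketch of the same order of operations: stop each role on the host, then remove the host from the cluster. It only builds the list of requests and sends nothing. The endpoint paths are my recollection of the Cloudera Manager REST API and should be checked against the API reference for your CM version. The base URL, API version, host ID, service names, and role names are all hypothetical placeholders.

```python
from urllib.parse import quote


def plan_stop_then_remove(api_base, cluster, host_id, roles_by_service):
    """Build (method, url, body) tuples: stop roles first, then remove the host.

    Nothing is sent over the network; the caller decides how to issue the requests.
    """
    cluster_path = f"{api_base}/clusters/{quote(cluster, safe='')}"
    plan = []
    for service, roles in roles_by_service.items():
        # Assumed endpoint: bulk role stop, taking a list of role names.
        url = f"{cluster_path}/services/{quote(service, safe='')}/roleCommands/stop"
        plan.append(("POST", url, {"items": list(roles)}))
    # Assumed endpoint: remove the host from the cluster once its roles are stopped.
    plan.append(("DELETE", f"{cluster_path}/hosts/{quote(host_id, safe='')}", None))
    return plan


# Hypothetical values modelled on the bad node in this thread (a DataNode and a NodeManager).
steps = plan_stop_then_remove(
    "http://cm-host.example.com:7180/api/v6",
    "Cluster 1",
    "ch-9-host-id",
    {"hdfs": ["hdfs-DATANODE-ch9"], "yarn": ["yarn-NODEMANAGER-ch9"]},
)
for method, url, body in steps:
    print(method, url, body)
```

The stop requests come first because, as noted above, they are expected to fail on a dead host but still mark the roles as stopped.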
New Contributor
Posts: 7
Registered: ‎02-13-2014

Re: can't delete bad node from the cluster

Thanks for the quick response.

I thought I had tried that, but...

I was showing someone else the problem and told him to just try it, and he managed to delete the node by doing the same thing that had failed for me. So maybe the node had reached the required state by then.

 

In any case, I can't try your recommendation immediately.

However, I expect to get nodes into a bad state again sometime soon, and I will try what you recommend then.

 

Thanks again.

(Sorry I can't confirm exactly right now, but my node is gone now, which is good.)

 

-kevin

Cloudera Employee
Posts: 508
Registered: ‎07-30-2013

Re: can't delete bad node from the cluster

The decommission step may have done the same thing as the stop command I suggested. If this happens again, I'd try the decommission command, let it fail, then delete host. If that doesn't work, then try my stop suggestion.
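
The fallback order described above can be sketched as a small helper. This is only an illustration of the sequence, not Cloudera tooling: the three callables are hypothetical stand-ins for whatever you use to decommission the host, stop its roles, and delete it (the CM UI, the API, or a script).

```python
def remove_unreachable_host(decommission, stop_roles, delete_host):
    """Try decommission then delete; if the delete fails, stop roles and delete again.

    Returns the list of preparation steps that were attempted.
    """
    tried = []
    for label, prepare in (("decommission", decommission), ("stop roles", stop_roles)):
        tried.append(label)
        try:
            prepare()
        except Exception as exc:
            # On a dead host the preparation step itself is expected to fail.
            print(f"{label} failed, continuing: {exc}")
        try:
            delete_host()
            return tried
        except Exception as exc:
            print(f"delete after {label} failed: {exc}")
    raise RuntimeError("host could not be deleted after: " + ", ".join(tried))


# Hypothetical stand-ins that simulate a dead host.
def fake_fail():
    raise ConnectionError("host unreachable")


delete_calls = []


def fake_delete():
    delete_calls.append(1)
    if len(delete_calls) == 1:
        raise RuntimeError("roles still marked as running")


tried = remove_unreachable_host(fake_fail, fake_fail, fake_delete)
print(tried)
```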
Explorer
Posts: 13
Registered: ‎12-07-2018

Re: can't delete deleted node from the cluster CM


Remove the dead host/decommissioned host from mammoth -c output or CM.

 

We have already deleted the host.

We are about to start the upgrade from 5.14.2 to 6.0, and as a prerequisite we ran ./mammoth -c. It reports information about a host that is no longer part of the cluster.

We are also thinking of removing it from the HOSTS table in the scm database. In MySQL, under the scm database, I can still see it:

 

mysql> select * from HOSTS;
+---------+-------------------------+--------------------------------------+-----------------------------+---------------+----------+--------+-
| HOST_ID | OPTIMISTIC_LOCK_VERSION | HOST_IDENTIFIER                      | NAME                        | IP_ADDRESS    | RACK_ID  | STATUS |
+---------+-------------------------+--------------------------------------+-----------------------------+---------------+----------+--------+-
|       1 |                     248 | 260772a1-a89a-42b8-af4c-0406ac0c21bd | bdk1n07.bnet.luxds.net      | 192.168.11.16 | /default | NA     |
|       2 |                     251 | 19103582-a94d-4961-aeb8-5a2023480fa5 | bdk1n09.bnet.luxds.net      | 192.168.11.18 | /default | NA     |
|       3 |                     254 | e57f3aa9-ab4f-4b3c-925d-2be272237928 | bdk1n08.bnet.luxds.net      | 192.168.11.17 | /default | NA     |
|       4 |                      89 | 0317c86d-b693-4280-ba25-0bbcc46e567c | xl11lsrv0428.bnet.luxds.net | 10.178.65.98  | /default | NA     |
+---------+-------------------------+--------------------------------------+-----------------------------+---------------+----------+--------+-

The host with HOST_IDENTIFIER "0317c86d-b693-4280-ba25-0bbcc46e567c" (formerly an edge node) has been removed from Cloudera. Is there any way to clean this node out of CM? On the Cloudera Hosts screen I can see only 3 nodes.
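
To make the mismatch concrete, here is a small sketch that compares the rows from the HOSTS query with the hosts visible in CM and lists whatever is left over. The rows are copied from the query output in this post. The set of visible hosts is an assumption: I only know that 3 nodes are shown, and I am assuming they are the three bdk1n* hosts.

```python
# (HOST_IDENTIFIER, NAME) pairs copied from the HOSTS query output.
db_hosts = [
    ("260772a1-a89a-42b8-af4c-0406ac0c21bd", "bdk1n07.bnet.luxds.net"),
    ("19103582-a94d-4961-aeb8-5a2023480fa5", "bdk1n09.bnet.luxds.net"),
    ("e57f3aa9-ab4f-4b3c-925d-2be272237928", "bdk1n08.bnet.luxds.net"),
    ("0317c86d-b693-4280-ba25-0bbcc46e567c", "xl11lsrv0428.bnet.luxds.net"),
]

# Assumption: these are the 3 nodes shown on the CM Hosts screen.
cm_visible = {
    "bdk1n07.bnet.luxds.net",
    "bdk1n08.bnet.luxds.net",
    "bdk1n09.bnet.luxds.net",
}


def stale_hosts(db_rows, visible_names):
    """Return database rows whose hostname is not shown in CM."""
    return [(uuid, name) for uuid, name in db_rows if name not in visible_names]


stale = stale_hosts(db_hosts, cm_visible)
print(stale)
```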

 

Is the server xl11lsrv0428.bnet.luxds.net (10.178.65.98) still running separately?
Yes, it is running separately and has even been re-imaged.

Is the CM agent still running, or stopped, on the server xl11lsrv0428.bnet.luxds.net (10.178.65.98)?
No CM agent is running on it currently.

Is it showing in the CM portal?
No, there is no entry for xl11lsrv0428 in CM.

Posts: 1,003
Topics: 1
Kudos: 249
Solutions: 126
Registered: ‎04-22-2014

Re: can't delete deleted node from the cluster CM

Hi @pra_big,

 

Please do not add onto a solved thread from 5 years ago.  It is very unlikely that the current issue you face is identical so it is best to start a new conversation.

 

Please outline what you are trying to do, what you expect to have happen, and what is actually occurring.

 

From your description, it appears you are running a script that may be an Oracle script (mammoth).  That is not a Cloudera script, so please consult the vendor that supplied you with "mammoth" if you need assistance with it.

 

It is hard to tell what you are asking about with respect to the host in Cloudera Manager. If you want to delete a host in CM, go to the Hosts tab and select "All Hosts".  Then find the host you wish to delete, check the box next to it, and choose "delete" from the drop-down menu.

 

Maybe you could show screenshots or explain more about what you are having trouble with.

NOTE:  when the Cloudera Manager Agent heartbeats to CM, CM identifies the host by "uuid" not hostname.  So, if you re-imaged and accidentally reused a UUID from another host, that could lead to some confusion.
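
As a rough illustration of the UUID point, the sketch below flags any UUID that is claimed by more than one hostname. The input format is an assumption (pairs you have collected yourself from each host), and the second hostname is a hypothetical example of a re-imaged machine that reused an old UUID.

```python
from collections import defaultdict


def reused_uuids(agents):
    """Map each UUID claimed by more than one hostname to the sorted hostnames."""
    names_by_uuid = defaultdict(set)
    for uuid, hostname in agents:
        names_by_uuid[uuid].add(hostname)
    return {
        uuid: sorted(names)
        for uuid, names in names_by_uuid.items()
        if len(names) > 1
    }


# Hypothetical inventory: the last entry reuses the UUID of the first.
agents = [
    ("0317c86d-b693-4280-ba25-0bbcc46e567c", "xl11lsrv0428.bnet.luxds.net"),
    ("260772a1-a89a-42b8-af4c-0406ac0c21bd", "bdk1n07.bnet.luxds.net"),
    ("0317c86d-b693-4280-ba25-0bbcc46e567c", "reimaged-host.example.com"),
]
conflicts = reused_uuids(agents)
print(conflicts)
```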

 

We need to clearly understand what problem you are seeing to provide the best help.
