Duplicate worker node created by mistake

hi,

By mistake we created a new worker node (worker11, with DataNode and NameNode) while the cluster already had a worker11.

Only when we started the new worker11 did we notice that another node named worker11 already existed.

The problem is that we then erased the OS of the old worker11 (without deleting it from the cluster), and the new worker11 stayed in the cluster.

After a couple of hours we noticed that heartbeats were being lost on the new worker11, and some time later worker11 crashed, so we had to boot the machine again, and this keeps repeating.

It is clear that all the problems on worker11 occur because it is a duplicate machine (the old worker11 machine, OS included, was removed from the VM center).

So what is the workaround for the worker11 node, so that the Ambari cluster accepts this machine without problems?

Michael-Bronson
1 ACCEPTED SOLUTION

Master Mentor

@Michael Bronson

Usually "ambari-agent" starts sending heartbeat messages to the ambari-server as soon as it is started.

So if you do not want your old "worker11" to send registration requests to the Ambari server, please stop the ambari-agent on that host, or remove the ambari-server hostname entry from the "/etc/ambari-agent/conf/ambari-agent.ini" file of that particular agent.

# ambari-agent stop
# mv /etc/ambari-agent/conf/ambari-agent.ini  /etc/ambari-agent/conf/ambari-agent.ini.unwanted
# ambari-agent stop        (just to make sure that the agent is really stopped)

If the above does not work, then as an alternative we might need to delete the old worker11 host entry directly from the database.

10 REPLIES

Hi Jay, the old worker11 no longer exists. We removed it from vCenter (it was a VM), so we can't perform any steps on the old host; we only have the new worker11.

Michael-Bronson

@jay, about "delete the Old worker11 host entry from the DataBase": I guess this has to be done on the ambari-server. If so, please show me the steps, and if any extra steps are needed, please let me know what they are.
Michael-Bronson

Master Mentor

@Michael Bronson

We can find the host_id of the unwanted host in the Ambari DB and delete the entries for that host. (Since we are making DB changes, please consider the risk and take a DB dump as a backup.)

Step0). Please make sure to collect a fresh Ambari DB dump. (This step is a must so that we can revert to the previous state.) **MUST DO**
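
For Step0, a minimal sketch of taking the dump, assuming the Ambari database runs on PostgreSQL and that both the database and its user are named "ambari" (an assumption; adjust to your setup). The backup path, the RUN_BACKUP switch, and the function name are hypothetical:

```shell
#!/bin/sh
# Assumptions: PostgreSQL backend; database and user are both named "ambari".
AMBARI_DB="${AMBARI_DB:-ambari}"
AMBARI_DB_USER="${AMBARI_DB_USER:-ambari}"
# Hypothetical backup location; pick any path with enough free space.
BACKUP_FILE="${BACKUP_FILE:-/tmp/ambari_db_backup_$(date +%Y%m%d_%H%M%S).sql}"

# Dump the whole Ambari database to BACKUP_FILE and fail if the file is empty.
backup_ambari_db() {
    pg_dump -U "$AMBARI_DB_USER" "$AMBARI_DB" > "$BACKUP_FILE" || return 1
    [ -s "$BACKUP_FILE" ]
}

# Run the dump only when explicitly asked, e.g.: RUN_BACKUP=1 sh backup.sh
if [ "${RUN_BACKUP:-0}" = "1" ]; then
    backup_ambari_db && echo "Backup written to $BACKUP_FILE"
fi
```

If the dump file is missing or empty, it is safer not to continue with the following steps.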

Step1). Stop ambari-server.

# ambari-server stop

Step2). Get the "host_id" of the unwanted host, based on its hostname, using the following SQL query.

select host_id from hosts where host_name='unwanted1.host.com';

Step3). The query above returns the "host_id" of the host that needs to be deleted. Delete its entries as follows:

NOTE: I am using "999" as a dummy host_id in the following queries; please replace it with the result of the previous query.

delete from execution_command where task_id in (select task_id from host_role_command where host_id in (999));
delete from host_version where host_id in (999);
delete from host_role_command where host_id in (999);
delete from serviceconfighosts where host_id in (999);
delete from hoststate where host_id in (999);
delete from kerberos_principal_host where host_id in (999);  -- for kerberized environments only
delete from hosts where host_name in ('unwanted1.host.com');
delete from alert_current where history_id in ( select alert_id from alert_history where host_name in ('unwanted1.host.com'));
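
As a hedged sketch, the statements above could be generated for a given host and wrapped in a transaction, so that they are reviewed before anything touches the database. The function name is hypothetical, the Kerberos-specific statement is left out, and nothing is executed against the DB here; the script only prints SQL:

```shell
#!/bin/sh
# Print the Step3 cleanup statements for one host, wrapped in a transaction.
# Arguments: $1 = host_id from Step2, $2 = host_name. Nothing is executed here.
print_host_cleanup_sql() {
    host_id="$1"
    host_name="$2"
    # Refuse anything that is not a plain integer host_id.
    case "$host_id" in
        ''|*[!0-9]*) echo "host_id must be numeric" >&2; return 1 ;;
    esac
    cat <<EOF
BEGIN;
delete from execution_command where task_id in (select task_id from host_role_command where host_id in ($host_id));
delete from host_version where host_id in ($host_id);
delete from host_role_command where host_id in ($host_id);
delete from serviceconfighosts where host_id in ($host_id);
delete from hoststate where host_id in ($host_id);
delete from hosts where host_name in ('$host_name');
delete from alert_current where history_id in (select alert_id from alert_history where host_name in ('$host_name'));
COMMIT;
EOF
}

# Review the output first; 999 and the host name are the dummy values used above.
print_host_cleanup_sql 999 unwanted1.host.com
```

Once reviewed, the output could be piped to psql on the ambari-server host, for example `print_host_cleanup_sql 999 unwanted1.host.com | psql -U ambari ambari`. Replacing COMMIT with ROLLBACK turns it into a dry run.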

Step4). Start ambari Server

# ambari-server start

Thank you Jay, I will try it on Sunday since I am out of the office now.

Michael-Bronson

Hi Jay, I just tested the first command on my test ambari-server machine:

# su postgres
bash-4.2$ psql
postgres=# select host_id from hosts where host_name='worker11.sys67.com';
ERROR: relation "hosts" does not exist
LINE 1: select host_id from hosts where host_name='worker11.sys67...

What is wrong here?

Michael-Bronson

Master Mentor

@Michael Bronson

If your Ambari DB name is "ambari", then you should try this:

# psql -U ambari ambari
Enter Password: bigdata

ambari=>  select host_id from hosts where host_name='worker11.sys67.com';

@Jay, the old machine that was deleted and the new machine have the same name (worker11.sys55.com), so if I follow the procedure for "unwanted1.host.com", it will also affect the new, current machine.

Michael-Bronson

Master Mentor

@Michael Bronson

The "hosts" table of the Ambari DB also shows the IP address of each registered host, so you should be able to tell the two hosts apart. You can also refer to other relevant tables, such as hostcomponentdesiredstate, to differentiate between the hosts.
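
A minimal sketch of such a lookup, assuming the hosts table exposes ipv4 and last_registration_time columns (column names can differ between Ambari versions, so checking with `\d hosts` in psql first is advisable). The helper name is hypothetical, and it only prints the query:

```shell
#!/bin/sh
# Print a query listing identifying columns for every hosts row matching a name.
# Assumption: columns ipv4 and last_registration_time exist in this Ambari version.
print_host_lookup_sql() {
    host_name="$1"
    printf "select host_id, host_name, ipv4, last_registration_time from hosts where host_name = '%s';\n" "$host_name"
}

# Example with the host name used earlier in this thread.
print_host_lookup_sql worker11.sys67.com
```

The printed query could then be run with `print_host_lookup_sql worker11.sys67.com | psql -U ambari ambari`, and the IP address compared with that of the machine that is actually running.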