Created 11-24-2017 12:42 PM
hi,
By mistake we created a new worker node (worker11, with datanode & namenode) while the cluster already had a worker11.
Only when we started the new worker11 did we notice that another node named worker11 already existed.
So the problem is that we erased the OS of the old worker11 (without deleting it from the cluster), and the new worker11 stayed in the cluster.
After a couple of hours we noticed that heartbeats are lost on the new worker11, and after some time worker11 crashes, so we have to boot this machine again, and so on.
It is clear that all the problems on worker11 are because this is a duplicate machine (the old worker11 machine was removed from the VM center (OS)).
So what is the workaround for the worker11 node, so that the Ambari cluster accepts this machine without problems?
Created 11-24-2017 12:50 PM
Usually "ambari-agent" starts sending heartbeat messages to the ambari-server as soon as it is started.
So if you do not want your old "worker11" to send registration requests to the Ambari-Server, please stop the ambari-agent on that host, or remove the ambari-server hostname entry from the "/etc/ambari-agent/conf/ambari-agent.ini" file of that particular agent.
# ambari-agent stop
# mv /etc/ambari-agent/conf/ambari-agent.ini /etc/ambari-agent/conf/ambari-agent.ini.unwanted
# ambari-agent stop (just to make sure that the stop has really been performed)
If the above does not work, then as an alternative we might need to delete the old worker11 host entry from the database directly.
Created 11-24-2017 01:00 PM
Hi Jay, the old worker11 does not exist any more. We removed it from the vCenter (it was a VM), so we can't perform any steps on the old host. We only have the new worker11.
Created 11-24-2017 01:08 PM
@Jay, about "delete the Old worker11 host entry from the DataBase": I guess this needs to be done on the ambari-server. If so, please show me the steps, and if any extra steps are needed please let me know what they are.
Created 11-24-2017 01:28 PM
We can find the host_id of the unwanted host in the Ambari DB and delete the entries for that host. (We are making direct DB changes, so please consider the risk and take a DB dump as a backup first.)
Step0). Please make sure to collect a fresh Ambari DB dump. (This step is a must so that we can revert to the previous state.) **MUST DO**
Step1). Stop ambari-server.
# ambari-server stop
Step2). Get the "host_id" of the unwanted host from its hostname using the following SQL query.
select host_id from hosts where host_name='unwanted1.host.com';
Step3). Once the above query returns the "host_id", delete the entries for that host as follows:
NOTE: I am using "999" as a dummy host_id in the following queries; please replace it with the result of the previous query.
delete from execution_command where task_id in (select task_id from host_role_command where host_id in (999));
delete from host_version where host_id in (999);
delete from host_role_command where host_id in (999);
delete from serviceconfighosts where host_id in (999);
delete from hoststate where host_id in (999);
delete from kerberos_principal_host where host_id in (999); -- for Kerberized environments only
delete from hosts where host_name in ('unwanted1.host.com');
delete from alert_current where history_id in (select alert_id from alert_history where host_name in ('unwanted1.host.com'));
Step4). Start ambari Server
# ambari-server start
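The backup in Step0 and the deletes in Step3 could be scripted roughly as follows. This is only a sketch, assuming a PostgreSQL backend with the database and DB user both named "ambari" (adjust for your setup). The function names and file paths are made up for illustration and are not part of Ambari. The deletes are wrapped in a transaction so that a failing statement undoes the earlier ones, and the Kerberos-only statement is left out.

```shell
#!/usr/bin/env bash
# Sketch of Step0 and Step3. Assumptions: PostgreSQL backend, database and
# DB user both named "ambari"; function names and paths are illustrative.

# Step0: build a timestamped file name for the dump.
ambari_dump_file() {
    local dir="${1:-/tmp}"
    local stamp="${2:-$(date +%Y%m%d-%H%M%S)}"
    printf '%s/ambari-db-%s.sql\n' "$dir" "$stamp"
}

# Step0: dump the database (pg_dump may prompt for the DB password).
ambari_backup() {
    local out
    out="$(ambari_dump_file "$@")"
    pg_dump -U ambari ambari > "$out" && echo "dump written to $out"
}

# Step3: print the delete statements for one host inside a transaction.
cleanup_sql() {
    local id="$1" name="$2"
    # Refuse anything that is not a plain number / plain hostname.
    [[ "$id" =~ ^[0-9]+$ ]] || { echo "bad host_id: $id" >&2; return 1; }
    [[ "$name" =~ ^[A-Za-z0-9.-]+$ ]] || { echo "bad host name: $name" >&2; return 1; }
    cat <<EOF
BEGIN;
delete from execution_command where task_id in (select task_id from host_role_command where host_id in ($id));
delete from host_version where host_id in ($id);
delete from host_role_command where host_id in ($id);
delete from serviceconfighosts where host_id in ($id);
delete from hoststate where host_id in ($id);
delete from hosts where host_name in ('$name');
delete from alert_current where history_id in (select alert_id from alert_history where host_name in ('$name'));
COMMIT;
EOF
}

# Usage (on the ambari-server host, after "ambari-server stop"):
#   ambari_backup /root
#   cleanup_sql 999 unwanted1.host.com > /tmp/cleanup.sql   # review this file first
#   psql -U ambari -d ambari -v ON_ERROR_STOP=1 -f /tmp/cleanup.sql
```

Writing the SQL to a file first gives a chance to review it before anything touches the database, and ON_ERROR_STOP makes psql stop at the first error instead of carrying on.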
Created 11-24-2017 01:40 PM
Thank you Jay, I will try it on Sunday since I am out of the office now.
Created 11-24-2017 02:03 PM
Hi Jay, I just tested the first command on my test ambari-server machine:
# su postgres
bash-4.2$ psql
postgres=# select host_id from hosts where host_name='worker11.sys67.com';
ERROR: relation "hosts" does not exist
LINE 1: select host_id from hosts where host_name='worker11.sys67...
What is wrong here?
Created 11-24-2017 05:41 PM
If your Ambari DB name is "ambari" then you should try this:
# psql -U ambari ambari
Enter Password: bigdata
ambari=> select host_id from hosts where host_name='worker11.sys67.com';
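The same lookup can also be run without an interactive psql session. A minimal sketch, again assuming the database and DB user are both named "ambari"; the helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: look up host_id non-interactively.
# Assumptions: database "ambari", DB user "ambari"; helper names are made up.

# Print the lookup query for one host name.
host_id_query() {
    local name="$1"
    # Only accept plain hostnames so nothing odd ends up inside the SQL string.
    [[ "$name" =~ ^[A-Za-z0-9.-]+$ ]] || { echo "bad host name: $name" >&2; return 1; }
    printf "select host_id, host_name from hosts where host_name='%s';\n" "$name"
}

# Run the query against the ambari database.
# -t prints rows only, -A turns off column alignment.
lookup_host_id() {
    local sql
    sql="$(host_id_query "$1")" || return 1
    psql -U ambari -d ambari -t -A -c "$sql"
}

# Usage:
#   lookup_host_id worker11.sys67.com
```

Connecting as the postgres user to the default database, as in the failed attempt above, cannot see the "hosts" table, which is why the database is named explicitly here with -d.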
Created 11-25-2017 04:06 PM
@Jay, the old machine that was deleted and the new machine have the same name (worker11.sys55.com). So if I follow the procedure with that name in place of "unwanted1.host.com", it will also affect the new, current machine.
Created 11-26-2017 03:49 AM