Duplicate worker node created by mistake
Labels: Apache Ambari, Apache Hadoop
Created ‎11-24-2017 12:42 PM
Hi,
We created a worker node by mistake (worker11, with DataNode & NameNode) while the cluster already had a worker11.
Only when we started the new worker11 did we notice that another node named worker11 already existed.
The problem is that we erased the OS of the old worker11 without deleting it from the cluster, and the new worker11 stayed in the cluster.
After a couple of hours we noticed that the new worker11 was losing heartbeats, and after some time it crashed, so we had to boot the machine again, and this keeps repeating.
It is clear that all the problems on worker11 occur because it is a duplicate machine (the old worker11 machine, OS included, was removed from the VM center).
So what is the workaround for the worker11 node, so that the Ambari cluster accepts this machine without problems?
Created ‎11-24-2017 12:50 PM
Usually "ambari-agent" starts sending heartbeat messages to the ambari-server as soon as it is started.
So if you do not want your old "worker11" to send registration requests to the Ambari server, please stop the ambari-agent on that host, or remove the ambari-server hostname entry from the "/etc/ambari-agent/conf/ambari-agent.ini" file of that particular agent.
# ambari-agent stop
# mv /etc/ambari-agent/conf/ambari-agent.ini /etc/ambari-agent/conf/ambari-agent.ini.unwanted
# ambari-agent stop   (just to make sure that the stop has already been performed)
If the above does not work, then as an alternative we might need to delete the old worker11 host entry from the database directly.
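Before renaming the agent config, it may help to check which server the agent is actually pointed at. A minimal sketch, assuming the usual layout where the file has a [server] section with a hostname= key; the helper name agent_server_hostname is made up for illustration and is not part of Ambari:

```shell
# Hypothetical helper: print the hostname= value from the [server] section
# of an ambari-agent.ini style file passed as the first argument.
agent_server_hostname() {
  awk -F= '
    # Track whether the current section header is exactly [server].
    /^\[/ { in_server = ($0 == "[server]"); next }
    # Inside [server], print the trimmed value of the hostname key and stop.
    in_server && $1 ~ /^[ \t]*hostname[ \t]*$/ {
      gsub(/^[ \t]+|[ \t]+$/, "", $2)
      print $2
      exit
    }
  ' "$1"
}
```

On the agent host this would be called as: agent_server_hostname /etc/ambari-agent/conf/ambari-agent.ini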
Created ‎11-24-2017 01:00 PM
Hi Jay, the old worker11 does not exist any more. We removed it from the vCenter (it was a VM), so we can't perform any steps on the old host; we only have the new worker11.
Created ‎11-24-2017 01:08 PM
@jay, about "delete the Old worker11 host entry from the DataBase": I guess this needs to be done on the ambari-server. If so, please show me the steps, and if any extra steps are needed, please let me know what they are.
Created ‎11-24-2017 01:28 PM
We can find the host_id of the unwanted host in the Ambari DB and delete the entries for that host. (As we are making DB changes, please consider the risk and take a DB dump as a backup first.)
Step0). Please make sure to collect a fresh Ambari DB dump. (This step is a must so that we can revert to the previous state.) **MUST DO**
Step1). Stop ambari-server.
# ambari-server stop
Step2). Get the "host_id" of the unwanted host based on its hostname using the following SQL query.
select host_id from hosts where host_name='unwanted1.host.com';
Step3). Once the above query returns the "host_id", we know which host needs to be deleted, so delete its entries as follows:
NOTE: I am using a dummy host_id "999" in the following queries; please replace it according to the result of the previous query.
delete from execution_command where task_id in (select task_id from host_role_command where host_id in (999));
delete from host_version where host_id in (999);
delete from host_role_command where host_id in (999);
delete from serviceconfighosts where host_id in (999);
delete from hoststate where host_id in (999);
delete from kerberos_principal_host where host_id in (999);  -- only for kerberized environments
delete from hosts where host_name in ('unwanted1.host.com');
delete from alert_current where history_id in (select alert_id from alert_history where host_name in ('unwanted1.host.com'));
Step4). Start ambari-server.
# ambari-server start
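The backup and delete steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the Ambari DB is PostgreSQL with database and user both named "ambari", and the helper names (ambari_backup_path, backup_ambari_db, gen_host_cleanup_sql) are made up for illustration. Wrapping the deletes in BEGIN/COMMIT is my addition, not part of the steps above, and the Kerberos-specific statement is left out.

```shell
# Hypothetical helpers, not part of Ambari.

# Print a dump path built from a directory and a timestamp string.
ambari_backup_path() {
  printf '%s/ambari-db-%s.sql\n' "$1" "$2"
}

# Step0: dump the Ambari DB into the given directory and print the file path.
# Assumes PostgreSQL with database "ambari" and user "ambari".
backup_ambari_db() {
  dump_path="$(ambari_backup_path "$1" "$(date +%Y%m%d-%H%M%S)")"
  pg_dump -U ambari ambari > "$dump_path" && echo "$dump_path"
}

# Step3: print the delete statements for one host (args: host_id, host_name).
# Nothing is executed here; the output is meant to be reviewed first.
gen_host_cleanup_sql() {
  # Accept only a plain integer id and a plain hostname.
  case "$1" in ''|*[!0-9]*) echo "host_id must be numeric" >&2; return 1;; esac
  case "$2" in ''|*[!A-Za-z0-9.-]*) echo "unexpected hostname" >&2; return 1;; esac
  cat <<SQL
BEGIN;
delete from execution_command where task_id in (select task_id from host_role_command where host_id in ($1));
delete from host_version where host_id in ($1);
delete from host_role_command where host_id in ($1);
delete from serviceconfighosts where host_id in ($1);
delete from hoststate where host_id in ($1);
delete from hosts where host_name in ('$2');
delete from alert_current where history_id in (select alert_id from alert_history where host_name in ('$2'));
COMMIT;
SQL
}
```

With ambari-server stopped, one possible flow is to run backup_ambari_db /root, then gen_host_cleanup_sql 999 unwanted1.host.com > cleanup.sql, read the file, and only then feed it to psql -U ambari ambari -v ON_ERROR_STOP=1 -f cleanup.sql. Keeping generation and execution separate gives a chance to check the host_id before anything is deleted.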
Created ‎11-24-2017 01:40 PM
Thank you, Jay. I will try it on Sunday, since I am out of the office now.
Created ‎11-24-2017 02:03 PM
Hi Jay, I just tested the first command on my test Ambari server machine:
# su postgres
bash-4.2$ psql
postgres=# select host_id from hosts where host_name='worker11.sys67.com';
ERROR: relation "hosts" does not exist
LINE 1: select host_id from hosts where host_name='worker11.sys67...
What is wrong here?
Created ‎11-24-2017 05:41 PM
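If your Ambari DB name is "ambari", then you should try this (a plain "psql" run as the postgres user connects to the default "postgres" database, which is why the "hosts" table was not found):
# psql -U ambari ambari
Enter Password: bigdata
ambari=> select host_id from hosts where host_name='worker11.sys67.com';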
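The same lookup can also be run without the interactive prompt. A minimal sketch, assuming the database and user are both named "ambari" as above; host_id_query and lookup_host_id are hypothetical helper names, not Ambari commands:

```shell
# Hypothetical helpers, not part of Ambari.

# Print the lookup query for one hostname (plain hostnames only).
host_id_query() {
  case "$1" in ''|*[!A-Za-z0-9.-]*) echo "unexpected hostname" >&2; return 1;; esac
  printf "select host_id from hosts where host_name='%s';\n" "$1"
}

# Run the query against the "ambari" database as user "ambari".
# -A (unaligned) and -t (tuples only) make psql print just the id,
# or nothing at all when no row matches.
lookup_host_id() {
  sql="$(host_id_query "$1")" || return 1
  psql -U ambari -d ambari -At -c "$sql"
}
```

For example, lookup_host_id worker11.sys67.com would prompt for the password and print only the matching host_id, which is convenient when the value has to be pasted into the delete statements.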
Created ‎11-25-2017 04:06 PM
@Jay, the old machine that was deleted and the new machine have the same name (worker11.sys55.com), so the procedure shown for "unwanted1.host.com" will also affect the new, current machine.
Created ‎11-26-2017 03:49 AM
