Created 07-05-2017 04:09 PM
I just restarted my cluster (HDP 2.6) nodes for the first time. After I ran the commands
ambari-server start
ambari-agent start
the Ambari UI no longer received a heartbeat from any host!
When I call
curl -i -H "X-Requested-By: ambari" -u admin:mypassword -X GET http://localhost:8080/api/v1/hosts
I get the list of my hosts, but it contains two entries for each node (one with the plain hostname and one with the FQDN):
HTTP/1.1 200 OK
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Cache-Control: no-store
Pragma: no-cache
Set-Cookie: AMBARISESSIONID=7pm102h29dbu1670zcr416aq9;Path=/;HttpOnly
Expires: Thu, 01 Jan 1970 00:00:00 GMT
User: admin
Content-Type: text/plain
Vary: Accept-Encoding, User-Agent
Content-Length: 1017
Server: Jetty(8.1.19.v20160209)

{
  "href" : "http://localhost:8080/api/v1/hosts",
  "items" : [
    {
      "href" : "http://localhost:8080/api/v1/hosts/hdp-1",
      "Hosts" : {
        "cluster_name" : "TestCluster",
        "host_name" : "hdp-1"
      }
    },
    {
      "href" : "http://localhost:8080/api/v1/hosts/hdp-1.novalocal",
      "Hosts" : {
        "host_name" : "hdp-1.novalocal"
      }
    },
    {
      "href" : "http://localhost:8080/api/v1/hosts/hdp-2",
      "Hosts" : {
        "cluster_name" : "TestCluster",
        "host_name" : "hdp-2"
      }
    },
    {
      "href" : "http://localhost:8080/api/v1/hosts/hdp-2.novalocal",
      "Hosts" : {
        "host_name" : "hdp-2.novalocal"
      }
    },
    {
      "href" : "http://localhost:8080/api/v1/hosts/hdp-3",
      "Hosts" : {
        "cluster_name" : "TestCluster",
        "host_name" : "hdp-3"
      }
    },
    {
      "href" : "http://localhost:8080/api/v1/hosts/hdp-3.novalocal",
      "Hosts" : {
        "host_name" : "hdp-3.novalocal"
      }
    }
  ]
}
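Pairs like these can also be picked out mechanically. The following is only a minimal sketch and not part of Ambari: find_fqdn_duplicates is a hypothetical helper that reads host names, one per line, and prints every FQDN whose short form is registered as well. It assumes the short name is everything before the first dot.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ambari): reads host names, one per line,
# from stdin and prints every FQDN whose short form is also in the list.
# Assumption: the short name is the part of the FQDN before the first dot.
find_fqdn_duplicates() {
    awk '
        { names[$0] = 1; order[NR] = $0 }
        END {
            for (i = 1; i <= NR; i++) {
                name = order[i]
                dot = index(name, ".")
                # Only names that contain a dot can be FQDN duplicates.
                if (dot > 1 && (substr(name, 1, dot - 1) in names)) print name
            }
        }
    '
}

# Demo with the host names from the response above.
printf '%s\n' hdp-1 hdp-1.novalocal hdp-2 hdp-2.novalocal hdp-3 hdp-3.novalocal \
    | find_fqdn_duplicates
```

For the six names in the response, the demo prints the three ".novalocal" entries, which are the ones without a cluster_name.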
This seems to confuse the ambari-server / ambari-agent, so that no heartbeat is received anymore. How can I solve this issue? My cluster is not usable anymore, as the services are missing the heartbeat. Thank you!
Update: I just noticed that I had set the hostname to hdp-x before the ambari-server install, e.g.:
sudo hostname hdp-1
When I restart the node(s), each one has its "old" hostname again:
hdp-1.novalocal
I tried running "sudo hostname hdp-1" again, but it didn't help. Is that because the ambari-server and ambari-agents start automatically after boot? Stopping and restarting the services after this "hostname hdp-1" command didn't help either!
Created 07-05-2017 04:26 PM
This happens if you change the hostname of your cluster nodes after the Ambari cluster installation. In this case, suppose the hostname of a host was initially "hdp-3.novalocal": when the agent started on that host, it was registered in the Ambari DB under the name "hdp-3.novalocal". If you later change the agent's hostname to "hdp-3", a new host is registered with the Ambari cluster (it is the same machine, but its hostname was different earlier). To clean this up, first stop the Ambari server:
# ambari-server stop
Then take a dump of the Ambari DB:
# pg_dump -U ambari ambari > /tmp/ambari_bkp.sql
Next, clean the unwanted hosts out of the DB. First get the "host_id" of each host that you want to remove. Connect to the Ambari DB:
# psql -U ambari ambari
Password: bigdata
Queries to find the host_id:
select host_id from hosts where host_name='hdp-1.novalocal';
select host_id from hosts where host_name='hdp-2.novalocal';
select host_id from hosts where host_name='hdp-3.novalocal';
The queries above return the "host_id" values. Since we know that these hosts need to be deleted, delete them as follows. Suppose the host_ids are 111, 222 and 333 respectively:
delete from execution_command where task_id in (select task_id from host_role_command where host_id in (111,222,333));
delete from host_version where host_id in (111,222,333);
delete from host_role_command where host_id in (111,222,333);
delete from serviceconfighosts where host_id in (111,222,333);
delete from hoststate where host_id in (111,222,333);
delete from hosts where host_name in ('hdp-1.novalocal');
delete from hosts where host_name in ('hdp-2.novalocal');
delete from hosts where host_name in ('hdp-3.novalocal');
delete from alert_current where history_id in ( select alert_id from alert_history where host_name in ('hdp-1.novalocal'));
delete from alert_current where history_id in ( select alert_id from alert_history where host_name in ('hdp-2.novalocal'));
delete from alert_current where history_id in ( select alert_id from alert_history where host_name in ('hdp-3.novalocal'));
Then restart the Ambari server:
# ambari-server start
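Since these statements are easy to mistype, one option is to generate them per host and review the output before pasting it into psql. The sketch below assumes nothing beyond the statements listed above; print_host_cleanup_sql is a hypothetical helper name, and it only prints SQL, it does not connect to the database.

```shell
#!/bin/sh
# Hypothetical helper: prints the cleanup statements from the steps above
# for one stale host so they can be reviewed before being run in psql.
# $1 = host_id (numeric), $2 = host_name. Nothing is executed here.
print_host_cleanup_sql() {
    host_id=$1
    host_name=$2
    # Refuse anything that is not a plain number or a plain host name,
    # because both values are pasted into SQL text.
    case $host_id in
        ''|*[!0-9]*) echo "host_id must be numeric: $host_id" >&2; return 1 ;;
    esac
    case $host_name in
        ''|*[!A-Za-z0-9.-]*) echo "unexpected host_name: $host_name" >&2; return 1 ;;
    esac
    cat <<EOF
delete from execution_command where task_id in (select task_id from host_role_command where host_id in ($host_id));
delete from host_version where host_id in ($host_id);
delete from host_role_command where host_id in ($host_id);
delete from serviceconfighosts where host_id in ($host_id);
delete from hoststate where host_id in ($host_id);
delete from hosts where host_name in ('$host_name');
delete from alert_current where history_id in (select alert_id from alert_history where host_name in ('$host_name'));
EOF
}

# Demo: the example id 111 from above paired with the first stale host.
print_host_cleanup_sql 111 hdp-1.novalocal
```

The ids 111, 222 and 333 are only the placeholder values used above; use the ids that the select queries actually return, and run the output only while the Ambari server is stopped and after the dump has been taken.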
NOTE: I wrote an article some time ago about this hostname-change issue. You should refer to that article, which explains why this happens in cloud environments and how to fix it.
Created 07-05-2017 04:38 PM
Thank you @Jay SenSharma for the very fast answer. I just found out that the wrong hostnames come from the ambari-agents! I did a
curl ... DELETE .../hosts/hdp-1.novalocal
and everything looked good afterwards (with the agents stopped). When I restarted the agents, the FQDNs were back in the GET response! Can I still use your solution for that, or is it a different problem?
Another question: is it possible to set the hostname that the agents use? Or which information do they take it from (the file /etc/hostname, hostname, or hostname -f)? As I'm not a Linux expert, I'm thankful for any help.
Created 07-05-2017 04:42 PM
Yes, deleting the unwanted hosts using the Ambari API is also a correct option. The DB queries just clean all the unwanted information about the old hostnames out of the DB completely. Both options are valid.
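For the API route, a dry-run sketch may help avoid deleting the wrong entry. It assumes that the parts elided in the DELETE call quoted in this thread match the GET request from the original question (same headers, credentials and localhost:8080 base URL); that is an assumption, not something confirmed here. print_delete_host_cmd is a hypothetical helper that only prints the command.

```shell
#!/bin/sh
# Assumptions: Ambari listens on localhost:8080 and the credentials match
# the GET request earlier in this thread; adjust both as needed.
AMBARI_URL="http://localhost:8080/api/v1"

# Hypothetical helper: prints (does not run) the DELETE call for one host.
print_delete_host_cmd() {
    printf 'curl -i -H "X-Requested-By: ambari" -u admin:mypassword -X DELETE %s/hosts/%s\n' \
        "$AMBARI_URL" "$1"
}

# Print one command per stale FQDN entry for review.
for host in hdp-1.novalocal hdp-2.novalocal hdp-3.novalocal; do
    print_delete_host_cmd "$host"
done
```

Printing the commands first makes it easy to check that only the ".novalocal" entries are targeted before running them by hand.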
Regarding changing the agent hostname permanently:
** Permanently fix the public hostname (recommended):
1. Create a file named "/var/lib/ambari-agent/public_hostname.sh" and add the following lines to it:
#!/bin/sh
echo `hostname -f`
2. Make sure that the file "/var/lib/ambari-agent/public_hostname.sh" has execute permission. Example:
chmod 755 "/var/lib/ambari-agent/public_hostname.sh"
3. On every ambari-agent host, edit the file "/etc/ambari-agent/conf/ambari-agent.ini" and add the following line to the [agent] section:
## Added following to customize the public hostname
public_hostname_script=/var/lib/ambari-agent/public_hostname.sh
NOTE: Users can also use the property "hostname_script" to customize the internal hostname.
4. Make sure that the changes are pushed to all the hosts in the Ambari cluster.
5. Now restart the agents:
ambari-agent restart
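When the same edit has to be made on several hosts, the ini change can be scripted. The following is a minimal sketch, not an official Ambari tool: add_public_hostname_script is a hypothetical helper that adds the property directly below the [agent] header unless it is already present. It assumes the file contains an "[agent]" line exactly as written; if it does not, nothing is added.

```shell
#!/bin/sh
# Hypothetical helper: adds public_hostname_script to the [agent] section of
# an ambari-agent.ini-style file unless the key is already present.
# $1 = path to the ini file, $2 = path to the hostname script.
add_public_hostname_script() {
    ini=$1
    script=$2
    if grep -q '^public_hostname_script=' "$ini"; then
        return 0    # already configured; leave the file untouched
    fi
    tmp=$(mktemp) || return 1
    # Copy every line and emit the new property right after the [agent] header.
    awk -v line="public_hostname_script=$script" '
        { print }
        $0 == "[agent]" { print line }
    ' "$ini" > "$tmp" && cat "$tmp" > "$ini"
    status=$?
    rm -f "$tmp"
    return $status
}

# Demo on a scratch file with made-up sample content, not on the real
# /etc/ambari-agent/conf/ambari-agent.ini.
demo=$(mktemp)
printf '[server]\nhostname=localhost\n\n[agent]\nloglevel=INFO\n' > "$demo"
add_public_hostname_script "$demo" /var/lib/ambari-agent/public_hostname.sh
cat "$demo"
rm -f "$demo"
```

Writing back with cat instead of mv keeps the ownership and permissions of the existing ini file. Try it on a copy first, then restart the agent as described above.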
Created 07-05-2017 04:45 PM
For more information, please refer to: https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.0.0/bk_ambari-reference/content/how_to_customiz...
hostname_script=/var/lib/ambari-agent/hostname.sh
public_hostname_script=/var/lib/ambari-agent/public_hostname.sh
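If the agents should register under the short names (hdp-1 rather than hdp-1.novalocal), the script behind hostname_script could print the short form. This is only a sketch of what such a hostname.sh might contain; short_name is a hypothetical function, and it assumes the short name is the part of the FQDN before the first dot.

```shell
#!/bin/sh
# Hypothetical content for /var/lib/ambari-agent/hostname.sh: prints the
# short host name by cutting the given name at the first dot.
short_name() {
    printf '%s\n' "${1%%.*}"
}

# Use the FQDN if it can be resolved, otherwise whatever `hostname` reports.
short_name "$(hostname -f 2>/dev/null || hostname)"
```

Like public_hostname.sh, the file needs execute permission, and the agent has to be restarted before the new name takes effect.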
Created 07-06-2017 06:41 AM
Removing the "hdp-1.novalocal" from the hosts list and using the hostname script for setting the public / private hostname did it for me! Thank you so much, I think you saved my whole week!