During HDP installation we get the following in the ambari-server log:
2019-10-29 12:29:58,538 [CRITICAL] [HARD] [AMBARI] [ambari_server_agent_heartbeat] (Ambari Agent Heartbeat) worker04.sys65.com is not sending heartbeats
2019-10-29 12:29:58,540 [CRITICAL] [HARD] [AMBARI] [ambari_server_agent_heartbeat] (Ambari Agent Heartbeat) worker01.sys65.com is not sending heartbeats
2019-10-29 12:45:58,543 [CRITICAL] [HARD] [AMBARI] [ambari_server_agent_heartbeat] (Ambari Agent Heartbeat) worker05.sys65.com is not sending heartbeats
2019-10-29 12:47:58,542 [CRITICAL] [HARD] [AMBARI] [ambari_server_agent_heartbeat] (Ambari Agent Heartbeat) worker03.sys65.com is not sending heartbeats
As we understand it, the workers above seem to have a connection issue with the ambari-server, or some other problem.
We don't know why this happens. We can see that worker02 succeeded, while the other workers failed during installation and lost the heartbeat.
Note: ambari-agent is running on all the worker machines.
Second question: is it plausible that the installation failed on the workers because of the ambari agent?
The only way the Ambari server can be made aware of any node in the cluster is through the heartbeats the agent sends to announce itself; without them, Ambari is in the dark.
Having said that, we usually forget the obvious, so I would ask you to go through the checklist for preparing the HDP environment. Once you have that confirmed, you can narrow the investigation to a proxy or network firewall.
Another way around the problem is to install the agent on all the nodes, including the Ambari server, and edit ambari-agent.ini on every host to point to the Ambari server. In the example below the Ambari server FQDN is ambari.test.com; you can obtain yours by running the command below on the Ambari server:
$ hostname -f
Use that name in ambari-agent.ini, as below (replace ambari.test.com with your own FQDN):
[server]
hostname=ambari.test.com
url_port=8440
secured_url_port=8441
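Before restarting anything, it may be worth confirming from a worker that the two ports in the [server] section are actually reachable. The following is only a rough sketch: check_port and report_ports are hypothetical helper names, and it assumes bash (for /dev/tcp) and the coreutils timeout command are available on the worker.

```shell
# check_port HOST PORT: returns 0 if a TCP connection opens within 5 seconds.
# Relies on bash's /dev/tcp pseudo-device and coreutils `timeout` (assumptions).
check_port() {
  local host="$1" port="$2"
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# report_ports HOST PORT...: prints one line per port with the result.
report_ports() {
  local host="$1"; shift
  local port
  for port in "$@"; do
    if check_port "$host" "$port"; then
      echo "${host}:${port} reachable"
    else
      echo "${host}:${port} NOT reachable"
    fi
  done
}

# Example, run on a worker (ambari.test.com is the example FQDN from above):
#   report_ports ambari.test.com 8440 8441
```

If either port shows as NOT reachable from a failing worker but reachable from worker02, that points at the network path rather than at the agent itself.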
Once that is done, start the agents on all hosts, fire up the Ambari server and choose the manual installation; you won't need the SSH keys 🙂
That should work. Now, talking about SSH keys: did you test the passwordless connection between the hosts before launching Ambari? 🙂
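One possible way to test the passwordless connection from the Ambari server is sketched below. check_ssh is a hypothetical helper, and SSH_CMD is a hypothetical override that exists only so the loop can be dry-run; BatchMode=yes makes ssh fail instead of prompting when key authentication does not work.

```shell
# check_ssh HOST...: reports which hosts accept key-based login without a prompt.
# SSH_CMD (hypothetical) defaults to ssh; override it for a dry run.
check_ssh() {
  local host failed=0
  for host in "$@"; do
    if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null 2>/dev/null; then
      echo "$host: passwordless SSH OK"
    else
      echo "$host: passwordless SSH FAILED"
      failed=1
    fi
  done
  return "$failed"
}

# Example, run on the Ambari server as the user that owns the key:
#   check_ssh worker01.sys65.com worker03.sys65.com worker04.sys65.com worker05.sys65.com
```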
What I can say for now is that, for the machines that have the problem:
DNS is OK
firewall is off
iptables is off
ambari-agent.ini is configured correctly
the machines comply with the checklist
So from this point, can you advise on what else it could be?
Another note:
According to the checklist they use the NTP service, while on Red Hat 7.2 we are using the chrony service.
Do you recommend moving to the ntp service ( /etc/ntp.conf ) instead of the chrony service ( chrony.conf )?
Chronyd replaces ntpd in RHEL 7 as the default network time protocol daemon. ... Chrony is a different implementation of the network time protocol (NTP) than the network time protocol daemon (ntpd), and it is able to synchronize the system clock faster and with better accuracy than ntpd.
You should only consider the NTP daemon (ntpd) for systems that are permanently on and required to use broadcast or multicast IP, or to perform authentication of packets with the Autokey protocol.
# systemctl enable chronyd
# systemctl start chronyd
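Once chronyd is running, you could confirm on each host that the clock is really synchronised. A minimal sketch, assuming the usual `chronyc tracking` output in which a synchronised host reports "Leap status : Normal"; chrony_synced is a hypothetical helper name.

```shell
# chrony_synced: reads `chronyc tracking` output on stdin and
# returns 0 only when the "Leap status" line reports Normal.
chrony_synced() {
  grep -Eq '^Leap status[[:space:]]*:[[:space:]]*Normal'
}

# Example, run on each host:
#   chronyc tracking | chrony_synced && echo "clock synchronised"
#   chronyc sources -v    # lists the time sources chronyd is using
```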
You didn't answer whether you configured the passwordless connection between the nodes.
Could you try this solution? I would suggest that on all the nodes you back up ambari-agent.ini:
# cp /etc/ambari-agent/conf/ambari-agent.ini /etc/ambari-agent/conf/ambari-agent.ini.ORIG
Then proceed with the steps below:
Empty the cache.
Check for errors.xxxx.txt (see location below); use ls -lrt to find the latest one.
Under [network] in ambari-agent.ini, change use_system_proxy_settings=true to false.
Restart the agents.
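The proxy-setting change and the restart could be scripted roughly as follows. This is only a sketch: disable_system_proxy is a hypothetical helper, the .bak suffix is my own choice (separate from the .ORIG copy above), and the cache and error-file steps are left out because they depend on paths not given here.

```shell
# disable_system_proxy FILE: sets use_system_proxy_settings=false in FILE.
# A copy of the previous content is kept as FILE.bak (hypothetical suffix).
disable_system_proxy() {
  local file="$1"
  cp -p "$file" "$file.bak" || return 1
  # Rewrite only the matching key; every other line is copied unchanged.
  sed 's/^[[:space:]]*use_system_proxy_settings[[:space:]]*=.*/use_system_proxy_settings=false/' \
    "$file.bak" > "$file"
}

# Example, run on each agent host:
#   disable_system_proxy /etc/ambari-agent/conf/ambari-agent.ini
#   ambari-agent restart
```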
About the "passwordless connection between the nodes":
if you mean passwordless login from the Ambari server to all the Ambari agents,
then yes, we sent the public keys to all machines so that SSH works without entering a password.
Do you think use_system_proxy_settings=true can have a negative effect on the Ambari agents?
So your final answer about time sync is that we need to choose one of the services, NTP or chrony, and Hortonworks does not prefer a specific one (correct me if I am wrong).
Also, the doc says:
On an installation host running RHEL/CentOS with PackageKit installed, open /etc/yum/pluginconf.d/refresh-packagekit.conf using a text editor. Make the following change:
but we do not have the file refresh-packagekit.conf
on any of our Red Hat 7.2 machines.