Ambari heartbeat lost

Rising Star
  • Not sure what happened, but Ambari Server is showing heartbeat lost for all the hosts. I tried restarting the Ambari server and agent, but that did not help.

Super Mentor

@Prakash Punj

From the Ambari agent host, try connecting to the following port on the Ambari server to find out whether the heartbeat port is reachable from the agent:

telnet $AMBARI_HOSTNAME 8441

The "/etc/ambari-agent/conf/ambari-agent.ini" file should contain the server's hostname and port.
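
As a quick way to confirm those values, a small shell sketch like the one below could print them. It assumes the file has a [server] section with hostname, url_port and secured_url_port keys (typical for ambari-agent.ini, but please check your own file). ini_server_value is a hypothetical helper name, not an Ambari tool.

```shell
# Hypothetical helper: print the value of one key from the [server]
# section of an INI file.  $1 = path to the file, $2 = key name.
ini_server_value() {
  awk -F= -v key="$2" '
    # Track whether the current line is inside the [server] section.
    /^\[/ { in_server = ($0 == "[server]"); next }
    in_server {
      k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k)      # trim the key
      if (k == key) {
        v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v)    # trim the value
        print v
        exit
      }
    }
  ' "$1"
}

# Usage on an agent host (key names are assumptions, see above):
#   ini_server_value /etc/ambari-agent/conf/ambari-agent.ini hostname
#   ini_server_value /etc/ambari-agent/conf/ambari-agent.ini secured_url_port
```

The hostname printed here is the one to use in the telnet check above.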

Also check ambari-agent.log for any unusual errors, and please share any errors or warnings you find.

Also check whether there is an SSL issue:

openssl s_client -connect $AMBARI_SERVER_HOSTNAME:8441 
openssl s_client -connect $AMBARI_SERVER_HOSTNAME:8440
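
The s_client output is long, so here is a rough sketch for pulling out just the part that matters for this kind of problem. verify_result is a hypothetical helper name. It only filters the "Verify return code" line from whatever text it reads on stdin, and it assumes that line looks the way openssl usually prints it.

```shell
# Hypothetical helper: read `openssl s_client` output on stdin and print
# only the verification result, e.g. "0 (ok)".
verify_result() {
  # Strip the "Verify return code:" label; keep the last match in case
  # the line appears more than once in the output.
  sed -n 's/^ *Verify return code: *//p' | tail -n 1
}

# Usage (needs network access to the server; same variable as above):
#   openssl s_client -connect "$AMBARI_SERVER_HOSTNAME:8441" </dev/null 2>/dev/null | verify_result
#
# To print the validity dates of the certificate the server presents:
#   openssl s_client -connect "$AMBARI_SERVER_HOSTNAME:8441" </dev/null 2>/dev/null \
#     | openssl x509 -noout -dates
```

Redirecting stdin from /dev/null makes s_client exit after the handshake instead of waiting for input.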

Super Collaborator

Heartbeats can be lost if an exception occurs while Ambari Server is handling the heartbeat. It can also happen if there is an SSL certificate issue between server and agent. Can you please attach the ambari-server log and a log from the ambari-agent?

Rising Star
Stopping ambari-agent
Removing PID file at /var/run/ambari-agent/ambari-agent.pid
ambari-agent successfully stopped
[root@Namenode ~]# ambari-agent start
Verifying Python version compatibility...
Using python  /usr/bin/python
Checking for previously running Ambari Agent...
Starting ambari-agent
Verifying ambari-agent process status...
Ambari Agent successfully started
Agent PID at: /var/run/ambari-agent/ambari-agent.pid
Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log
[root@Namenode ~]# vi /var/log/ambari-agent/ambari-agent.log
INFO 2017-05-25 07:22:07,809 NetUtil.py:60 - Connecting to https://ambari.asotc.com:8440/connection_info
INFO 2017-05-25 07:22:07,976 security.py:54 - Server require two-way SSL authentication. Use it instead of one-way...
INFO 2017-05-25 07:22:07,976 security.py:188 - Server certicate exists, ok
INFO 2017-05-25 07:22:07,977 security.py:196 - Agent key exists, ok
INFO 2017-05-25 07:22:07,977 security.py:204 - Agent certificate exists, ok
INFO 2017-05-25 07:22:07,977 security.py:99 - SSL Connect being called.. connecting to the server
ERROR 2017-05-25 07:22:08,111 security.py:86 - Two-way SSL authentication failed. Ensure that server and agent certificates were signed by the same CA and restart the agent.
In order to receive a new agent certificate, remove existing certificate file from keys directory. As a workaround you can turn off two-way SSL authentication in server configuration(ambari.properties)
Exiting..
ERROR 2017-05-25 07:22:08,112 Controller.py:350 - Unable to reconnect to https://ambari.asotc.com:8441/agent/v1/heartbeat/namenode.asotc.com (attempts=1699, details=Request to https://ambari.asotc.com:8441/agent/v1/heartbeat/namenode.asotc.com failed due to [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:765))
INFO 2017-05-25 07:22:23,014 NetUtil.py:60 - Connecting to https://ambari.asotc.com:8440/connection_info

It looks like it's connecting over SSL, but I have not enabled SSL.

Rising Star
---
SSL handshake has read 2303 bytes and written 206 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 4096 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-GCM-SHA384
    Session-ID: 5926D3A2802AC3DD04F3CD1BA946AFAA8A19EACE5EE04A59AB752ACD63AC55A8
    Session-ID-ctx:
    Master-Key: D205A7AD2A675D7E61B56E0A1A28AC76E5BCCE249CB7A50F4461F5C3EF12D3C9106EAB0B68146BDC5F97849CADDCAFF9
    Key-Arg   : None
    Krb5 Principal: None
    PSK identity: None
    PSK identity hint: None
    Start Time: 1495716767
    Timeout   : 300 (sec)
    Verify return code: 10 (certificate has expired)


Rising Star

It looks like the SSL cert on ambari-server is the issue: it has expired. How can I renew it?

Super Collaborator

You'll need to re-generate certificates on the Ambari Server since they are expired:

https://community.hortonworks.com/articles/68799/steps-to-fix-ambari-server-agent-expired-certs.html


Rising Star

This works. So it's going to happen once every year. Is there a solution for that?

Super Mentor

@Prakash Punj

1. Stop ambari-server.
2. Take a backup of the existing /var/lib/ambari-server/keys folder and empty it.
3. Download the attached keys.zip file and copy it to /var/lib/ambari-server/. Your new folder structure should look like /var/lib/ambari-server/keys/ca.config, /var/lib/ambari-server/keys/db/, and so on. This is basically a fresh keys folder (what you get when you install ambari-server).
4. Take a backup of all the agent certs located at /var/lib/ambari-agent/keys/ on all the hosts.
5. Delete all the files under the /var/lib/ambari-agent/keys/ folder.
6. Restart ambari-server. Note: ambari-server should create new certs under /var/lib/ambari-server/keys/ca.crt, /var/lib/ambari-server/keys/ca.key ....
7. Restart ambari-agent. Note: ambari-agent should create new certs under the /var/lib/ambari-agent/keys/ folder.

Now you should see a successful heartbeat from all the agents.
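
The backup-and-empty parts of the steps above (2, 4 and 5) could be sketched as a small shell helper. This is only an illustration under assumptions: backup_and_empty is a hypothetical name, the ".bak.<timestamp>" backup location is my own choice, and step 3 (restoring a fresh keys folder from the attachment) is not covered. Please try it on a copy before running it against real key folders.

```shell
# Hypothetical helper: copy a keys directory to a timestamped backup
# next to it, then remove its contents while keeping the directory.
backup_and_empty() {
  src="${1%/}"                               # e.g. /var/lib/ambari-agent/keys
  dest="$src.bak.$(date +%Y%m%d%H%M%S)"      # backup name is an assumption
  cp -a "$src" "$dest" || return 1           # do not delete if the backup failed
  find "$src" -mindepth 1 -delete            # empty the folder, including dotfiles
  echo "$dest"                               # print where the backup went
}

# Possible order of use, following the steps above:
#   On the server:
#     ambari-server stop
#     backup_and_empty /var/lib/ambari-server/keys
#     (restore the fresh keys folder as described in step 3)
#     ambari-server restart
#   On each agent host:
#     backup_and_empty /var/lib/ambari-agent/keys
#     ambari-agent restart
```

The helper copies before it deletes, so a failed backup leaves the original keys untouched.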