
HDP 2.3, Ambari 2.1.2: All hosts show heartbeat lost


Hi, I have 11 hosts, and all of them show "heartbeat lost" with yellow icons. I restarted ambari-server and the agents on all hosts, but the hosts are still in this state. Any idea what is going on?



@Krish Khambadkone

- Can you please check /var/log/ambari-agent/ambari-agent.log for errors? Please share any unusual warnings or errors you notice.

- Please also check /var/log/ambari-server/ambari-server.log for errors and share anything unusual.

- Were they (ambari-server & ambari-agent) communicating earlier?

- Are you able to run the following command from the ambari-agent machine, without any issue, to connect to ambari-server on port 8440?

openssl s_client -connect AmbariServerHostName:8440

- Have you also set up passwordless SSH between the ambari-server and ambari-agent machines?

- Do the ambari agents return the correct hostname when you run the following command?

hostname -f
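The log and hostname checks above can be bundled into two small shell helpers. This is a minimal sketch: the log paths are the defaults named in this thread, while the function names recent_problems and agent_fqdn are hypothetical, and matching on WARN|ERROR is an assumption about how the log levels appear in both files.

```bash
#!/usr/bin/env bash
# Sketch of the log and hostname checks. Paths are the defaults from the
# thread; the function names are hypothetical.

AGENT_LOG=/var/log/ambari-agent/ambari-agent.log
SERVER_LOG=/var/log/ambari-server/ambari-server.log

# Print the last N lines (default 20) of a log that mention WARN/WARNING
# or ERROR. Returns 1 if the file cannot be read.
recent_problems() {
  local log="$1" n="${2:-20}"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  grep -E 'WARN|ERROR' "$log" | tail -n "$n"
}

# Print this host's fully qualified name, which should match the host
# name Ambari shows for it.
agent_fqdn() {
  hostname -f
}
```

On an agent host, run recent_problems "$AGENT_LOG" and agent_fqdn; on the server host, point recent_problems at "$SERVER_LOG" instead.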



Hi, I ran this command from an ambari-agent host and it connected successfully. And yes, the agents are showing the right hostname.

CONNECTED(00000003)
depth=0 C = XX, L = Default City, O = Default Company Ltd
verify error:num=18:self signed certificate
verify return:1
depth=0 C = XX, L = Default City, O = Default Company Ltd
verify return:1
---
Certificate chain
0 s:/C=XX/L=Default City/O=Default Company Ltd
i:/C=XX/L=Default City/O=Default Company Ltd
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIFnDCCA4SgAwIBAgIBATANBgkqhkiG9w0BAQQFADBCMQswCQYDVQQGEwJYWDEV
MBMGA1UEBwwMRGVmYXVsdCBDaXR5MRwwGgYDVQQKDBNEZWZhdWx0IENvbXBhbnkg
THRkMB4XDTE2MTIwMjAwMTkyMFoXDTE3MTIwMjAwMTkyMFowQjELMAkGA1UEBhMC
WFgxFTATBgNVBAcMDERlZmF1bHQgQ2l0eTEcMBoGA1UECgwTRGVmYXVsdCBDb21w
YW55IEx0ZDCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAMUTB1zUpQVn
x0aLvmgtKMv9OgRqeeahhDMPkeUiX3XH46tMVGM6KIQViycQVYUS99op5wxB/fah
8GFxu7byObpC2wCFTnFKy0lO7fn48/ls18PWI94Nk6ciqlC2qPfwyJTOFBP4jhHf
4rPqISKY+N6C5s4JgxAI5i7U1qPC+ZX3Y2BojtKBsJlwbMCL1Qqa1B6z8sVGP+P4
iUfRlMunUPEAyvGp94PXdYFtrAxA14HV2/T1GeYBV7driuXY2nuTbPR9zR/bxCMV
feHs9v8P56B2UC8VyCxT1H0hahZY3tq2pzlpFiKR3IdLyeNCiy+M1bfrYV0Y3hV0
IyJyl2kw4F9/f6pEAThvE1arMSVg18wKQbCaDIOCQVrogMuwaNAcLEnc8EWkVQgZ
LpqCFJ2vKFbequ7wtTkFXZavYm8aPEhZmnY/WGPxSLvWzAcJXETdt04KHydM2hN5
FFaD2m+H4z8ir0tiORAX1FyjwGwOX3fklzcGOJQGfabGvkxKRQqwfxnKBXpPv/kw
uCey/sco6RALi3HyXc3dQ9JgBQM56I4vvsCoPcOzsDYyheRQnG4WJQVZYuIS9qAE
ZJ/AZuSLFaiHrV5PvVWb6MfQcDPCqpYQKv6UXG90q0b3MI+RNKkipIHr3BlVzTKk
PCS+vRMRcQHYn5merAx+oW+rV4cB9blFAgMBAAGjgZwwgZkwHQYDVR0OBBYEFFt2
4Uhjkf/2KiJpNz6PIXKRq+EsMGoGA1UdIwRjMGGAFFt24Uhjkf/2KiJpNz6PIXKR

@Krish Khambadkone

Can you please try the following steps as well:

1. ambari-agent stop

2. Ensure that /var/run/ambari-agent/ambari-agent.pid does not exist. Delete the file if it does.

3. Follow the ambari-agent log with tail:

   tail -f /var/log/ambari-agent/ambari-agent.log

4. ambari-agent start

5. Check /var/run/ambari-agent/ambari-agent.pid and make sure the PID it contains belongs to the running agent process.

6. If you find any warnings or errors in the ambari-agent log, please share them.
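Steps 1, 2, 4 and 5 can be scripted roughly as follows, with step 3 (tail -f) left running in a second terminal. This is a sketch, not an Ambari tool: the pid file path is the one from the thread, the function names pid_is_running and restart_agent are hypothetical, the 5-second wait is an arbitrary assumption, and it assumes the pid file holds just the numeric PID on its first line.

```bash
#!/usr/bin/env bash
# Sketch of the restart steps. Run as root, like the agent itself.
# Function names are hypothetical; the pid file path is from the thread.

PID_FILE=/var/run/ambari-agent/ambari-agent.pid

# Succeeds if the pid file exists, holds a numeric PID on its first line,
# and that PID belongs to a live process. kill -0 sends no signal; it only
# tests that the process exists and that we may signal it.
pid_is_running() {
  local pidfile="$1" pid
  [ -s "$pidfile" ] || return 1
  pid=$(head -n 1 "$pidfile")
  [[ "$pid" =~ ^[0-9]+$ ]] || return 1
  kill -0 "$pid" 2>/dev/null
}

restart_agent() {
  ambari-agent stop
  # Step 2: a pid file still present after stop is stale, so remove it.
  if [ -e "$PID_FILE" ]; then
    echo "removing stale $PID_FILE"
    rm -f "$PID_FILE"
  fi
  ambari-agent start
  sleep 5  # assumption: give the agent time to write its pid file
  # Step 5: the new pid file must point at a live process.
  if pid_is_running "$PID_FILE"; then
    echo "agent running as pid $(head -n 1 "$PID_FILE")"
  else
    echo "agent is not running; check the agent log" >&2
    return 1
  fi
}
```

Checking the PID with kill -0 rather than only testing that the file exists catches the case where the file was written but the agent exited right after start.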

@Krish Khambadkone

Also, sorry for the confusion: the registration and heartbeat port for Ambari Agents connecting to the Ambari Server is 8441 (not 8440). Can you please rerun the same openssl command against port 8441 from the Ambari agent host?

openssl s_client -connect AmbariServerHost:8441
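To test both ports mentioned in this thread from each agent host without typing into an interactive s_client session, the command can be wrapped as below. This is a sketch: tls_port_ok and check_ambari_ports are hypothetical names, and the 10-second timeout is an arbitrary choice. If the server is configured for two-way SSL, a handshake without a client certificate may be rejected, so a failure on 8441 is a cue to rerun the interactive command and read its output, not proof that the port is closed.

```bash
#!/usr/bin/env bash
# Sketch: non-interactive version of the s_client check for both ports.
# Function names are hypothetical; pass the Ambari server's FQDN.

# Returns 0 if a TLS handshake with host:port completes within 10 seconds.
# </dev/null makes s_client exit right after the handshake instead of
# waiting for input; timeout guards against filtered ports that hang.
# A self-signed server certificate does not make this fail.
tls_port_ok() {
  local host="$1" port="$2"
  timeout 10 openssl s_client -connect "${host}:${port}" \
    </dev/null >/dev/null 2>&1
}

check_ambari_ports() {
  local host="$1" port
  for port in 8440 8441; do
    if tls_port_ok "$host" "$port"; then
      echo "$host:$port handshake OK"
    else
      echo "$host:$port FAILED"
    fi
  done
}
```

For example, check_ambari_ports AmbariServerHost prints one OK or FAILED line per port.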



I will try the above steps, but I noticed this in ambari-agent.log on the NameNode host:


INFO 2016-12-03 12:25:50,343 Heartbeat.py:78 - Building Heartbeat: {responseId = 178, timestamp = 1480796750343, commandsInProgress = False, componentsMapped = False}


When I stop ambari-agent, the PID file in this location disappears, and it is recreated when I start the agent again. In the log file, I see these messages:


INFO 2016-12-03 12:28:22,523 AmbariConfig.py:260 - Updating config property (agent.check.remote.mounts) with value (false)
INFO 2016-12-03 12:28:22,523 AmbariConfig.py:260 - Updating config property (agent.auto.cache.update) with value (true)
INFO 2016-12-03 12:28:22,523 AmbariConfig.py:260 - Updating config property (agent.check.mounts.timeout) with value (0)
WARNING 2016-12-03 12:28:22,524 AlertSchedulerHandler.py:91 - There are no alert definition commands in the heartbeat; unable to update definitions
INFO 2016-12-03 12:28:22,524 Controller.py:387 - Registration response from vnode-109-16.rb.com was OK
INFO 2016-12-03 12:28:22,524 Controller.py:392 - Resetting ActionQueue...
INFO 2016-12-03 12:28:32,534 Heartbeat.py:78 - Building Heartbeat: {responseId = 0, timestamp = 1480796912534, commandsInProgress = False, componentsMapped = False}
INFO 2016-12-03 12:28:32,622 Controller.py:260 - Heartbeat response received (id = 1)
INFO 2016-12-03 12:28:42,622 Heartbeat.py:78 - Building Heartbeat: {responseId = 1, timestamp = 1480796922622, commandsInProgress = False, componentsMapped = False}
INFO 2016-12-03 12:28:42,664 Controller.py:260 - Heartbeat response received (id = 2)
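The log excerpt above shows the expected pattern after "Registration response ... was OK": the agent builds a heartbeat roughly every 10 seconds and each one is answered. A rough way to check this on every host is to count built heartbeats against received responses. This sketch assumes the log lines keep exactly the format shown in this thread; heartbeat_summary is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: compare heartbeats built by the agent with responses received.
# Assumes the "Building Heartbeat:" and
# "Heartbeat response received (id = N)" formats shown in the thread.

heartbeat_summary() {
  awk '
    /Building Heartbeat:/ { sent++ }
    /Heartbeat response received \(id = [0-9]+\)/ {
      received++
      # "(id = " is 6 characters; the digits end just before ")".
      match($0, /\(id = [0-9]+\)/)
      last = substr($0, RSTART + 6, RLENGTH - 7)
    }
    END {
      printf "sent=%d received=%d last_id=%s\n",
             sent, received, (last == "" ? "none" : last)
    }
  ' "$1"
}
```

Running heartbeat_summary /var/log/ambari-agent/ambari-agent.log on each host gives one line per host. Because the counts cover the whole file, received trailing sent by one is normal (the last heartbeat may still be in flight); a received count that stays flat while sent keeps growing means the server is not answering that agent.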


I was able to resolve this issue. Now the services are coming up.


@Krish Khambadkone

Can you please share your findings? What was the issue, and how did you resolve it?


Can you please share what you did to solve this issue?