Created 12-03-2016 08:02 PM
Hi, I have 11 hosts, and all of them show "heartbeat lost" with yellow icons. I restarted the ambari-server and the agents on all the hosts, but the hosts are still in this state. Any idea what is going on?
Created 12-03-2016 08:07 PM
- Can you please check "/var/log/ambari-agent/ambari-agent.log" for errors? Please share any unusual warnings or errors you notice.
- Also, please check "/var/log/ambari-server/ambari-server.log" for errors, and share any unusual warnings or errors from there as well.
- Were they (ambari-server and ambari-agent) communicating earlier?
- Are you able to run the following command from the ambari-agent machine, without any issue, to connect to the ambari-server on port 8440?
openssl s_client -connect AmbariServerHostName:8440
- Also, have you set up passwordless SSH between the ambari-server and ambari-agent machines?
- Do the ambari-agent hosts return the correct hostname when you run the following command?
hostname -f
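For the two log checks above, a small shell helper can pull out just the warning and error lines so they are easy to paste into a reply. This is only a sketch: the function name recent_problems and the default of 20 lines are made up for illustration, and the pattern assumes the log level appears as a separate word on the line (WARN is included on the assumption that the server log may use that spelling).

```shell
#!/usr/bin/env bash
# Sketch: print the most recent WARN/WARNING/ERROR lines from a log file.
# recent_problems and its default line count are illustrative, not part of Ambari.
recent_problems() {
    local log_file="$1"
    local max_lines="${2:-20}"

    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi

    # Match the level as a whole word, wherever it appears on the line,
    # then keep only the last $max_lines matches.
    grep -E '(^|[[:space:]])(WARN|WARNING|ERROR)([[:space:]]|$)' "$log_file" \
        | tail -n "$max_lines"
}

# Example usage (paths taken from the checks above):
#   recent_problems /var/log/ambari-agent/ambari-agent.log
#   recent_problems /var/log/ambari-server/ambari-server.log 50
```

If nothing is printed, the log contains no lines at those levels, which is itself worth mentioning when you reply.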
Created 12-03-2016 08:10 PM
Hi, I ran this command from an ambari-agent host and it connected successfully, and yes, the hosts are showing the right hostname.
CONNECTED(00000003)
depth=0 C = XX, L = Default City, O = Default Company Ltd
verify error:num=18:self signed certificate
verify return:1
depth=0 C = XX, L = Default City, O = Default Company Ltd
verify return:1
---
Certificate chain
0 s:/C=XX/L=Default City/O=Default Company Ltd
i:/C=XX/L=Default City/O=Default Company Ltd
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIFnDCCA4SgAwIBAgIBATANBgkqhkiG9w0BAQQFADBCMQswCQYDVQQGEwJYWDEV
MBMGA1UEBwwMRGVmYXVsdCBDaXR5MRwwGgYDVQQKDBNEZWZhdWx0IENvbXBhbnkg
THRkMB4XDTE2MTIwMjAwMTkyMFoXDTE3MTIwMjAwMTkyMFowQjELMAkGA1UEBhMC
WFgxFTATBgNVBAcMDERlZmF1bHQgQ2l0eTEcMBoGA1UECgwTRGVmYXVsdCBDb21w
YW55IEx0ZDCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAMUTB1zUpQVn
x0aLvmgtKMv9OgRqeeahhDMPkeUiX3XH46tMVGM6KIQViycQVYUS99op5wxB/fah
8GFxu7byObpC2wCFTnFKy0lO7fn48/ls18PWI94Nk6ciqlC2qPfwyJTOFBP4jhHf
4rPqISKY+N6C5s4JgxAI5i7U1qPC+ZX3Y2BojtKBsJlwbMCL1Qqa1B6z8sVGP+P4
iUfRlMunUPEAyvGp94PXdYFtrAxA14HV2/T1GeYBV7driuXY2nuTbPR9zR/bxCMV
feHs9v8P56B2UC8VyCxT1H0hahZY3tq2pzlpFiKR3IdLyeNCiy+M1bfrYV0Y3hV0
IyJyl2kw4F9/f6pEAThvE1arMSVg18wKQbCaDIOCQVrogMuwaNAcLEnc8EWkVQgZ
LpqCFJ2vKFbequ7wtTkFXZavYm8aPEhZmnY/WGPxSLvWzAcJXETdt04KHydM2hN5
FFaD2m+H4z8ir0tiORAX1FyjwGwOX3fklzcGOJQGfabGvkxKRQqwfxnKBXpPv/kw
uCey/sco6RALi3HyXc3dQ9JgBQM56I4vvsCoPcOzsDYyheRQnG4WJQVZYuIS9qAE
ZJ/AZuSLFaiHrV5PvVWb6MfQcDPCqpYQKv6UXG90q0b3MI+RNKkipIHr3BlVzTKk
PCS+vRMRcQHYn5merAx+oW+rV4cB9blFAgMBAAGjgZwwgZkwHQYDVR0OBBYEFFt2
4Uhjkf/2KiJpNz6PIXKRq+EsMGoGA1UdIwRjMGGAFFt24Uhjkf/2KiJpNz6PIXKR
Created 12-03-2016 08:16 PM
Can you please try the following steps as well:
1. ambari-agent stop
2. Ensure that /var/run/ambari-agent/ambari-agent.pid doesn't exist. Delete the file if it exists.
3. Tail the ambari-agent log:
tail -f /var/log/ambari-agent/ambari-agent.log
4. ambari-agent start
5. Check /var/run/ambari-agent/ambari-agent.pid and ensure that the PID it contains belongs to the agent process that is actually running.
6. If you find any warning/error in the ambari-agent log, please share it.
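Steps 2 and 5 both come down to checking whether the pid file exists and whether the PID inside it is alive. A minimal sketch of that check follows; the function name pid_file_status and its output words are made up for illustration, and it relies on kill -0, which only reports a process as running if the current user is allowed to signal it (so run it as root or as the user that owns the agent).

```shell
#!/usr/bin/env bash
# Sketch: classify a pid file as absent, invalid, running, or stale.
# pid_file_status is an illustrative helper, not an Ambari command.
pid_file_status() {
    local pid_file="$1"
    local pid

    if [ ! -f "$pid_file" ]; then
        echo "absent"
        return 0
    fi

    # Strip whitespace/newlines and make sure only digits remain.
    pid=$(tr -d '[:space:]' < "$pid_file")
    case "$pid" in
        ''|*[!0-9]*)
            echo "invalid"
            return 0
            ;;
    esac

    # kill -0 sends no signal; it only tests that the process exists
    # and that we have permission to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running $pid"
    else
        echo "stale $pid"
    fi
}

# Example usage (path taken from the steps above):
#   pid_file_status /var/run/ambari-agent/ambari-agent.pid
```

After "ambari-agent stop" you would expect "absent"; a "stale" result at that point means the file was left behind and can be deleted as described in step 2.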
Created 12-03-2016 08:27 PM
I will try the above steps, but I noticed this in the ambari-agent.log on the namenode host:
INFO 2016-12-03 12:25:50,343 Heartbeat.py:78 - Building Heartbeat: {responseId = 178, timestamp = 1480796750343, commandsInProgress = False, componentsMapped = False}
Created 12-03-2016 08:30 PM
When I stop the ambari-agent, the pid file in this location vanishes, and it gets recreated when I start it again. In the log file, I see these messages:
INFO 2016-12-03 12:28:22,523 AmbariConfig.py:260 - Updating config property (agent.check.remote.mounts) with value (false)
INFO 2016-12-03 12:28:22,523 AmbariConfig.py:260 - Updating config property (agent.auto.cache.update) with value (true)
INFO 2016-12-03 12:28:22,523 AmbariConfig.py:260 - Updating config property (agent.check.mounts.timeout) with value (0)
WARNING 2016-12-03 12:28:22,524 AlertSchedulerHandler.py:91 - There are no alert definition commands in the heartbeat; unable to update definitions
INFO 2016-12-03 12:28:22,524 Controller.py:387 - Registration response from vnode-109-16.rb.com was OK
INFO 2016-12-03 12:28:22,524 Controller.py:392 - Resetting ActionQueue...
INFO 2016-12-03 12:28:32,534 Heartbeat.py:78 - Building Heartbeat: {responseId = 0, timestamp = 1480796912534, commandsInProgress = False, componentsMapped = False}
INFO 2016-12-03 12:28:32,622 Controller.py:260 - Heartbeat response received (id = 1)
INFO 2016-12-03 12:28:42,622 Heartbeat.py:78 - Building Heartbeat: {responseId = 1, timestamp = 1480796922622, commandsInProgress = False, componentsMapped = False}
INFO 2016-12-03 12:28:42,664 Controller.py:260 - Heartbeat response received (id = 2)
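In the excerpt above, each "Building Heartbeat" line is followed by a "Heartbeat response received" line. A rough way to check whether that pattern holds across the whole log is to count both messages, as in the sketch below. The function name heartbeat_summary is made up for illustration, and the two search strings are simply the ones that appear in this log, so they may differ in other Ambari versions.

```shell
#!/usr/bin/env bash
# Sketch: count heartbeats built by the agent and responses received from the server.
# heartbeat_summary is an illustrative helper, not an Ambari command.
heartbeat_summary() {
    local log_file="$1"
    local sent received

    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi

    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps that case from being treated as a failure.
    sent=$(grep -c 'Building Heartbeat' "$log_file" || true)
    received=$(grep -c 'Heartbeat response received' "$log_file" || true)

    echo "sent=$sent received=$received"
}

# Example usage:
#   heartbeat_summary /var/log/ambari-agent/ambari-agent.log
```

If the received count is far below the sent count, the agent is probably not getting answers from the server; if the two are close, the agent side of the heartbeat looks fine and the server log is the next place to look.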
Created 12-04-2016 02:03 AM
I was able to resolve this issue. Now the services are coming up
Created 12-04-2016 03:56 AM
Can you please share your findings? What was the issue, and how did you resolve it?
Created 01-20-2020 02:31 AM
Can you please share what you did to solve this issue?