Created 03-14-2019 04:28 PM
I shut down all HDP services, and subsequently the ambari-agents and the ambari-server, on a cluster due to scheduled network maintenance.
When starting the ambari-server and ambari-agents again, the ambari-agent on the headnode (which also runs the ambari-server) does not register with the ambari-server (the server does not receive a registration request at all). The ambari-agent on the slavenode registers fine.
Ambari version is 2.6.1.0 (recently upgraded following the HDP documentation; everything worked after the upgrade), HDP version is 2.6.0.3.
Python 2.7.5 / CentOS 7
On the headnode (Ambari running as root):
ambari-agent start produces:
Verifying Python version compatibility...
Using python /usr/bin/python
Checking for previously running Ambari Agent...
Checking ambari-common dir...
Starting ambari-agent
Verifying ambari-agent process status...
Ambari Agent successfully started
Agent PID at: /run/ambari-agent/ambari-agent.pid
Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log
ambari-agent status
ambari-agent currently not running
ps aux | grep ambari_agent
root 15892 0.0 0.0 238584 17624 pts/4 S 11:29 0:00 /usr/bin/python /usr/lib/python2.6/site-packages/ambari_agent/AmbariAgent.py start
root 15900 0.0 0.0 312740 18292 pts/4 Sl 11:29 0:00 /usr/bin/python /usr/lib/python2.6/site-packages/ambari_agent/main.py start
There is no PID file for ambari-agent; the directory /run/ambari-agent exists, its permissions (root:root 755) are fine, and there is no stale PID file.
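For completeness, a sketch of the checks behind the statement above (assuming the default PID path /run/ambari-agent/ambari-agent.pid from the startup output):
ls -ld /run/ambari-agent       # directory exists, owned root:root, mode 755
ls -l /run/ambari-agent/       # no ambari-agent.pid (and no stale PID file) present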
cat /var/log/ambari-agent/ambari-agent.log
....
INFO 2019-03-14 13:09:26,515 main.py:145 - loglevel=logging.INFO
INFO 2019-03-14 13:09:26,516 main.py:145 - loglevel=logging.INFO
INFO 2019-03-14 13:09:26,516 main.py:145 - loglevel=logging.INFO
INFO 2019-03-14 13:09:26,517 DataCleaner.py:39 - Data cleanup thread started
INFO 2019-03-14 13:09:26,518 DataCleaner.py:120 - Data cleanup started
INFO 2019-03-14 13:09:26,523 DataCleaner.py:122 - Data cleanup finished
However, according to the configuration, the loglevel of the ambari-agent should be DEBUG:
cat /etc/ambari-agent/conf/ambari-agent.ini
...
loglevel=DEBUG
...
- How can I increase the log-level of ambari-agent?
- ambari-agent seems to be running, but not creating a PID file.
- According to the ambari-server log, there seems to be no connection attempt from the ambari-agent.
I reinstalled the ambari-agent, but that did not help. Any help is appreciated.
Created 03-14-2019 08:46 PM
Since the ambari-agent log is stuck exactly at "Data cleanup finished", I suspect that it is stuck there due to "fuser". Sometimes fuser hangs, and in such cases the agent host needs to be rebooted.
Can you please check whether the agent is stuck at fuser:
# ps -flye | grep fuser | grep D
# ps -flye | grep fuser
The "D" state means Uninterruptible Sleep State.
The Command "fuser" is used by ambari-agent. The ambari-agent does not start properly because fuser might get stuck in uninterrupted sleep state.
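To confirm exactly where the process is blocked, a quick sketch (<PID> is a placeholder for the stuck fuser process):
# ps -flye | grep fuser          # the "S" column shows D for uninterruptible sleep
# cat /proc/<PID>/wchan          # kernel function the process is sleeping in
# cat /proc/<PID>/stack          # kernel stack of the blocked process (if available, requires root)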
A similar issue is described in this HCC article; to resolve the issue, reboot the server. https://community.hortonworks.com/content/supportkb/182955/ambari-agent-loses-heartbeat-and-ambari-a...
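If the process is indeed stuck in the D state, the sequence would be roughly (a sketch):
# reboot
# ambari-agent start             # after the host is back up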
Created 03-15-2019 09:13 AM
Great to know that it resolved your issue. It would be great if you could mark this HCC thread as "Answered" by clicking the "Accept" button on the correct answer.
Created 03-15-2019 08:10 AM
Hi, thanks very much, that resolved the issue.