I have a four-node cluster (HDP 2.4). The Ambari dashboard was showing "heartbeat lost" for all the services on one of the nodes (Machine1). On investigation, I found that Machine1 had been shut down for some reason. Below are the steps I have tried. Please note that ambari-server is installed on Machine2.
1) Restarted the node, and restarted ambari-agent several times.
2) Restarted the Ambari Metrics Collector and the ZooKeeper server manually on this node.
3) Restarted ambari-server, and ambari-agent on all the nodes.
None of these changed anything. What should I try next?
Following is the log from ambari-alerts.log. I don't understand why this error is occurring, because I do not think I started Ambari Monitor or other services manually before starting ambari-agent.
INFO 2016-04-29 14:36:51,936 logger.py:67 - Pid file /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid is empty or does not exist
ERROR 2016-04-29 14:36:51,937 script_alert.py:112 - [Alert][ams_metrics_monitor_process] Failed with result CRITICAL: ['Ambari Monitor is NOT running on Machine1']
@Ranjana Soundararajan, I think the original agent is in a strange state. Please try this:
Hope this helps,
@Ranjana Soundararajan I tried these steps several times, and also rebooted the node, but there was no change to the status of this node in the Ambari dashboard. I therefore deleted the node from the cluster and tried to add it back. This time, I started getting a hostname-related issue, for which I have opened another thread. Thanks for your input, Ranjana.
Can you access the Ambari UI on port 8080? If you can, try to stop all services and then start all services. Services need to start in a certain order, so using "Stop All" and "Start All" in Ambari could fix your problem.
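If you are not sure whether the UI port is reachable at all, a quick check from another machine can rule out a network or firewall problem. This is only a minimal sketch: `port_open` is a hypothetical helper name, and it only tests that something accepts TCP connections on the port, not that Ambari itself is healthy.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing is accepting connections on that address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_open("Machine2", 8080)` would test the Ambari server host mentioned in the question, assuming that name resolves from where you run it.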
Hello! It looks like the IP address of the agent may have changed. Could you please verify the IP shown in the "Hosts" tab: click the missing node, then look at the bottom left under "Summary" and check that the IP known by Ambari Server is the same as the one the lost host currently has. If they are different, well... you should put back the one known by Ambari Server.
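The comparison can also be done from a shell on the Ambari Server machine. The sketch below is an illustration only: `resolve_ipv4` and `same_ip` are hypothetical helper names, and the IP shown in the Ambari "Summary" panel has to be copied in by hand.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Return the IPv4 address the local resolver currently gives for hostname."""
    return socket.gethostbyname(hostname)


def same_ip(known_by_ambari, current):
    """Compare two IP strings after parsing, ignoring surrounding whitespace."""
    return (ipaddress.ip_address(known_by_ambari.strip())
            == ipaddress.ip_address(current.strip()))
```

Calling `same_ip(ip_from_ambari_ui, resolve_ipv4("Machine1"))` returns `False` when the address the server resolves for the host no longer matches what Ambari has recorded.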
@ Pradeep Kumar,
Please check whether this file exists: /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid.
If the file is there, kill the process ID recorded inside it and then delete the file.
Also check whether the process is still running; if it is, kill it as well.
Then do a clean restart of Ambari: first stop ambari-agent, then restart ambari-server, and then start ambari-agent again.
After completing all these steps, try to restart the service. This worked for me; I faced a similar issue some days ago.
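The PID-file check from the first steps can be sketched in Python as below. This is an assumption-laden illustration rather than anything Ambari ships: `pid_file_status` is a hypothetical helper, it only reports the state and does not kill or delete anything, and it needs to run on the affected node (usually as root, since the file lives under /var/run).

```python
import os
from pathlib import Path


def pid_file_status(pid_file):
    """Return (status, pid); status is 'missing', 'empty', 'stale' or 'running'."""
    path = Path(pid_file)
    if not path.exists():
        return ("missing", None)
    text = path.read_text().strip()
    if not text.isdigit() or int(text) == 0:
        # Empty file or non-numeric content: nothing usable recorded.
        return ("empty", None)
    pid = int(text)
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence, it kills nothing
    except ProcessLookupError:
        return ("stale", pid)  # file names a process that no longer exists
    except PermissionError:
        return ("running", pid)  # process exists but belongs to another user
    return ("running", pid)
```

A result of `("stale", pid)` for /var/run/ambari-metrics-monitor/ambari-metrics-monitor.pid means the file can be deleted safely; `("running", pid)` means that process should be stopped first.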