Heartbeat lost for all services

Contributor

Hi All,

We are using HDP 2.3. This morning when I arrived at the office, I saw that all services were in the UNKNOWN state. This is a QA cluster, so I tried restarting the services and rebooting the host, as well as killing and restarting ambari-agent, ambari-server, and PostgreSQL, but none of it helped.

Here is the screenshot and logs.

ambari.jpg

Logs are here

======================================================

WARN [ambari-hearbeat-monitor] HeartbeatMonitor:154 - Heartbeat lost from host localhost.localdomain
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component METRICS_MONITOR on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component METRICS_COLLECTOR on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HBASE_MASTER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HBASE_REGIONSERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component PHOENIX_QUERY_SERVER
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component SECONDARY_NAMENODE on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component DATANODE on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component NAMENODE on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HIVE_SERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component MYSQL_SERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HIVE_METASTORE on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component WEBHCAT_SERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component KAFKA_BROKER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HISTORYSERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component SPARK_JOBHISTORYSERVE
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component NODEMANAGER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component APP_TIMELINE_SERVER o
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component RESOURCEMANAGER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component ZEPPELIN_MASTER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component ZOOKEEPER_SERVER on

=======================================================
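For anyone wading through a long ambari-server log with the same warnings, here is a small sketch that pulls out which hosts lost their heartbeat and which components were set to UNKNOWN. The function name `summarize_heartbeat_log` is made up for illustration, and the patterns are based only on the message wording in the excerpt above, so they may need adjusting for other Ambari versions.

```python
import re

# Patterns based on the HeartbeatMonitor warnings quoted in the post.
# \s+ tolerates the line wrap between "state to" and "UNKNOWN".
HOST_RE = re.compile(r"Heartbeat lost from host (\S+)")
COMPONENT_RE = re.compile(
    r"Setting component state to\s+UNKNOWN\s+for component\s+(\w+)"
)


def summarize_heartbeat_log(text):
    """Return the hosts that lost heartbeat and the components marked UNKNOWN."""
    hosts = sorted(set(HOST_RE.findall(text)))
    components = sorted(set(COMPONENT_RE.findall(text)))
    return {"hosts": hosts, "components": components}
```

To use it, read the server log into a string and pass it in; the result is a plain dict you can print or compare between restarts.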

Kindly suggest.

I am not sure whether the state was changed through the Ambari API. If so, how can I track or check that?
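One thing that can at least be checked is what ambari-server currently records for the host. The sketch below builds a REST URL for the host resource and reads the state out of the JSON response. It does not show who changed a state, only the current value. The cluster name, base URL, and helper names are placeholders, and the field names (`Hosts/host_state`, `Hosts/last_heartbeat_time`) are as I remember them from the Ambari REST API, so please verify them against your version.

```python
import json


def host_status_url(base_url, cluster, host):
    """Build a GET URL for the host resource, limited to heartbeat-related fields."""
    fields = "Hosts/host_state,Hosts/last_heartbeat_time"
    return "{}/api/v1/clusters/{}/hosts/{}?fields={}".format(
        base_url.rstrip("/"), cluster, host, fields
    )


def extract_host_state(response_text):
    """Return (host_state, last_heartbeat_time) from the JSON response body."""
    hosts = json.loads(response_text).get("Hosts", {})
    return hosts.get("host_state"), hosts.get("last_heartbeat_time")
```

The URL can be fetched with any HTTP client using your Ambari admin credentials; a `host_state` of HEARTBEAT_LOST would match the warnings in the log.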

Thanks in advance.

Harshal

17 REPLIES

Master Mentor

Can you confirm that the ambari agent is up?

Contributor

Hi @Artem Ervits, the ambari agent is up and running.

I am also getting the error below on restart for each Ambari component:

On host localhost.localdomain role HIVE_METASTORE in invalid state.
Invalid transition. Invalid event: HOST_SVCCOMP_OP_IN_PROGRESS at UNKNOWN

Master Mentor

@Harshal Joshi

An Ambari-managed cluster should be stopped gracefully, just like an Oracle database. A reboot is the equivalent of a shutdown abort in Oracle. When you reboot your cluster, it is advisable to start the components manually in this order: Ambari server, HDFS, YARN.

Otherwise have a look at this link

Contributor

Hi @Geoffrey Shelton Okot, I rebooted it only after the issue appeared; the cluster was already down.

On host localhost.localdomain role HIVE_METASTORE in invalid state.
Invalid transition. Invalid event: HOST_SVCCOMP_OP_IN_PROGRESS at UNKNOWN

I am also getting the above error for all components.


@Harshal Joshi

How many nodes are in the cluster? Is it a sandbox? Please check if the ambari-agent is indeed coming up. Compare /var/run/ambari-agent/ambari-agent.pid with the process running. Take a look at this article.
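The PID comparison can be scripted. This is a minimal sketch, assuming a POSIX host and the PID file path mentioned above; `read_pid` and `pid_is_running` are illustrative helper names, not part of Ambari.

```python
import os

# PID file path as given in the reply above.
PID_FILE = "/var/run/ambari-agent/ambari-agent.pid"


def read_pid(path=PID_FILE):
    """Return the PID stored in the file, or None if it is missing or invalid."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

Note that this only confirms some process has that PID. To be sure it is really the agent, compare against the output of `ps` as well.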

Contributor

Hi @vpoornalingam, the agent is up and running, and the PID matches. The cluster is a single-node cluster.

I am also getting the following error for every component:

On host localhost.localdomain role HIVE_METASTORE in invalid state.
Invalid transition. Invalid event: HOST_SVCCOMP_OP_IN_PROGRESS at UNKNOWN

Master Mentor

@Harshal Joshi

For the "host config is in invalid state" error, please have a look at this post, which describes some great APIs for changing the state of a service component: Link
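Since the linked post is not reproduced here, the following is only a rough sketch of what such a call generally looks like in the Ambari REST API, not the content of that post. It builds (but does not send) a PUT request that asks the server to set one host component to a given state. The cluster name `mycluster`, the base URL, and the function name are assumptions for illustration; the server may still reject the change while the host is in the UNKNOWN state.

```python
import json
from urllib.parse import quote


def build_state_request(base_url, cluster, host, component, state="INSTALLED"):
    """Build the method, URL, headers, and body for a host component state change."""
    url = "{}/api/v1/clusters/{}/hosts/{}/host_components/{}".format(
        base_url.rstrip("/"),
        quote(cluster, safe=""),
        quote(host, safe=""),
        quote(component, safe=""),
    )
    # Ambari rejects modifying requests that lack the X-Requested-By header.
    headers = {"X-Requested-By": "ambari"}
    body = json.dumps({"HostRoles": {"state": state}})
    return "PUT", url, headers, body
```

The four values can be passed to any HTTP client together with Ambari admin credentials, for example `build_state_request("http://localhost:8080", "mycluster", "localhost.localdomain", "HIVE_METASTORE")`.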

Super Collaborator

It seems that ambari-server has somehow lost its connection with ambari-agent.

Try these steps:

1. Stop ambari-server

2. Stop ambari-agent service on all nodes

3. Start ambari-agent service on all nodes

4. Start ambari-server

Check the ambari-server and ambari-agent logs and see whether they show any error other than the components being in the UNKNOWN state.
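For a single-node cluster like the one in this thread, the four steps above can be sketched as a small script. It assumes the `ambari-server` and `ambari-agent` commands are on the PATH and that the script runs as root; on a multi-node cluster the agent steps would have to be run on every node. The names `RESTART_STEPS` and `run_steps` are illustrative only.

```python
import subprocess

# Order taken from the steps above: server down first, agents cycled, server up last.
RESTART_STEPS = [
    ["ambari-server", "stop"],
    ["ambari-agent", "stop"],
    ["ambari-agent", "start"],
    ["ambari-server", "start"],
]


def run_steps(steps=RESTART_STEPS, runner=subprocess.run):
    """Run each command in order and return (command, exit code) pairs."""
    results = []
    for cmd in steps:
        completed = runner(cmd)
        results.append((" ".join(cmd), completed.returncode))
    return results
```

A non-zero exit code in the result is a hint about which log to read first. The `runner` parameter exists only so the sequence can be dry-run without touching the services.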

Explorer

Hi Harshal, were you able to fix this? I am facing the same issue.