Heartbeat lost for all services

Contributor

Hi All,

We are using HDP 2.3. This morning when I came into the office, I saw that all services are in the UNKNOWN state. This is a QA cluster, so I have already tried restarting the services, rebooting the host, killing ambari-agent and ambari-server, and restarting PostgreSQL, but none of it has helped.

Here are the screenshot and logs.

ambari.jpg

Logs are here

======================================================

WARN [ambari-hearbeat-monitor] HeartbeatMonitor:154 - Heartbeat lost from host localhost.localdomain
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component METRICS_MONITOR on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component METRICS_COLLECTOR on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HBASE_MASTER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HBASE_REGIONSERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component PHOENIX_QUERY_SERVER
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component SECONDARY_NAMENODE on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component DATANODE on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component NAMENODE on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HIVE_SERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component MYSQL_SERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HIVE_METASTORE on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component WEBHCAT_SERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component KAFKA_BROKER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HISTORYSERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component SPARK_JOBHISTORYSERVE
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component NODEMANAGER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component APP_TIMELINE_SERVER o
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component RESOURCEMANAGER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component ZEPPELIN_MASTER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component ZOOKEEPER_SERVER on

=======================================================

Kindly suggest.

I am not sure whether the state was changed through the Ambari API. If so, how can I track or check that?

Thanks in advance.

Harshal

17 REPLIES

Master Mentor

Can you confirm ambari agent is up?

Contributor

Hi @Artem Ervits, the Ambari agent is up and running.

I am also getting the error below on restart for each Ambari component:

On host localhost.localdomain role HIVE_METASTORE in invalid state.
Invalid transition. Invalid event: HOST_SVCCOMP_OP_IN_PROGRESS at UNKNOWN

Master Mentor

@Harshal Joshi

An Ambari-managed cluster should be stopped gracefully, just like an Oracle database. A reboot is the equivalent of a shutdown abort in Oracle. When you reboot your cluster, it's advisable to start the components manually in this order: Ambari server, HDFS, YARN.

Otherwise have a look at this link

Contributor

Hi @Geoffrey Shelton Okot, I rebooted it only after the issue appeared. The cluster was already down by then.

On host localhost.localdomain role HIVE_METASTORE in invalid state.
Invalid transition. Invalid event: HOST_SVCCOMP_OP_IN_PROGRESS at UNKNOWN

I am also getting the above error for all components.

@Harshal Joshi

How many nodes are in the cluster? Is it a sandbox? Please check if the ambari-agent is indeed coming up. Compare /var/run/ambari-agent/ambari-agent.pid with the process running. Take a look at this article.
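The PID comparison suggested above could be scripted roughly as follows. This is only a sketch: the function name `agent_pid_is_alive` is made up for illustration, and it only checks that some process with the recorded PID exists, not that the process is actually ambari-agent.

```python
import os
from pathlib import Path

def agent_pid_is_alive(pid_file="/var/run/ambari-agent/ambari-agent.pid"):
    """Return (pid, alive) for the PID recorded in pid_file.

    pid is None when the file is missing or does not contain a number.
    """
    try:
        pid = int(Path(pid_file).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None, False
    try:
        # Signal 0 performs the existence/permission check without
        # delivering any signal to the process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return pid, False   # stale PID file: no such process
    except PermissionError:
        return pid, True    # process exists but belongs to another user
    return pid, True
```

Calling `agent_pid_is_alive()` on the host returns the recorded PID and whether a process with that PID currently exists. A stale PID file with no matching process would show up as `(pid, False)`.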

Contributor

Hi @vpoornalingam, the agent is up and running, and the PID matches. The cluster is a single-node cluster.

I am also getting this error for all components:

On host localhost.localdomain role HIVE_METASTORE in invalid state.
Invalid transition. Invalid event: HOST_SVCCOMP_OP_IN_PROGRESS at UNKNOWN

Master Mentor

@Harshal Joshi

For the host-config-is-in-invalid-state error, please have a look at this post. It has great APIs for changing the state of a service component: Link
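Before changing any component state through the API, it may help to read the current states first. The sketch below is read-only and rests on assumptions: it uses the Ambari REST endpoint `/api/v1/clusters/<cluster>/hosts/<host>/host_components` with HTTP basic auth, and the base URL, cluster name, and credentials you pass in are placeholders for your own values. The helper names are made up for illustration.

```python
import base64
import json
import urllib.request

def host_components_url(base_url, cluster, host):
    """Build the URL that lists component states for one host."""
    return (f"{base_url}/api/v1/clusters/{cluster}/hosts/{host}"
            "/host_components?fields=HostRoles/state")

def components_in_state(payload, state="UNKNOWN"):
    """Return sorted component names whose HostRoles/state equals state."""
    return sorted(
        item["HostRoles"]["component_name"]
        for item in payload.get("items", [])
        if item["HostRoles"].get("state") == state
    )

def fetch_json(url, user, password):
    """GET url with HTTP basic auth and decode the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

For example, `components_in_state(fetch_json(host_components_url("http://localhost:8080", "MyCluster", "localhost.localdomain"), user, password))` would list the components Ambari currently reports as UNKNOWN. Here `MyCluster` and port 8080 are assumptions, not values taken from this thread.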

Super Collaborator

It seems that ambari-server has lost connection with ambari-agent somehow.

Try these steps:

1. Stop ambari-server

2. Stop ambari-agent service on all nodes

3. Start ambari-agent service on all nodes

4. Start ambari-server
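On a single-node cluster, the four steps above can be sketched as a small script. This is only an illustration: it assumes the `ambari-server` and `ambari-agent` commands are on the PATH and that the script runs as root. On a multi-node cluster the two agent commands would have to run on every node. The names `RESTART_SEQUENCE` and `run_sequence` are made up here.

```python
import subprocess

# Order taken from the steps above: server down, agent bounced, server up.
RESTART_SEQUENCE = [
    ["ambari-server", "stop"],
    ["ambari-agent", "stop"],
    ["ambari-agent", "start"],
    ["ambari-server", "start"],
]

def run_sequence(commands=RESTART_SEQUENCE, dry_run=True):
    """Run the commands in order and return them as strings.

    With dry_run=True (the default) nothing is executed. Otherwise
    check=True makes the sequence stop at the first failing command.
    """
    executed = []
    for argv in commands:
        executed.append(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return executed
```

Calling `run_sequence()` only returns the planned commands. `run_sequence(dry_run=False)` actually executes them.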

Then check the ambari-server and ambari-agent logs and see whether they show any error other than components being set to the UNKNOWN state.
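To make the other errors easier to spot, the repeated "Setting component state to UNKNOWN" warnings can be filtered out. This is a minimal sketch. The function name `interesting_lines` is made up, and the log path in the usage note is the usual default location, which may differ on your install.

```python
import re

# Lines to ignore: the repeated per-component UNKNOWN warnings.
NOISE = re.compile(r"Setting component state to UNKNOWN")
# Lines to keep: anything logged at WARN, WARNING, or ERROR level.
LEVEL = re.compile(r"\b(WARN|WARNING|ERROR)\b")

def interesting_lines(lines):
    """Return WARN/ERROR lines that are not the UNKNOWN-state noise."""
    return [
        line.rstrip("\n")
        for line in lines
        if LEVEL.search(line) and not NOISE.search(line)
    ]
```

For example, `interesting_lines(open("/var/log/ambari-server/ambari-server.log"))` would keep the "Heartbeat lost from host" warning and drop the per-component UNKNOWN lines.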

Explorer

Hi Harshal, were you able to fix this? I am facing the same issue.