Created 03-10-2016 08:09 AM
Hi All,
We are using HDP 2.3. When I came into the office this morning, I saw that all services were in the UNKNOWN state. This is a QA cluster, so I tried restarting the services, rebooting the host, killing ambari-agent and ambari-server, and restarting PostgreSQL, but none of it helped.
Here are the screenshot and the logs:
======================================================
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:154 - Heartbeat lost from host localhost.localdomain
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component METRICS_MONITOR on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component METRICS_COLLECTOR on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HBASE_MASTER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HBASE_REGIONSERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component PHOENIX_QUERY_SERVER
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component SECONDARY_NAMENODE on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component DATANODE on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component NAMENODE on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HIVE_SERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component MYSQL_SERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HIVE_METASTORE on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component WEBHCAT_SERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component KAFKA_BROKER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HISTORYSERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component SPARK_JOBHISTORYSERVE
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component NODEMANAGER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component APP_TIMELINE_SERVER o
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component RESOURCEMANAGER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component ZEPPELIN_MASTER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component ZOOKEEPER_SERVER on
=======================================================
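To get a quick list of which components the heartbeat monitor flagged, the names can be pulled out of the server log text. This is only a minimal sketch, assuming the WARN messages look like the excerpt above; the function name `unknown_components` is made up for illustration, and the log is assumed to be the usual ambari-server log file.

```python
import re

# Matches the HeartbeatMonitor message format seen in the log excerpt.
# \s+ tolerates a message that was wrapped across two lines.
_PATTERN = re.compile(r"Setting component state to\s+UNKNOWN for component (\w+)")


def unknown_components(log_text):
    """Return component names flagged UNKNOWN, in first-seen order, without duplicates."""
    seen = []
    for name in _PATTERN.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen
```

Calling `unknown_components(open(path).read())` on the server log would return the flagged names as a list of strings, where `path` is wherever the log lives on your host.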
Kindly suggest.
I am not sure whether the state was changed through the Ambari API. If it was, how can I track or check that?
Thanks in advance.
Harshal
Created 03-10-2016 09:46 AM
Can you confirm ambari agent is up?
Created 03-11-2016 05:20 AM
Hi @Artem Ervits, the Ambari agent is up and running.
I am also getting the error below on restart for each component:
On host localhost.localdomain role HIVE_METASTORE in invalid state. Invalid transition. Invalid event: HOST_SVCCOMP_OP_IN_PROGRESS at UNKNOWN
Created 03-10-2016 09:50 AM
An Ambari-managed cluster should be stopped gracefully, just like an Oracle database. A reboot is the equivalent of a shutdown abort in Oracle. When you reboot your cluster, it is advisable to start the components manually in this order: Ambari server, HDFS, YARN.
Otherwise have a look at this link
Created 03-11-2016 05:25 AM
Hi @Geoffrey Shelton Okot, I rebooted it after the issue appeared; the cluster was already down.
On host localhost.localdomain role HIVE_METASTORE in invalid state. Invalid transition. Invalid event: HOST_SVCCOMP_OP_IN_PROGRESS at UNKNOWN
I am also getting the above error for all components.
Created 03-10-2016 04:06 PM
How many nodes are in the cluster? Is it a sandbox? Please check if the ambari-agent is indeed coming up. Compare /var/run/ambari-agent/ambari-agent.pid with the process running. Take a look at this article.
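The PID comparison can be sketched as follows. This is a rough illustration, not an official Ambari tool: the function name `agent_pid_is_running` is hypothetical, and it only confirms that some live process has the recorded PID, not that the process is actually ambari-agent (check that with `ps` as well).

```python
import os


def agent_pid_is_running(pid_file="/var/run/ambari-agent/ambari-agent.pid"):
    """Return True if the PID recorded in pid_file belongs to a live process."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or malformed PID file
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; nothing is delivered
    except ProcessLookupError:
        return False  # stale PID file: no such process
    except PermissionError:
        return True  # process exists but is owned by another user
    return True
```

A `False` result with the agent supposedly started would point to a stale PID file or an agent that exits right after startup.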
Created 03-11-2016 05:23 AM
Hi @vpoornalingam, the agent is up and running, and the PID matches. The cluster is a single-node cluster.
I am also getting this error for every component:
On host localhost.localdomain role HIVE_METASTORE in invalid state. Invalid transition. Invalid event: HOST_SVCCOMP_OP_IN_PROGRESS at UNKNOWN
Created 03-11-2016 08:21 AM
For the "host config is in invalid state" error, please have a look at this post, which describes some great APIs for changing the state of a service component: Link
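As a rough sketch of what such API calls look like, the snippet below builds (but does not send) a GET request to read a host component's state and a PUT request to ask for a new one through the Ambari REST API. The helper names, the cluster name `QA`, and the `admin`/`admin` credentials in the usage note are placeholders, and port 8080 assumes a default, non-SSL Ambari server. Whether Ambari accepts the transition from UNKNOWN is not guaranteed.

```python
import base64
import json
import urllib.request


def _auth_headers(user, password):
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    # Ambari expects the X-Requested-By header on modifying requests.
    return {"Authorization": f"Basic {token}", "X-Requested-By": "ambari"}


def component_url(server, cluster, host, component):
    return (f"http://{server}:8080/api/v1/clusters/{cluster}"
            f"/hosts/{host}/host_components/{component}")


def state_query(server, cluster, host, component, user, password):
    """Build a GET request that reads the component's current state."""
    url = component_url(server, cluster, host, component) + "?fields=HostRoles/state"
    return urllib.request.Request(url, headers=_auth_headers(user, password), method="GET")


def state_change(server, cluster, host, component, state, user, password):
    """Build a PUT request asking Ambari to move the component to `state`."""
    body = json.dumps({
        "RequestInfo": {"context": f"Set {component} to {state} via REST"},
        "Body": {"HostRoles": {"state": state}},
    }).encode()
    return urllib.request.Request(component_url(server, cluster, host, component),
                                  data=body, headers=_auth_headers(user, password),
                                  method="PUT")
```

To actually send one, pass the request to `urllib.request.urlopen`, for example `urllib.request.urlopen(state_query("localhost", "QA", "localhost.localdomain", "HIVE_METASTORE", "admin", "admin"))`. Keeping construction separate from sending makes it easy to inspect the URL and body first.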
Created 05-12-2016 08:24 AM
It seems that ambari-server has lost connection with ambari-agent somehow.
Try these steps:
1. Stop ambari-server
2. Stop ambari-agent service on all nodes
3. Start ambari-agent service on all nodes
4. Start ambari-server
View the ambari-server and ambari-agent logs and see whether they show any error other than components in the UNKNOWN state.
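For a single-node cluster like the one in this thread, the steps above can be sketched as a small script. This is only an illustration: the names `RESTART_SEQUENCE` and `restart_ambari` are made up, the commands are assumed to be on the PATH and run as root, and a multi-node cluster would need the agent steps repeated on every host.

```python
import subprocess

# Order matters: server down first, agent bounced, server back up last.
RESTART_SEQUENCE = [
    ["ambari-server", "stop"],
    ["ambari-agent", "stop"],
    ["ambari-agent", "start"],
    ["ambari-server", "start"],
]


def restart_ambari(dry_run=True):
    """Run the sequence in order; with dry_run=True only return the command lines."""
    executed = []
    for command in RESTART_SEQUENCE:
        executed.append(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failing step
            subprocess.run(command, check=True)
    return executed
```

Calling `restart_ambari()` only lists the commands; `restart_ambari(dry_run=False)` actually runs them.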
Created 05-12-2016 08:28 AM
Hi Harshal, were you able to fix this? I am facing the same issue.