Created 03-10-2016 08:09 AM
Hi All,
We are using HDP 2.3. This morning when I came into the office, I saw that all services are in the UNKNOWN state. This is a QA cluster, so I tried restarting services, rebooting, killing ambari-agent and ambari-server, and restarting PostgreSQL, but none of that helped.
Here are the screenshot and logs.
======================================================
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:154 - Heartbeat lost from host localhost.localdomain
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component METRICS_MONITOR on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component METRICS_COLLECTOR on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HBASE_MASTER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HBASE_REGIONSERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component PHOENIX_QUERY_SERVER
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component SECONDARY_NAMENODE on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component DATANODE on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component NAMENODE on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HIVE_SERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component MYSQL_SERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HIVE_METASTORE on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component WEBHCAT_SERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component KAFKA_BROKER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component HISTORYSERVER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component SPARK_JOBHISTORYSERVE
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component NODEMANAGER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component APP_TIMELINE_SERVER o
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component RESOURCEMANAGER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component ZEPPELIN_MASTER on
WARN [ambari-hearbeat-monitor] HeartbeatMonitor:169 - Setting component state to UNKNOWN for component ZOOKEEPER_SERVER on
=======================================================
Kindly suggest.
I am not sure whether the state was changed through the Ambari API. If so, how can I track or check that?
Thanks in advance.
Harshal
Created 05-12-2016 08:45 AM
Any luck? I am facing the same issue.
Created 05-12-2016 08:45 AM
It seems IT changed the domain name. /etc/hosts and resolv.conf were updated to reflect the old FQDN, but restarting the cluster was failing with:
Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: nn/mxspdh10.amdocs.com@MXSPDH10.KERBEROS.COM; Host Details : local host is: "mxspdh10.mx.amdocs.com/135.208.66.57"; destination host is: "mxspdh10.amdocs.com":8020;
The local host is under .mx.amdocs.com, while amdocs.com was expected.
After the resolv.conf change I tried rebooting the cluster, but since then I am facing ambari-agent heartbeat issues, with the same error as shared in this thread. Any suggestions, please?
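The mismatch between the host in the Kerberos principal and the local host name can be checked mechanically. The following is only a minimal Python sketch, not an official tool: it extracts the host part from a service principal like the one in the error above and compares it with a given FQDN. The helper names principal_host and fqdn_matches_principal are hypothetical.

```python
def principal_host(principal: str) -> str:
    """Return the host part of a 'service/host@REALM' Kerberos principal."""
    name, _, _realm = principal.partition("@")
    _service, sep, host = name.partition("/")
    if not sep or not host:
        raise ValueError(f"no host component in principal: {principal!r}")
    return host.lower()


def fqdn_matches_principal(principal: str, local_fqdn: str) -> bool:
    """True when local_fqdn equals the host named in the principal."""
    return principal_host(principal) == local_fqdn.strip().lower()


# Values copied from the error message in this thread.
principal = "nn/mxspdh10.amdocs.com@MXSPDH10.KERBEROS.COM"
local_fqdn = "mxspdh10.mx.amdocs.com"

print("principal host:", principal_host(principal))
print("local FQDN:    ", local_fqdn)
print("match:         ", fqdn_matches_principal(principal, local_fqdn))
```

On a live node, the hard-coded local_fqdn could be replaced with the value returned by Python's socket.getfqdn(), which reports the name the machine currently resolves for itself.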
Created 05-12-2016 09:42 AM
Is your cluster kerberized?
If you change the domain, you need to create new keytabs for all services and reset all the options in Ambari.
Created 04-02-2020 01:52 AM
You can try logging in as the admin user and restarting the DataNodes from the Actions bar in the Dashboard.
That worked for me. May work for you too.
Created 05-12-2016 09:38 AM
I had the same error some time ago.
First, verify /etc/hosts. Then verify that the Ambari server node can connect to all nodes and that all nodes can connect to the Ambari server node (for example with ping or ssh). I then resolved it by resetting all agents (which I had stopped beforehand):
ambari-agent reset <Ambari-server-hostname>
After the next restart, the agents started transmitting information successfully. I hope this helps.
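The connectivity check described in this reply can also be scripted. This is a minimal Python sketch under assumptions: it tests plain TCP reachability only, and it assumes the agent-to-server ports are the usual Ambari defaults 8440 and 8441 (check your ambari-agent.ini if they were changed). The function names can_connect and check_ports are hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ports(host: str, ports=(8440, 8441)) -> dict:
    """Map each port to whether it is reachable on host (ports are assumed defaults)."""
    return {port: can_connect(host, port) for port in ports}
```

Run from each agent host, check_ports("<Ambari-server-hostname>") returns a dictionary with True or False per port. A False entry points at name resolution, routing, or a firewall rather than at the agent itself.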
Created 05-12-2016 10:19 AM
It's Ambari 1.6; reset is only available post Ambari 2.1.
I checked a few things.
I see the rows below in the Ambari PostgreSQL DB:
ambari=# select host_name from ambari.hosts;
       host_name
------------------------
 mxspdh16.amdocs.com
 mxspdh10.amdocs.com
 mxspdh18.amdocs.com
 mxspdh17.amdocs.com
 mxspdh10.mx.amdocs.com
(5 rows)
ambari=# select * from ambari.hoststate ;
 agent_version       | available_mem | current_state | health_status                                | host_name              | time_in_state | maintenance_state
---------------------+---------------+---------------+----------------------------------------------+------------------------+---------------+-------------------
 {"version":"1.7.0"} |      31623084 | INIT          | {"healthStatus":"HEALTHY","healthReport":""} | mxspdh10.mx.amdocs.com | 1463041598424 |
 {"version":"1.7.0"} |      31792512 | INIT          | {"healthStatus":"UNKNOWN","healthReport":""} | mxspdh10.amdocs.com    | 1462368178266 | {"4":"OFF"}
 {"version":"1.7.0"} |      28241364 | INIT          | {"healthStatus":"HEALTHY","healthReport":""} | mxspdh16.amdocs.com    | 1463040523426 |
 {"version":"1.7.0"} |      28890788 | INIT          | {"healthStatus":"HEALTHY","healthReport":""} | mxspdh18.amdocs.com    | 1463040527465 |
 {"version":"1.7.0"} |      29281736 | INIT          | {"healthStatus":"HEALTHY","healthReport":""} | mxspdh17.amdocs.com    | 1463040528044 |
(5 rows)
ambari=# delete from ambari.hoststate where host_name='mxspdh10.mx.amdocs.com';
DELETE 1
I deleted both rows, but after restarting Ambari these two rows get populated again. Why are we getting mxspdh10.mx.amdocs.com?
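To spot this kind of double registration quickly, the host_name values can be grouped by their short name. The following is a minimal Python sketch, assuming the host list is pasted in from the query output above; duplicate_short_names is a hypothetical helper, not part of Ambari.

```python
from collections import defaultdict


def duplicate_short_names(host_names):
    """Group FQDNs by short host name; keep only short names with more than one FQDN."""
    groups = defaultdict(set)
    for fqdn in host_names:
        cleaned = fqdn.strip().lower()
        if cleaned:
            short = cleaned.split(".", 1)[0]
            groups[short].add(cleaned)
    return {short: sorted(fqdns) for short, fqdns in groups.items() if len(fqdns) > 1}


# Host names copied from the ambari.hosts query in this thread.
hosts = [
    "mxspdh16.amdocs.com",
    "mxspdh10.amdocs.com",
    "mxspdh18.amdocs.com",
    "mxspdh17.amdocs.com",
    "mxspdh10.mx.amdocs.com",
]
print(duplicate_short_names(hosts))
```

For the list above, only mxspdh10 is reported, with both of its FQDNs. That is consistent with the same machine registering under two different domain names, so the agent keeps re-creating the second row on every restart until the name it reports is fixed.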
Created 04-30-2017 05:47 PM
service iptables stop
In the ambari-agent configuration file, the hostname entry should be:
hostname = ambariservernodehost
ambariservernodehost should also be present in the /etc/hosts file.
Check the ambari-agent logs. If there is still a problem, please reply.
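The two configuration checks above can be sketched in Python. This is an illustration under assumptions, not an Ambari utility: it assumes the agent config is an INI file with the hostname key in a [server] section (on a typical install this is /etc/ambari-agent/conf/ambari-agent.ini), and the function names server_hostname and in_hosts_file are hypothetical. The sample IP address is made up.

```python
import configparser


def server_hostname(ini_text: str) -> str:
    """Return the 'hostname' value from the [server] section of the agent config."""
    # interpolation=None avoids errors on literal '%'; strict=False tolerates duplicate keys.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    return parser.get("server", "hostname").strip()


def in_hosts_file(hostname: str, hosts_text: str) -> bool:
    """True if hostname appears as a name or alias on a non-comment hosts line."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        # fields[0] is the IP address; the remaining fields are names and aliases.
        if wanted in (name.lower() for name in fields[1:]):
            return True
    return False


# Sample contents for illustration only; the IP address is invented.
sample_ini = "[server]\nhostname = ambariservernodehost\nurl_port = 8440\n"
sample_hosts = "127.0.0.1 localhost\n10.0.0.5 ambariservernodehost  # Ambari server\n"

name = server_hostname(sample_ini)
print(name, "found in hosts file:", in_hosts_file(name, sample_hosts))
```

On a real agent node, the two sample strings would be replaced with the contents of the agent config file and /etc/hosts read from disk.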
Created 06-05-2018 01:39 PM
Sometimes after an upgrade you need to check whether your ambari-agent and ambari-server versions are the same, or at least compatible.