We are in the process of moving all of our cluster services from one set of hosts to a completely new set of hosts. We have successfully moved Ambari, HDFS (NameNodes, JournalNodes, DataNodes), YARN, and ZooKeeper, but after restarting Ambari all of the hosts are in an "UNKNOWN" state. The Ambari server logs show the following:
2020-02-19 21:25:43,771 WARN [ambari-hearbeat-monitor] ClusterTopologyImpl:175 - HostGroup host_group_8 not found, when checking for hosts for component NAMENODE
2020-02-19 21:25:43,771 WARN [ambari-hearbeat-monitor] ClusterTopologyImpl:175 - HostGroup host_group_7 not found, when checking for hosts for component NAMENODE
2020-02-19 21:25:43,771 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:133 - Exception received
java.lang.RuntimeException: Failed to construct cluster topology while replaying request: org.apache.ambari.server.topology.InvalidTopologyException: NAMENODE HA requires at least 2 hosts running NAMENODE but there are: 0 Hosts: []
at org.apache.ambari.server.topology.PersistedStateImpl.getAllRequests(PersistedStateImpl.java:224)
The Ambari agents on all of the hosts show similar errors. HDFS is up and operational with 2 NameNodes in an HA config. host_group_7 and host_group_8 are the host groups that were associated with the original NameNodes when the cluster was first installed via blueprint. That was a year ago, and the cluster is very different now. Looking in the Ambari DB, I see a table named TOPOLOGY_HOST_INFO that contains the old hosts but none of the new hosts we have moved to. Does anyone have any idea what's going on here? Why are the Ambari agents validating topology anyway?
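In case it helps anyone reproduce the diagnosis, here is a minimal sketch for summarizing which host groups the server reports as missing, per component. It only assumes the warning format shown in the log excerpt above; the function name `missing_host_groups` is a hypothetical helper, not part of Ambari.

```python
import re
from collections import defaultdict

# Matches the ClusterTopologyImpl warning text as it appears in the log excerpt above.
MISSING_GROUP = re.compile(
    r"HostGroup (?P<group>\S+) not found, "
    r"when checking for hosts for component (?P<component>\S+)"
)


def missing_host_groups(log_lines):
    """Return {component: sorted host group names reported as not found}."""
    found = defaultdict(set)
    for line in log_lines:
        match = MISSING_GROUP.search(line)
        if match:
            found[match.group("component")].add(match.group("group"))
    return {component: sorted(groups) for component, groups in found.items()}


if __name__ == "__main__":
    # The two warnings quoted in this post.
    sample = [
        "2020-02-19 21:25:43,771 WARN [ambari-hearbeat-monitor] ClusterTopologyImpl:175 - "
        "HostGroup host_group_8 not found, when checking for hosts for component NAMENODE",
        "2020-02-19 21:25:43,771 WARN [ambari-hearbeat-monitor] ClusterTopologyImpl:175 - "
        "HostGroup host_group_7 not found, when checking for hosts for component NAMENODE",
    ]
    print(missing_host_groups(sample))
```

Feeding it the lines of the full server log instead of the sample would show whether components other than NAMENODE are also tied to stale blueprint host groups.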