As part of the HDP upgrade, Ambari backs up the NameNode metadata under /tmp/upgrades/<HDP_version>. However, if the mount that holds /tmp does not have enough room, Ambari silently carries on after filling the disk to 100% with a partial metadata backup. This is arguably a bug and can cause problems. Fortunately, it did not affect the NameNode upgrade itself, because we had already backed up the metadata as the upgrade documentation instructs.
The trouble came afterwards. Once the NameNode had started on the new version and the other services were running as expected, the Ambari agent began reporting incorrect information about the host's services, because it could not read the process IDs or get the status of the running services. The logs showed only a timeout message, and because the cluster was in maintenance mode we did not even realize we had run out of space on the root mount.
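To catch this before starting an upgrade, a quick pre-flight check can compare the size of the NameNode metadata directory with the free space on the mount that holds /tmp. The sketch below is a minimal Python example, not something Ambari provides. It assumes the metadata lives in one local directory (the value of dfs.namenode.name.dir; the path in the code is a hypothetical placeholder), that the backup needs roughly that much space, and that a 20% safety margin is enough; the margin is an arbitrary assumption.

```python
import os
import shutil


def directory_size(path: str) -> int:
    """Total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinked files so their targets are not counted.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def has_room_for_backup(metadata_dir: str, backup_parent: str, headroom: float = 1.2) -> bool:
    """Return True if the mount holding `backup_parent` has enough free
    space for a copy of `metadata_dir`, plus an assumed safety margin."""
    needed = directory_size(metadata_dir) * headroom
    free = shutil.disk_usage(backup_parent).free
    return free >= needed


if __name__ == "__main__":
    # Hypothetical path: substitute the dfs.namenode.name.dir value from your cluster.
    metadata = "/hadoop/hdfs/namenode"
    target = "/tmp"  # Ambari writes its backup under /tmp/upgrades/<HDP_version>
    if not has_room_for_backup(metadata, target):
        raise SystemExit(f"Not enough free space on the mount holding {target} to back up {metadata}")
    print("Enough free space for the backup.")
```

If the check fails, free up space or make sure /tmp is not on the small root mount before starting the upgrade.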
The following workaround fixed the issue without downtime for any service:
a) Free up space on the root mount by moving the /tmp/upgrades/<directories> to a larger disk. (The backup is only partial, so it is not useful anyway, but it is still best to move it for the time being.)
b) In future, avoid letting the backup land in /tmp when /tmp sits on a small disk mount.
c) Manually echo the PIDs of the host's services into their respective files under /var/run/hadoop/hdfs/<file>.pid.
d) Restart the Ambari agent, or give it some time to start reporting service status again.
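Step c) can also be scripted. The sketch below is a minimal Python example for Linux, assuming each daemon can be identified by a substring of its command line in /proc/<pid>/cmdline, such as its Java main class (org.apache.hadoop.hdfs.server.namenode.NameNode for the NameNode). The helper names are hypothetical, and the actual pid-file names under /var/run/hadoop/hdfs/ must be taken from your own host. It refuses to write anything unless exactly one process matches, so it never records a guessed PID.

```python
import os
from typing import List


def find_pids(pattern: str, proc_root: str = "/proc") -> List[int]:
    """Return the PIDs whose command line contains `pattern` (Linux /proc layout)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit() or int(entry) == os.getpid():
            continue  # skip non-process entries and this script itself
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                # Arguments are NUL-separated; join them with spaces for matching.
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited or is not readable
        if pattern in cmdline:
            pids.append(int(entry))
    return sorted(pids)


def write_pid_file(pid_file: str, pid: int) -> None:
    """Overwrite `pid_file` with `pid`, like `echo <pid> > <pid_file>`."""
    with open(pid_file, "w") as f:
        f.write(f"{pid}\n")


def restore_pid_file(pid_file: str, pattern: str, proc_root: str = "/proc") -> int:
    """Write the PID of the single process matching `pattern` into `pid_file`."""
    pids = find_pids(pattern, proc_root)
    if len(pids) != 1:
        raise RuntimeError(f"expected exactly one process matching {pattern!r}, found {pids}")
    write_pid_file(pid_file, pids[0])
    return pids[0]
```

For example, restore_pid_file("/var/run/hadoop/hdfs/<file>.pid", "namenode.NameNode") would record the NameNode's PID; run it as a user allowed to write those files, then continue with step d).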