I have a small cluster where CDH was installed a month ago. After some changes, I started experiencing issues with HDFS and HBase, so I decided to clean up and reinstall everything. Now the agent installation is failing with the error "Failed to receive heartbeat from agent."
I have checked that the nodes have proper connectivity and that there is no firewall in the way. I also see TCP connections established between the server and the agent on port 7182, and the logs seem to indicate that the agent is healthy, but for some reason the installer still fails with this error message.
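For anyone who wants to repeat the connectivity check described above, here is a minimal sketch in Python. It only confirms that a TCP connection to port 7182 can be opened from an agent node; it says nothing about whether heartbeats are accepted. The function name `can_connect` and the host name in the usage comment are placeholders, not anything provided by CDH.

```python
import socket


def can_connect(host, port=7182, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (placeholder host name, replace with your own server):
# print(can_connect("cm-server.example.com"))
```

A successful connection here matches what I observed: the port is reachable, so the failure has to come from something other than basic networking.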
Is this an issue with the latest CDH?
I tried reinstalling CDH several times, cleaning out all files and directories each time, with no luck. I also tried to go back to an older version of CDH, but the agent is still being installed with the latest version:
bash -c /tmp/scm_prepare_node.B5Mo3Chq/scm_prepare_node.sh --server_version 5.9.1 --server_build 8 --packages
Appreciate any help in resolving this.
It seems the issue is related to multiple hostnames being associated with each host. In my setup, I had given more than one hostname to each host in /etc/hosts. For example, the first machine has the hostnames "mc1" and "zk-1". I was able to add the hosts successfully once I kept only one hostname for each server.
This looks like a bug in CDH, as it's common practice to associate multiple hostnames with a host.
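To spot this situation quickly, a small script can list every address in /etc/hosts that carries more than one hostname. This is only a rough sketch: the function names are my own, and the IP addresses and the "mc2" entry in the sample are made up for illustration (only "mc1" and "zk-1" come from my setup).

```python
from collections import defaultdict
from typing import Dict, List


def names_per_address(hosts_text: str) -> Dict[str, List[str]]:
    """Map each address in hosts-file text to the hostnames listed for it."""
    names = defaultdict(list)
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *hostnames = line.split()
        for hostname in hostnames:
            if hostname not in names[address]:
                names[address].append(hostname)
    return dict(names)


def multi_named(hosts_text: str) -> Dict[str, List[str]]:
    """Return only the addresses that have more than one hostname."""
    return {
        address: hostnames
        for address, hostnames in names_per_address(hosts_text).items()
        if len(hostnames) > 1
    }


# Hypothetical sample; the addresses are invented for illustration.
sample = """
192.168.1.11 mc1 zk-1   # two names on one host
192.168.1.12 mc2
"""
print(multi_named(sample))

# To check a real machine:
# with open("/etc/hosts") as f:
#     print(multi_named(f.read()))
```

On a real file, the loopback entries will usually show up as well, since they often list several names such as "localhost" and "localhost.localdomain". Those are expected; the entries to look at are the ones for the cluster nodes themselves.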