Created on 01-29-2016 01:54 PM - edited on 09-27-2016 09:11 AM by cjervis
When trying to install CDH via Cloudera Manager, you may sometimes encounter "heartbeat" errors similar to the following:
Installation failed. Failed to receive heartbeat from agent.
Ensure that the host's hostname is configured properly.
Ensure that port 7182 is accessible on the Cloudera Manager server (check firewall rules).
Ensure that ports 9000 and 9001 are free on the host being added.
Check agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details).
Applies to: Cloudera Manager (all versions)
This error can have several causes, but they all come down to the client nodes being unable to communicate correctly with the Cloudera Manager server over the network.
Potential root causes of this error:
Your client machines do not have their IP addresses configured properly.
Firewalls and/or iptables could be blocking network traffic.
DNS is misconfigured.
1. IP Address misconfiguration:
Run "ifconfig -a" to list your network interfaces; your main interface is probably named something like "eth0". Make sure it has a real IP address assigned to it, not the loopback address (127.0.0.1). Next, run "hostname -f" to see which fully qualified hostname the machine uses for itself, then run "nslookup <hostname>" (or "dig <hostname>" for more detail) to see which IP address that name resolves to. Note that nslookup and dig query DNS directly and do not read /etc/hosts. If DNS does not return an IP address for your host, its name resolution is controlled entirely by /etc/hosts, so check that file to see which IP address it assigns to your host.
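The checks in step 1 can be approximated with a short Python sketch, assuming Python 3 is available on the host. It uses socket.getfqdn() as a rough stand-in for "hostname -f" and asks the system resolver (which, unlike nslookup, also reads /etc/hosts) for the address, then flags a loopback answer. The function names are illustrative and not part of any Cloudera tool.

```python
import ipaddress
import socket
from typing import Tuple


def local_identity() -> Tuple[str, str]:
    """Return (fully qualified hostname, IPv4 address the system resolver gives for it)."""
    fqdn = socket.getfqdn()  # approximates "hostname -f"
    return fqdn, socket.gethostbyname(fqdn)  # honors /etc/hosts, unlike nslookup


def is_loopback(ip: str) -> bool:
    """True for 127.0.0.0/8 or ::1, addresses that other hosts cannot reach."""
    return ipaddress.ip_address(ip).is_loopback


if __name__ == "__main__":
    try:
        name, addr = local_identity()
    except socket.gaierror as exc:
        print(f"local hostname does not resolve: {exc}")
    else:
        if is_loopback(addr):
            print(f"{name} resolves to loopback {addr}; fix /etc/hosts or DNS")
        else:
            print(f"{name} resolves to {addr}")
```

A common case this catches is an /etc/hosts line that maps the machine's own hostname to 127.0.0.1 or 127.0.1.1.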
2. Firewalls or iptables:
Either disable them or make sure they allow traffic on the ports named in the error message. Follow your company's policies to decide which approach is best for you.
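One way to tell whether a firewall is in the way is to attempt a TCP connection from the host being added to the Cloudera Manager server on port 7182. The Python sketch below is a minimal example of that idea; the probe function and the command-line usage are hypothetical. It separates a refused connection from a timeout, because a firewall that silently drops packets typically shows up as a timeout, while "refused" usually means the packet reached the server but nothing accepted the connection on that port (or a reject rule answered).

```python
import socket
import sys

CM_AGENT_PORT = 7182  # Cloudera Manager server port named in the error message


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP handshake and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        return "unresolved"  # the hostname itself did not resolve
    except ConnectionRefusedError:
        return "refused"  # reached the host, but nothing accepted the connection
    except socket.timeout:
        return "timeout"  # often a firewall silently dropping packets
    except OSError as exc:
        return f"error: {exc}"  # e.g. host unreachable


if __name__ == "__main__":
    # Hypothetical usage: python probe_cm.py cm-server.example.com
    server = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    print(f"{server}:{CM_AGENT_PORT} -> {probe(server, CM_AGENT_PORT)}")
```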
3. DNS misconfiguration:
If "nslookup <hostname>" (where <hostname> is the name of your Cloudera Manager server) does not return the correct IP address, your DNS is misconfigured; contact your network administrator.
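Step 3 can also be checked from Python by comparing what the Cloudera Manager server's name resolves to against the address you expect. This is a minimal sketch; it uses the system resolver, so unlike nslookup it also reflects /etc/hosts, which is closer to what applications on the host actually see. The function names are illustrative.

```python
import socket
from typing import Set


def resolved_ipv4(hostname: str) -> Set[str]:
    """All IPv4 addresses the system resolver returns for hostname (empty set if none)."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return set()
    return {sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos}


def resolves_to(hostname: str, expected_ip: str) -> bool:
    """True if expected_ip is among the addresses hostname resolves to."""
    return expected_ip in resolved_ipv4(hostname)
```

For example, resolves_to("cm-server.example.com", "10.0.0.10") should return True on every host in the cluster; both values here are placeholders for your own server's name and address.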
When all is said and done, every machine in your cluster must be able to resolve every other machine's hostname and IP address, and to connect to the others on the network ports listed in the error message. Make sure /etc/hosts or DNS is configured so that your hosts can resolve each other, and that each machine binds its Hadoop services to a real network IP address rather than the loopback address.
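Because every host must be able to resolve every other host, a quick consistency check is to run the same forward and reverse lookups on each machine against the full host list. The Python sketch below assumes you pass fully qualified hostnames (a short name will be reported as a mismatch when the reverse lookup returns the FQDN); the script name and host names in the usage comment are hypothetical.

```python
import socket
import sys
from typing import List


def check_host(hostname: str) -> List[str]:
    """Return the resolution problems found for one host (empty list if none)."""
    try:
        ip = socket.gethostbyname(hostname)  # forward lookup
    except socket.gaierror:
        return [f"{hostname}: does not resolve"]
    problems = []
    if ip.startswith("127."):
        problems.append(f"{hostname}: resolves to loopback {ip}")
    try:
        reverse_name = socket.gethostbyaddr(ip)[0]  # reverse lookup
    except (socket.herror, socket.gaierror):
        problems.append(f"{hostname}: no reverse entry for {ip}")
    else:
        if reverse_name.lower() != hostname.lower():
            problems.append(f"{hostname}: {ip} reverses to {reverse_name}")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: python check_hosts.py node1.example.com node2.example.com
    for host in sys.argv[1:]:
        for problem in check_host(host) or [f"{host}: OK"]:
            print(problem)
```

Resolution is configured per machine, so the same host list should be checked from every node, including the Cloudera Manager server.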
If all of that is correct and the hosts still cannot connect, make sure no firewall or other service is blocking the traffic.
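The error message also asks you to confirm that ports 9000 and 9001 are free on the host being added; another service already holding one of them can produce the same symptom. A minimal way to test this, sketched here in Python, is to try binding each port: if the bind fails with "address already in use", something else owns it. The sketch assumes the Cloudera Manager agent is not yet running on that host, since the agent's own listener would otherwise show up as in use; the function name is illustrative.

```python
import errno
import socket

AGENT_PORTS = (9000, 9001)  # ports the error message says must be free


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """True if a TCP socket can bind host:port, i.e. no other process holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # other failures (e.g. permissions) are a different problem
    return True


if __name__ == "__main__":
    for port in AGENT_PORTS:
        print(f"port {port}: {'free' if port_is_free(port) else 'IN USE'}")
```

On Linux, "ss -tlnp" (run as root) will also show which process is listening on a given port.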