Getting "heartbeat" errors when trying to install CDH via Cloudera Manager

by Community Manager on 01-29-2016 01:54 PM - edited on 09-27-2016 09:11 AM by Community Manager

Symptoms

 

When trying to install CDH via Cloudera Manager, you may sometimes encounter "heartbeat" errors similar to the following:

 

    Installation failed. Failed to receive heartbeat from agent.

    Ensure that the host's hostname is configured properly.

    Ensure that port 7182 is accessible on the Cloudera Manager server (check firewall rules).

    Ensure that ports 9000 and 9001 are free on the host being added.

    Check agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details).

 

 

Applies To

 

Cloudera Manager (All Versions)

 

Cause

 

This type of error can be caused by several factors, but they all come down to the client nodes being unable to communicate correctly with the Cloudera Manager server over the network.

 

Potential root causes of this error:

 

  1. Your client machines do not have their IP addresses configured properly.
  2. Firewalls and/or iptables are blocking network traffic.
  3. DNS is misconfigured.

 

 

Troubleshooting Steps

 

1. IP Address misconfiguration:

 

Use "ifconfig -a" (or "ip addr show" on newer distributions) to list your network interfaces; your main network interface is probably named something like "eth0". Ensure that it has a real IP address assigned to it, not the loopback address (127.0.0.1). Run "hostname -f" to find out what hostname your local machine is using for itself, then run "nslookup <hostname>" against that hostname (or "dig <hostname>" for more options) to see what IP address it resolves to. If DNS does not return an IP address for your host, resolution is controlled entirely by /etc/hosts; look in that file to see what IP address you are assigning to your host.
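The check described above can be sketched as a small script. This is only an illustration under stated assumptions: the helper names is_loopback and check_host_ip are hypothetical (not part of Cloudera Manager), and it assumes a Linux host where "getent hosts" is available, which looks names up through /etc/hosts and DNS in the order set in /etc/nsswitch.conf.

```shell
#!/bin/sh
# Sketch: flag a hostname that resolves to a loopback address.
# is_loopback and check_host_ip are hypothetical helper names.

is_loopback() {
    # IPv4 127.0.0.0/8 and IPv6 ::1 are loopback addresses.
    case "$1" in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}

check_host_ip() {
    # $1 = hostname to check; defaults to this machine's own FQDN.
    host="${1:-$(hostname -f)}"
    # Take the first address that the system resolver returns.
    ip="$(getent hosts "$host" | awk 'NR == 1 { print $1 }')"
    if [ -z "$ip" ]; then
        echo "$host: does not resolve"
        return 2
    elif is_loopback "$ip"; then
        echo "$host: resolves to loopback address $ip"
        return 1
    fi
    echo "$host: resolves to $ip"
}

# Example: run check_host_ip with no argument on each host being added.
```

A non-zero return from check_host_ip suggests that the host's name needs to be fixed in /etc/hosts or DNS before retrying the installation.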

 

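2. Firewalls or iptables: either disable them or ensure they allow the required ports through. Follow your company's policies to decide which approach is best for you.

 

On systems that use chkconfig (for example RHEL/CentOS 6), check whether iptables is configured to start at boot; the output below shows it turned off in every runlevel:

$ sudo chkconfig iptables --list

iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off

 

On systems that use ufw (for example Ubuntu), disable the firewall:

$ sudo ufw disable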
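To confirm that a port is actually reachable after changing firewall settings, you can attempt a TCP connection from the host being added to the Cloudera Manager server. The following is a minimal sketch, not an official tool: port_open is a hypothetical helper, the server name in the example is a placeholder, and it assumes that bash and the coreutils "timeout" command are installed.

```shell
#!/bin/sh
# Sketch: test whether a TCP port accepts connections from this host.
# port_open is a hypothetical helper; it assumes bash and coreutils
# timeout are available.

port_open() {
    # $1 = host, $2 = port, $3 = timeout in seconds (default 3)
    # bash treats /dev/tcp/HOST/PORT as a TCP connection attempt, so the
    # inner command exits 0 only if the connection is established.
    timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example (cm-server.example.com is a placeholder for your server):
#   port_open cm-server.example.com 7182 && echo reachable || echo unreachable
```

Run it on the agent host against port 7182 of the Cloudera Manager server. A failure only tells you that the connection could not be made; it may mean that a firewall is blocking the port or that nothing is listening on it.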

 

3. DNS misconfiguration: if "nslookup <hostname>" (where <hostname> is the name of your Cloudera Manager server) does not return the correct IP address, DNS is misconfigured; contact your network administrator.

 

Ultimately, all machines in your cluster need to be able to resolve each other's hostnames and IP addresses, and to connect to each other on the specific network ports mentioned in the error message. Ensure that /etc/hosts or DNS is configured properly so that your hosts can resolve each other, and that each machine binds its Hadoop services to a real network IP address instead of the loopback address.
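One way to spot a hostname that is tied to the loopback address is to scan /etc/hosts for non-localhost names on 127.x.x.x or ::1 lines. The sketch below is illustrative only: hosts_loopback_names is a hypothetical helper, and the list of names it treats as legitimate loopback aliases (those starting with "localhost" or "ip6-") is an assumption that may need adjusting for your distribution.

```shell
#!/bin/sh
# Sketch: list names in a hosts file that are mapped to a loopback address.
# hosts_loopback_names is a hypothetical helper name.

hosts_loopback_names() {
    # $1 = hosts file to scan (default /etc/hosts)
    awk '
        { sub(/#.*/, "") }                      # drop comments
        $1 ~ /^127\./ || $1 == "::1" {
            for (i = 2; i <= NF; i++)
                # Skip the usual loopback aliases (assumed naming).
                if ($i !~ /^localhost/ && $i !~ /^ip6-/)
                    print $1, $i
        }
    ' "${1:-/etc/hosts}"
}

# Example: hosts_loopback_names          (scans /etc/hosts)
#          hosts_loopback_names ./hosts  (scans another file)
```

Any name it prints is a candidate for the problem described above. For instance, some Debian and Ubuntu installations map the machine's own hostname to 127.0.1.1, which would show up here.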

 

If all that is correct and you still cannot connect, check to make sure firewalls or other services are not blocking the traffic.

 

Comments
by OlegKhaykin
on 09-06-2017 08:29 PM

I have the same problem while installing CDH-5.12 on a 3-node cluster:

1) PowerEdge - the main computer where cloudera-scm-server is running;

2) hadoop-1 - the 1st node where cloudera-scm-agent is running;

3) hadoop-2 - the 2nd node where cloudera-scm-agent is running; 

 

This is what I see in /var/log/cloudera-scm-agent/cloudera-scm-agent.log on hadoop-1:

 

[06/Sep/2017 22:51:12 +0000] 18772 MainThread agent ERROR Heartbeating to computer.home:7182 failed.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.12.0-py2.7.egg/cmf/agent.py", line 1401, in _send_heartbeat
    self.master_port)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 469, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused

 

BTW, this is what netstat is reporting:

 

> netstat -a | grep 7182

tcp        0      0 hadoop-1:47144          PowerEdge:7182          ESTABLISHED

 

What have I done wrong?

Disclaimer: The information contained in this article was generated by third parties and not by Cloudera or its personnel. Cloudera cannot guarantee its accuracy or efficacy. Cloudera disclaims all warranties of any kind and users of this information assume all risk associated with it and with following the advice or directions contained herein. By visiting this page, you agree to be bound by the Terms and Conditions of Site Usage, including all disclaimers and limitations contained therein.