Getting "heartbeat" errors when trying to install CDH via Cloudera Manager

by Cloudera Employee Clint on ‎01-29-2016 01:54 PM - edited on ‎09-27-2016 09:11 AM by Community Manager

Symptoms

 

When trying to install CDH via Cloudera Manager, you may sometimes encounter "heartbeat" errors similar to the following:

 

    Installation failed. Failed to receive heartbeat from agent.

    Ensure that the host's hostname is configured properly.

    Ensure that port 7182 is accessible on the Cloudera Manager server (check firewall rules).

    Ensure that ports 9000 and 9001 are free on the host being added.

    Check agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details).

 

 

Applies To

 

Cloudera Manager (All Versions)

 

Cause

 

This type of error can have several causes, but they all come down to the client nodes being unable to communicate correctly with the Cloudera Manager server over the network.

 

Potential root causes of this error:

 

  1. Your client machines do not have their IP addresses configured properly. 
  2. Firewalls and/or iptables could be blocking network traffic.
  3. DNS is misconfigured.

 

 

Troubleshooting Steps

 

1. IP Address misconfiguration:

 

Use "ifconfig -a" to list your network interfaces; your main network interface is probably something like "eth0". Ensure that it has a real IP address assigned to it, not the loopback address (127.0.0.1). Run "hostname -f" to find out what hostname your local machine is using for itself, then run "nslookup <hostname>" against that hostname (or "dig <hostname>" for more options) to see what IP address it resolves to. If DNS does not return an IP address for your host, then name resolution is controlled entirely by /etc/hosts. Look in that file to see what IP address you are assigning to your host.
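The hostname check above can be sketched as a pair of shell functions. This is a minimal sketch, assuming a Linux host where "getent" is available; the function names is_loopback and check_own_hostname are hypothetical helpers and not part of Cloudera Manager.

```shell
# Hypothetical helpers for illustration; not part of Cloudera Manager.

# Return 0 if the given address is an IPv4 or IPv6 loopback address.
is_loopback() {
  case "$1" in
    127.*|::1) return 0 ;;
    *) return 1 ;;
  esac
}

# Resolve this host's fully qualified name through the system resolver
# (which consults /etc/hosts and DNS) and warn if it maps to loopback.
check_own_hostname() {
  local fqdn ip
  fqdn=$(hostname -f) || return 1
  ip=$(getent hosts "$fqdn" | awk '{print $1; exit}')
  if [ -z "$ip" ]; then
    echo "FAIL: $fqdn does not resolve"
    return 1
  elif is_loopback "$ip"; then
    echo "FAIL: $fqdn resolves to loopback address $ip"
    return 1
  fi
  echo "OK: $fqdn resolves to $ip"
}
```

After defining the functions, running check_own_hostname on each host being added gives a quick pass/fail answer before retrying the installation.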

 

2. Firewalls or iptables: either disable them or ensure that they allow the correct ports through. Follow your company policies to decide which path is best for you.

 

$ sudo chkconfig iptables --list

iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off

 

$ sudo ufw disable

 

3. If "nslookup <hostname>" (where <hostname> is the name of your Cloudera Manager server) does not return the correct IP address, then DNS is misconfigured; contact your network administrator.

 

When all is said and done, all machines in your cluster need to be able to resolve each other's hostnames and IP addresses, as well as connect to each other on the specific network ports mentioned in the error message. Ensure that /etc/hosts or DNS is configured properly so that your hosts can resolve each other, and that each local machine is binding its Hadoop services to a real network IP instead of the loopback address.
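To see which address a hosts file assigns to each cluster member, something like the following could be used. It is only a sketch: the function name hosts_file_ip is hypothetical, the match is case-sensitive (real resolvers treat hostnames as case-insensitive), and it looks only at the hosts file, not at DNS.

```shell
# Hypothetical helper for illustration; not part of Cloudera Manager.
# Print the address that a hosts file assigns to a name.
# usage: hosts_file_ip NAME [FILE]   (FILE defaults to /etc/hosts)
hosts_file_ip() {
  local name=$1 file=${2:-/etc/hosts}
  awk -v name="$name" '
    { sub(/#.*/, "") }             # strip comments before matching
    { for (i = 2; i <= NF; i++)    # field 1 is the address, the rest are names
        if ($i == name) { print $1; exit } }
  ' "$file"
}
```

Running hosts_file_ip with each cluster hostname on every machine shows whether the files agree with each other. An empty result means the name is not in the file, and an answer starting with 127. means the name is mapped to loopback.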

 

If all that is correct and you still cannot connect, check to make sure firewalls or other services are not blocking the traffic.
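One way to test whether traffic is getting through is to attempt a TCP connection from the agent host to the server port named in the error message. The sketch below assumes bash built with /dev/tcp support and the coreutils "timeout" command; the function name port_open is hypothetical.

```shell
# Hypothetical helper for illustration; not part of Cloudera Manager.
# Try to open a TCP connection to HOST:PORT, giving up after 3 seconds.
# Returns 0 if the connection succeeds, 1 if it fails, 2 on bad arguments.
port_open() {
  local host=$1 port=$2
  case "$port" in
    ''|*[!0-9]*) echo "port must be a number: '$port'" >&2; return 2 ;;
  esac
  [ -n "$host" ] || { echo "host is required" >&2; return 2; }
  # bash opens a TCP socket when a redirection targets /dev/tcp/HOST/PORT.
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    return 0
  fi
  return 1
}
```

Called from each agent host with the Cloudera Manager server's hostname and port 7182, a return value of 1 suggests that a firewall is blocking the port or that nothing is listening on it.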

 

Comments
by OlegKhaykin
on ‎09-06-2017 08:29 PM

I have the same problem while installing CDH-5.12 on a 3-node cluster:

1) PowerEdge - the main computer where cloudera-scm-server is running;

2) hadoop-1 - the 1st node where cloudera-scm-agent is running;

3) hadoop-2 - the 2nd node where cloudera-scm-agent is running; 

 

This is what I see in /var/log/cloudera-scm-agent/cloudera-scm-agent.log on hadoop-1:

 

[06/Sep/2017 22:51:12 +0000] 18772 MainThread agent ERROR Heartbeating to computer.home:7182 failed.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.12.0-py2.7.egg/cmf/agent.py", line 1401, in _send_heartbeat
    self.master_port)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 469, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused

 

BTW, this is what netstat is reporting:

 

> netstat -a | grep 7182

tcp        0      0 hadoop-1:47144          PowerEdge:7182          ESTABLISHED

 

What have I done wrong?

by sharmaji
on ‎01-08-2018 09:05 AM

I have the same issue and have checked everything suggested by Clint. The firewall is disabled, and Manager is listening on 8182. Hosts are able to resolve names to the correct IPs. Not sure what is missing. I am installing CDH-5.13.1-1.cdh5.13.1.p0.2 and I am trying to use "Single User Mode"; could that be a problem?

 

 

[08/Jan/2018 09:41:16 +0000] 4959 MainThread agent ERROR Heartbeating to dsib2041:None failed.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.13.1-py2.7.egg/cmf/agent.py", line 1412, in _send_heartbeat
    self.master_port)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 469, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
[08/Jan/2018 09:41:16 +0000] 4959 MainThread heartbeat_tracker INFO HB stats (seconds): num:1 LIFE_MIN:0.00 min:0.00 mean:0.00 max:0.00 LIFE_MAX:0.00
[08/Jan/2018 09:42:16 +0000] 4959 MainThread agent ERROR Heartbeating to dsib2041:None failed.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.13.1-py2.7.egg/cmf/agent.py", line 1412, in _send_heartbeat
    self.master_port)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 469, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused

 

by sharmaji
on ‎01-08-2018 09:07 AM

Correction: port 7182, not 8182

 

[root@dsib2041 ~]# netstat -l

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:mysql 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:7180 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:7182 0.0.0.0:* LISTEN

by Shelton
on ‎01-19-2018 08:13 PM

@OlegKhaykin @sharmaji

 

You also shouldn't forget to set up SELinux, passwordless connections, and NTP! I have seen cases where these have affected setup!

by Paul_
on ‎03-20-2018 09:55 AM

I'm having the same issues and need some help to get through this...

 

On RHEL 7.3 installed in a VM (VirtualBox), I'm trying to install a single-node cluster using Cloudera Manager (Cloudera Express 5.14.1). I get stuck when adding a single agent node (on localhost, that is).

 

Error message:

 

Installation failed. Failed to receive heartbeat from agent.

  • Ensure that the host's hostname is configured properly.
  • Ensure that port 7182 is accessible on the Cloudera Manager Server (check firewall rules).
  • Ensure that ports 9000 and 9001 are not in use on the host being added.
  • Check agent logs in /var/log/cloudera-scm-agent/ on the host being added. (Some of the logs can be found in the installation details).
  • If Use TLS Encryption for Agents is enabled in Cloudera Manager (Administration -> Settings -> Security), ensure that /etc/cloudera-scm-agent/config.ini has use_tls=1 on the host being added. Restart the corresponding agent and click the Retry link here.

 

I've checked, fixed, rechecked:

- no firewall is enabled

- selinux is disabled

- ipv6 is disabled

- ntp is setup

 

I have stopped, wiped, and restarted the installation many times now, but I always get stuck at this point.

 

I have extended my hosts file with:

 

10.0.2.15 my-host.local my-host

 

But I have found that when I try to add my-host in the wizard, it first seems to set up the agent at my-host.local @ 10.0.2.15, but then when I hit the error, if I open http://localhost:7180/cmf/hardware/hosts it always shows up as localhost4.localdomain4. The oddest thing is that in the hosts overview there is a Last Heartbeat column that actually shows that the heartbeat is working: it's like 4.26s, then when reloading it's 1.5s, etc.

 

I have checked both the cloudera-scm-server and cloudera-scm-agent logs but did not find anything useful.

 

Can anyone please tell me how to set up a single-node (test) cluster?

 

Note that I can't work with the QuickStart because of CentOS....

 

Paul

 

by sharmaji
on ‎03-20-2018 12:40 PM

I could not figure out what the issue was, so I used this workaround and added the host via "Currently Managed Hosts" instead of selecting a new one.

Here are the steps I use to add a new host (some of them may not be applicable to your setup):

 

Disable Firewall on node:    

systemctl disable firewalld.service

 

Disable SELinux: set SELINUX=disabled in the /etc/selinux/config file and reboot.

After the reboot, run the getenforce command to check that SELinux is disabled.

Update repository

yum-config-manager --add-repo https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/cloudera-manager.repo

 

Install Java

$ sudo yum install -y oracle-j2sdk1.7

export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera

Put it in .bashrc as well:

echo "export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera" >>/root/.bashrc

Install Cloudera daemons

sudo yum install -y cloudera-manager-agent cloudera-manager-daemons

 

On the agent, edit /etc/cloudera-scm-agent/config.ini to point to Cloudera Manager (if needed):

# Hostname of the CM server.

server_host=nodexxxx

 

Start Agent

# service cloudera-scm-agent start

# chkconfig cloudera-scm-agent on

 

Log in to Cloudera Manager and, while adding the host, instead of selecting a new host from the list, look for the "Currently Managed Hosts(1)" link at the top of the screen. Click the link and select the host from the list.

 

The rest of the installation should go fine. Good luck.

 

 

 
