I am trying to install Cloudera Manager Standard edition and CDH4 with parcels. This is being installed on three Dell machines running Ubuntu 12.04.2 LTS 64-bit. I am receiving an error on all three machines:
Ensure that the host's hostname is configured properly.
Ensure that port 7182 is accessible on the Cloudera Manager server (check firewall rules).
Ensure that ports 9000 and 9001 are free on the host being added.
Check agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details).
I checked that the hostname is configured.
I checked that ports 7182, 9000 and 9001 are free (I am guessing that Cloudera is using 9000 and 9001 for Python, because these ports are in use after the install fails but not before the install).
sudo netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:6000 0.0.0.0:* LISTEN 1104/X
tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN 1123/dnsmasq
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 655/sshd
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 792/cupsd
tcp 0 0 127.0.0.1:6010 0.0.0.0:* LISTEN 29403/0
tcp6 0 0 :::6000 :::* LISTEN 1104/X
tcp6 0 0 :::22 :::* LISTEN 655/sshd
tcp6 0 0 ::1:631 :::* LISTEN 792/cupsd
tcp6 0 0 ::1:6010 :::* LISTEN 29403/0
udp 0 0 127.0.0.1:53 0.0.0.0:* 1123/dnsmasq
udp 0 0 0.0.0.0:68 0.0.0.0:* 1073/dhclient
udp 0 0 0.0.0.0:36051 0.0.0.0:* 790/avahi-daemon: r
udp 0 0 0.0.0.0:5353 0.0.0.0:* 790/avahi-daemon: r
udp6 0 0 :::50807 :::* 790/avahi-daemon: r
udp6 0 0 :::5353 :::* 790/avahi-daemon: r
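As a narrower check than the full netstat listing, the three relevant ports (7182 on the Cloudera Manager server, 9000/9001 on each host being added) can be tested directly. A minimal sketch using ss from iproute2, which ships with Ubuntu 12.04 and needs no root when process names are skipped:

```shell
# Report whether anything is listening on the installer's TCP ports.
# port_free returns success (exit 0) when the port has no listener.
port_free() {
  ss -ltn 2>/dev/null | awk -v p="$1" '{n=split($4,a,":")} a[n]==p {exit 1}'
}

for p in 7182 9000 9001; do
  port_free "$p" && echo "port $p is free" || echo "port $p is IN USE"
done
```

Running this before and after a failed install makes it easy to see which process (here, the agent's Python supervisor) has grabbed 9000/9001.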
And I checked the firewalls, and they are open:
sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Lastly, I checked the log files in /var/log/cloudera-scm-agent/; they do not show any errors, and there was only one warning, that the default socket timeout was set to 30.
Can anyone point me to what the possible problem is? We are looking at using Hadoop in one of our solutions and are trying to evaluate it before purchasing the Enterprise version. I cannot use a cloud version because of data restrictions put on me by the data vendor and client, so I need an internal sandbox to get an idea of what we need to develop and what we will need to support. Thanks!
Created 08-21-2013 01:46 PM
@enelso: Your hostname needs to be tied to an actual IP address on your local network which can send/receive traffic between all the hosts. The address you have associated your hostname with is the loopback address, which cannot route actual network traffic off the host.
Use "ifconfig -a" to see a listing of your network interfaces and choose one that has an actual IP address.
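To make that check concrete: the name each agent reports must resolve to something other than a loopback address. A minimal sketch (assuming getent, which consults /etc/hosts and DNS in the standard resolver order):

```shell
# Classify what a hostname resolves to: "routable", "loopback", or "unresolved".
resolve_class() {
  ip="$(getent hosts "$1" | awk '{print $1; exit}')"
  case "$ip" in
    127.*|::1) echo loopback ;;
    "")        echo unresolved ;;
    *)         echo routable ;;
  esac
}

resolve_class localhost        # loopback -- fine for localhost only
resolve_class "$(hostname -f)" # must be "routable" on every cluster node
```

If the second call prints "loopback", the /etc/hosts entry for the machine's own name points at 127.0.0.1 and needs to point at the real interface address instead.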
Created 06-11-2017 05:58 PM
Clint, I got the same error, and I changed my /etc/hosts file to: 52.60.239.41 ip-172-31-12-19.ca-central-1.compute.internal node1
I also restarted the agent on each of the hosts, 4 of them, and I still get the same error.
Please help.
Robin
Created 01-17-2018 08:20 AM
I ran into similar problems and finally figured out what was causing the issue.
All my hosts/hostname settings were correct, ports available, etc., but the thing that caused the heartbeat to fail was that the ntpd service has to be running on all hosts in order to "synchronize" the heartbeat.
No sync == no heartbeat.
Can't believe that did not occur to me sooner. Hope this helps someone.
(It may be helpful to add this to the list of possible causes of a failing installation that is shown upon failure.)
Created 01-17-2018 08:30 AM
Thank you for sharing your solution.
Just so there is no confusion, though, the agent does not require ntpd to heartbeat. By default, however, the agent does run health checks that query ntpd or chronyd to ensure that the clock offset is within expected parameters.
While failure of these health checks will result in a host being shown in bad health, the health check does not itself prevent heartbeats.
If adding ntpd was the only action that allowed the agent to heartbeat, then it could be there was some problem where the agent was not able to progress with the heartbeat due to an unusual condition. If you or anyone hits a problem like this where the agent cannot heartbeat, let us know again and we will take a closer look.
My reason for explaining this was to make sure everyone was aware that NTP is not required for heartbeating. The root cause may have been related to NTP, but it was not an intended feature limitation.
Thanks again, for sharing.
Cheers.
Ben
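For anyone who wants to check the clock side of this by hand, the health check Ben describes amounts to asking whether the host is synced to an NTP peer. A rough manual equivalent, sketched here assuming the classic ntp package (chrony users would run chronyc tracking instead; an asterisk in the first column of ntpq -pn marks the selected sync peer):

```shell
# Print a one-word NTP status: "synced", "no-sync-peer", or "ntpq-missing".
ntp_status() {
  if ! command -v ntpq >/dev/null 2>&1; then
    echo "ntpq-missing"
  elif ntpq -pn 2>/dev/null | awk '$1 ~ /^\*/ {found=1} END {exit !found}'; then
    echo "synced"
  else
    echo "no-sync-peer"
  fi
}

ntp_status
```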
Created 12-16-2013 03:40 PM
Hello!
I believe I have a similar (or identical) problem, but I don't fully understand the explanation. I'm installing a cluster on two virtual machines with Ubuntu 12.04 on them. Both have two network interfaces, eth0 and eth1. eth0 is the connection to the internet through NAT, and eth1 is a virtual network between them.
Host 1 name is Hadoop1.local and its IP (eth1) is: 192.168.56.101.
Host 2 name is Hadoop2.local and its IP (eth1) is: 192.168.56.102
I start the Cloudera Manager installer on Hadoop1. In the step where hosts are specified, I type both IP addresses (.101 & .102). (Is that OK, or should I only start the installation on the second host, Hadoop2?) During the "Cluster Installation" step on Hadoop1 I get the error:
"Installation failed. Failed to receive heart beat from agent".
I checked ports 7182, 9000 and 9001 and they are OK (before I start the installation they are free; after the installation Python is listening on them).
In /var/log/cloudera-scm-agent I see no errors.
So I assume there is something wrong with my host names.
My /etc/hosts looks like this (Hadoop1):
127.0.0.1 localhost loopback
192.168.56.102 Hadoop2.local
And the second /etc/hosts looks like this (Hadoop2):
127.0.0.1 localhost loopback
192.168.56.101 Hadoop1.local
Any help would be appreciated!
Regards
Andrzej
Created 12-19-2013 04:56 AM
Hello!
Any idea how to solve the above problem? Any help would be appreciated!
Regards
Andrzej
Created on 12-19-2013 08:52 AM - edited 12-19-2013 08:52 AM
Hello,
I apologize for the delay. I think there may be a couple of things going on. For starters, you should add your own hostname and IP address to the /etc/hosts file on each machine. In other words, both Host1 and Host2 entries should be in the /etc/hosts file on both machines. Also, have you checked to see if iptables is running? That is a firewall app that can stop traffic between nodes. To identify if iptables is running and disable it, do this (as root):
$ sudo chkconfig --list iptables
iptables        0:off   1:off   2:off   3:off   4:off   5:off   6:off

If you see iptables as "on" for any of those runlevels (especially 5), then it's probably getting in your way; you should disable it unless you've got a company policy requiring it be enabled. To disable it, run this command:
sudo chkconfig iptables off
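Applied to the two hosts earlier in this thread (192.168.56.101 and 192.168.56.102), "both entries on both machines" means each /etc/hosts should also carry the machine's own name; for example, a sketch keeping the .local names used above, usable verbatim on both nodes:

```
127.0.0.1       localhost
192.168.56.101  Hadoop1.local   Hadoop1
192.168.56.102  Hadoop2.local   Hadoop2
```

The important part is that each host resolves its own FQDN to its routable eth1 address, not to 127.0.0.1.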
Created 12-22-2013 02:13 PM
Hi,
Thank you for your response. I think I had something wrong in my /etc/hosts. Also, on one node I had ufw enabled. The following command did the trick:
sudo ufw disable
Thank you for your support!
Regards
Andrzej
Created 06-25-2014 04:47 PM
I am facing the exact same problem; I am not sure if it is related or not, but it seems very close.
I am trying to install CDH5 on a single-node cluster, so everything is supposed to be running on the same host (Cloudera Manager and all the other Hadoop and Spark services); it is more of a sandbox than anything else.
I am running Ubuntu 64-bit Precise (12.04).
I am using the Cloudera Manager admin console to install, and at the cluster installation step it fails:
Installation failed. Failed to receive heartbeat from agent.
Thanks,
Mukul
Created 10-25-2014 02:28 PM
@clint, I tried picking the IP from the ifconfig -a command and added it to the hosts file as below:
<IP (the one that came from ifconfig -a)> <FQDN> <shortname>
I used hostname and hostname -f to find the shortname and FQDN. I still get the following error during installation:
Installation failed. Failed to receive heartbeat from agent.
I have enabled all TCP ports and ICMP. I am not sure what I am missing. Could you please help? Should I be checking something else?
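One more thing worth verifying on each host: the short name from hostname and the FQDN from hostname -f have to agree with each other (the FQDN should be the short name plus a domain). A minimal consistency sketch; consistent_names is a hypothetical helper, not a Cloudera tool:

```shell
# "yes" when the FQDN equals the short hostname or extends it with a domain;
# "no" usually means /etc/hosts and /etc/hostname disagree.
consistent_names() {
  short="$1" fqdn="$2"
  case "$fqdn" in
    "$short"|"$short".*) echo yes ;;
    *)                   echo no ;;
  esac
}

consistent_names "$(hostname)" "$(hostname -f 2>/dev/null)"
```

If this prints "no", fix the /etc/hosts line so the FQDN comes first, followed by the short name, before retrying the installer.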
Created 08-16-2015 11:19 PM
Hi,
I'm facing exactly the same issue:
Can someone please help me resolve this issue? I have the following information in my /etc/hosts file
I can see a successful installation message on n1 and the above failure message on the other two nodes.
Please help me.