08-29-2013 11:11 AM
I ran into an issue while installing a CDH cluster. I'm trying to install on a single machine. I selected the machine and the parcels, and I am running the installation as root. At the end of the installation I receive the following error messages:
"Installation failed. Failed to receive heartbeat from agent.
Ensure that the host's hostname is configured properly.
Ensure that port 7182 is accessible on the Cloudera Manager server (check firewall rules)."
I've found this error in some other forums and attempted to make the changes suggested in those places.
These are the contents of my hosts file (I've also tried it with the localhost line commented out):
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
This is my netstat output for the port; I assume it's just Cloudera Manager listening. I did specifically open the port on the firewall.
netstat -ntlp | grep :7182
tcp 0 0 0.0.0.0:7182 0.0.0.0:* LISTEN 1861/java
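A listening socket in netstat only shows that something is bound to the port. It does not show that a client can actually connect through the firewall. A minimal sketch of a connect test in Python follows; the function name port_is_open is made up for illustration, and 7182 is the port named in the error message above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection performs the full TCP handshake, which is what
        # the agent needs to be able to do against the server port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Running port_is_open("127.0.0.1", 7182) on the agent host should return True if the server port is reachable locally. On a multi-host setup, the server's address would go in place of 127.0.0.1.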
Here are the contents of cloudera-scm-agent.out:
[29/Aug/2013 10:24:31 +0000] 2867 MainThread agent INFO SCM Agent Version: 4.6.3
[29/Aug/2013 10:24:31 +0000] 2867 MainThread agent ERROR Could not determine hostname or ip address; proceeding.
Traceback (most recent call last):
File "/usr/lib/cmf/agent/src/cmf/agent.py", line 1573, in parse_arguments
ip_address = socket.gethostbyname(fqdn)
gaierror: [Errno -5] No address associated with hostname
usage: agent.py [-h] [--agent_dir AGENT_DIR]
[--agent_httpd_port AGENT_HTTPD_PORT] --package_dir
PACKAGE_DIR [--parcel_dir PARCEL_DIR]
[--standalone STANDALONE] [--master MASTER]
[--environment ENVIRONMENT] [--host_id HOST_ID]
[--disable_supervisord_events] --hostname HOSTNAME
--ip_address IP_ADDRESS [--use_tls]
[--client_keypw_file CLIENT_KEYPW_FILE] [--logfile LOGFILE]
[--logdir LOGDIR] [--optional_token] [--clear_agent_dir]
agent.py: error: argument --hostname is required
[29/Aug/2013 10:24:31 +0000] 2867 Dummy-1 agent INFO Stopping agent...
I also have the cloudera-scm-agent.log if that would be helpful. The only errors that I see in there are:
[28/Aug/2013 11:58:13 +0000] 25640 MainThread agent ERROR Failed to connect to newly launched supervisor. Agent will exit
[28/Aug/2013 11:52:44 +0000] 24811 MainThread agent INFO Trying to connect to newly launched supervisor (Attempt 4)
[28/Aug/2013 11:52:44 +0000] 24811 MainThread agent ERROR Failed! trying again in 1 second(s)
Any help would be much appreciated.
08-29-2013 11:21 AM
08-29-2013 11:23 AM
root@Accumulo3:/var/log/cloudera-scm-agent# hostname -f
hostname: Name or service not known
root@Accumulo3:/var/log/cloudera-scm-agent# python -c 'import socket; print socket.getfqdn(), socket.gethostbyname(socket.getfqdn())'
Traceback (most recent call last):
File "<string>", line 1, in <module>
socket.gaierror: [Errno -5] No address associated with hostname
08-29-2013 11:25 AM
Accumulo3.local is what I currently have in the hosts file as the FQDN since it's not a networked machine. I have tried it with just Accumulo3 as well.
08-29-2013 03:53 PM
The cloudera-scm-agent can't determine this machine's fully-qualified domain name (FQDN). The hostname -f command queries that, as does the Python snippet. Let's resolve that:
Assuming this is a CentOS/RHEL machine, what do you have set in /etc/sysconfig/network for the fully-qualified domain name? It would also be useful to add an entry to /etc/hosts in the format
IP FQDN shortname
Your case has only IP and FQDN. Just ensure the shortname comes last for any entries that have them.
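To illustrate the ordering, here is a rough Python sketch that parses hosts-file text and checks that the FQDN is listed before the shortname. The helper names and the 192.0.2.10 address are placeholders invented for this example (the address comes from the range reserved for documentation); only the Accumulo3 names come from this thread.

```python
def parse_hosts(text):
    """Return a list of (ip, [names...]) for each entry in hosts-file text."""
    entries = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1:]))
    return entries


def fqdn_before_shortname(names, fqdn, shortname):
    """True if fqdn is in names and, when shortname is present, precedes it."""
    lowered = [n.lower() for n in names]  # hostnames are case-insensitive
    fqdn, shortname = fqdn.lower(), shortname.lower()
    if fqdn not in lowered:
        return False
    if shortname not in lowered:
        return True
    return lowered.index(fqdn) < lowered.index(shortname)


# Hypothetical hosts-file content; the IP address is a placeholder.
SAMPLE = """
127.0.0.1   localhost
192.0.2.10  Accumulo3.local Accumulo3
"""

for ip, names in parse_hosts(SAMPLE):
    print(ip, names, fqdn_before_shortname(names, "Accumulo3.local", "Accumulo3"))
```

The localhost entry reports False simply because it does not carry the FQDN at all; the entry to look at is the one for the machine's own address.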
If it is not CentOS/RHEL, make sure you set the hostname the way your OS expects, then run those three commands again. Once they all return what's expected (the latter will return both FQDN and IP), we should be in a good place for you to try starting the agent again.
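If a Python 3 interpreter is what you have at hand, the same check can be sketched as below. It uses the same two socket calls as the one-liner, but reports a missing address instead of stopping with a traceback. The function name resolve_self is just an illustrative choice.

```python
import socket


def resolve_self():
    """Return (fqdn, ip) for this machine, or (fqdn, None) if it does not resolve."""
    fqdn = socket.getfqdn()
    try:
        return fqdn, socket.gethostbyname(fqdn)
    except socket.gaierror:
        # Same failure as in the agent traceback: the name has no address.
        return fqdn, None


fqdn, ip = resolve_self()
print("FQDN:", fqdn)
print("IP:  ", ip if ip is not None else "no address associated with hostname")
```

Once the hosts file is right, this should print both the FQDN and an IP address, which matches what the agent needs at startup.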
09-03-2013 11:45 AM
Thank you for all of your help! I'm on Ubuntu, so there is no /etc/sysconfig/network file. However, I did add "IP FQDN shortname" to the hosts file, which seemed to solve the hostname -f issue. I then ran the 3rd command successfully. When I went back to the GUI to set up the hosts, it said that the host already existed on the machine, so I just added the parcels to that machine. I am now at the point where everything appears to be installed and I'm configuring and fixing the health checks. Thank you so much for your help!