Failed to receive heartbeat from agent. (Current Step)

New Contributor

Hi,

 

I am trying to install a development instance of Hadoop on a Microsoft Azure VM (a single-node cluster). I am running Ubuntu 12.04.3 LTS.

 

Everything goes well until the very last step of the installation process, where I get the following:

 

Installation failed. Failed to receive heartbeat from agent.

  • Ensure that the host's hostname is configured properly.
  • Ensure that port 7182 is accessible on the Cloudera Manager server (check firewall rules).
  • Ensure that ports 9000 and 9001 are free on the host being added.
  • Check agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details).
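For reference, a rough way to run those checks from a shell on the host being added (a sketch; the cm-server.example.com name below is a placeholder, and the exact agent log file name may differ):

hostname -f                           # the host's configured hostname / FQDN
nc -zv cm-server.example.com 7182     # is the Cloudera Manager server port reachable? (placeholder hostname)
sudo lsof -i :9000 -i :9001           # is anything already listening on 9000 or 9001?
sudo tail -n 100 /var/log/cloudera-scm-agent/cloudera-scm-agent.log   # recent agent log entries (file name may vary)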

I looked at the logs and saw the following errors:

 

[19/Nov/2013 15:00:55 +0000] 1922 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent/process
[19/Nov/2013 15:00:55 +0000] 1922 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent/supervisor
[19/Nov/2013 15:00:55 +0000] 1922 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent/supervisor/include
[19/Nov/2013 15:00:55 +0000] 1922 MainThread agent INFO Connecting to previous supervisor: agent-1304-1384872987.
[19/Nov/2013 15:00:55 +0000] 1922 MainThread _cplogging INFO [19/Nov/2013:15:00:55] ENGINE Bus STARTING
[19/Nov/2013 15:00:55 +0000] 1922 MainThread _cplogging INFO [19/Nov/2013:15:00:55] ENGINE Started monitor thread '_TimeoutMonitor'.
[19/Nov/2013 15:00:55 +0000] 1922 HTTPServer Thread-2 _cplogging ERROR [19/Nov/2013:15:00:55] ENGINE Error in HTTP server: shutting down
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/process/servers.py", line 187, in _start_http_thread
    self.httpserver.start()
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1825, in start
    raise socket.error(msg)
error: No socket could be created on ('NexusHadoopVM', 9000) -- [Errno 99] Cannot assign requested address

[19/Nov/2013 15:00:55 +0000] 19

 

I checked whether anything is already using ports 9000 and 9001 via

lsof -i :9000

lsof -i :9001

as well as netstat, and both came back with nothing. In the Azure VM manager I specified that both 9001 and 9002 are open (private and public); I'm not sure what else needs to be configured.

 

I am also using the public IP address when adding the node to the cluster.
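The "[Errno 99] Cannot assign requested address" in the log above usually means the agent tried to bind port 9000 on whatever address NexusHadoopVM resolves to, and that address is not assigned to any local interface; on Azure the public IP is normally NAT'ed and never appears on the VM's NIC. A quick way to compare the two (a sketch; NexusHadoopVM is the hostname taken from the log):

hostname -f                   # what the agent thinks the host is called
getent hosts NexusHadoopVM    # what address that name resolves to via /etc/hosts or DNS
ip addr show                  # which addresses are actually assigned to the VM's interfaces
# if the resolved address is the public (NAT'ed) one, point the hostname at the VM's
# internal address in /etc/hosts instead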

 

Please help!!!

14 REPLIES

Expert Contributor

Great job.

 

I try to keep the names as simple as possible so I can run thousands of scripts.

 

My hosts file looks like this:

 

127.0.0.1     localhost

 

Loopback interfaces differ from machine to machine. On AWS or, for example, Linode, you would just use the internal loopback device. Fast and easily managed.

 

#Cloudera Machines

 

192.168.2.1     n1

192.168.2.2     n2

192.168.2.3     n3

192.168.2.4     n4

192.168.2.5     n5

192.168.2.6     n6

192.168.2.7     n7

 

and so on, which makes it easier to apply changes across machines.

 

Such as:

 

for i in {1..300}; do ssh n$i date; done   <-- checks the date on all machines to make sure each one's clock is in sync.
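Along the same lines, a sketch for generating those hosts entries in one go (assuming node nN really does live at 192.168.2.N, as in the listing above):

# append n1..n254 to /etc/hosts, assuming the node numbered N lives at 192.168.2.N
for i in {1..254}; do printf '192.168.2.%d     n%d\n' "$i" "$i"; done | sudo tee -a /etc/hosts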

 

Keeping it simple makes life easier.

Expert Contributor

Do you mean you need to use base.hadoopdomain twice in the /etc/hosts entry?

I got the same errors and changed mine based on your hosts file, but still got the same error.

 

Mine looks like the following; please let me know if I did something wrong here.

127.0.0.1 localdomain localhost
54.186.89.67 ec2-54-186-89-67.us-west-2.compute.amazonaws.com slave3 ec2-54-186-89-67.us-west-2.compute.amazonaws.com
54.186.87.178 ec2-54-186-87-178.us-west-2.compute.amazonaws.com slave2 ec2-54-186-87-178.us-west-2.compute.amazonaws.com
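A quick sanity check for a file like that (a sketch; slave2 and slave3 are the names from the listing above) is whether each node resolves its own name consistently:

hostname -f                                           # the FQDN the agent will report
getent hosts slave2 slave3                            # the addresses /etc/hosts hands back for those names
python -c 'import socket; print(socket.getfqdn())'    # roughly how the Python-based agent resolves the name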

 

thanks,

Robin

Expert Contributor

Here's a step-by-step guide to troubleshooting this error:

 

http://www.yourtechchick.com/hadoop/failed-receive-heartbeat-agent-cloudera-hadoop/

New Contributor

In /etc/hosts on all nodes, put:

ip_address   FQDN   short name

10.10.1.230 name.domain.com name

 

The FQDN must come before the short name.
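A quick way to confirm that ordering took effect (a sketch; name.domain.com and 10.10.1.230 are the example values above):

hostname -f                 # should print name.domain.com
getent hosts 10.10.1.230    # the first name printed should be the FQDN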