Created on 11-19-2013 07:17 AM - last edited on 10-31-2017 09:12 AM by cjervis
Hi,
I am trying to install a development instance of Hadoop on a Microsoft Azure VM (a single-node cluster). I am running Ubuntu 12.04.3 LTS.
Everything goes well until the very last step of the installation process, where I get the following:
Installation failed. Failed to receive heartbeat from agent.
I looked at the logs and saw the following errors:
>>[19/Nov/2013 15:00:55 +0000] 1922 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent/process
>>[19/Nov/2013 15:00:55 +0000] 1922 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent/supervisor
>>[19/Nov/2013 15:00:55 +0000] 1922 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent/supervisor/include
>>[19/Nov/2013 15:00:55 +0000] 1922 MainThread agent INFO Connecting to previous supervisor: agent-1304-1384872987.
>>[19/Nov/2013 15:00:55 +0000] 1922 MainThread _cplogging INFO [19/Nov/2013:15:00:55] ENGINE Bus STARTING
>>[19/Nov/2013 15:00:55 +0000] 1922 MainThread _cplogging INFO [19/Nov/2013:15:00:55] ENGINE Started monitor thread '_TimeoutMonitor'.
>>[19/Nov/2013 15:00:55 +0000] 1922 HTTPServer Thread-2 _cplogging ERROR [19/Nov/2013:15:00:55] ENGINE Error in HTTP server: shutting down
>>Traceback (most recent call last):
>> File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/process/servers.py", line 187, in _start_http_thread
>> self.httpserver.start()
>> File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1825, in start
>> raise socket.error(msg)
>>error: No socket could be created on ('NexusHadoopVM', 9000) -- [Errno 99] Cannot assign requested address
>>
>>[19/Nov/2013 15:00:55 +0000] 19
I checked whether anything was already using ports 9000 and 9001 via
lsof -i :9000
lsof -i :9001
as well as netstat, and both came up with nothing. In the Azure VM manager I specified that both 9001 and 9002 are open (private and public); I am not sure what else needs to be configured.
I am also using the public IP address when adding the node to the cluster.
Please help!!!
Created 11-19-2013 07:30 AM
Hi there,
The "[Errno 99] Cannot assign requested address" points to hostname resolution rather than a port conflict. Please run this on the node:
$ python -c 'import socket; print socket.getfqdn(), socket.gethostbyname(socket.getfqdn())'
This should return the fully-qualified domain name as well as the IP address, confirming forward and reverse name resolution. Sanity-check this output against:
$ dig NexusHadoopVM
$ dig -x [IP returned in above dig command]
You may also wish to check your /etc/hosts file to make sure everything is OK there.
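If you want to confirm the failure is address-related rather than a port conflict, a quick bind test along these lines (a sketch, reusing the hostname and port from your log) reproduces what the agent attempts:
$ python -c "import socket; socket.socket().bind(('NexusHadoopVM', 9000))"
If the name resolves to an address that is not configured on the machine, this fails with [Errno 99] Cannot assign requested address; a genuine port conflict would instead raise [Errno 98] Address already in use.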
Regards,
--
Created 12-19-2014 12:16 PM
For various reasons I'm too embarrassed to talk about, we've run into this a few times with dev clusters in our private cloud and DNS getting mangled. We've found that if the Python one-liner that smark provided works, the DNS is set up correctly and the agent will start.
Thanks smark for sharing.
python -c 'import socket; print socket.getfqdn(), socket.gethostbyname(socket.getfqdn())'
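For what it's worth, a small preflight sketch along these lines (our own wrapper, not anything shipped with Cloudera Manager) also verifies that the resolved address is actually bindable before the agent starts:

import socket, sys

fqdn = socket.getfqdn()
ip = socket.gethostbyname(fqdn)  # forward lookup: name -> IP
try:
    s = socket.socket()
    # Port 0 picks any free port; the bind fails with errno 99 if the IP is not local.
    s.bind((ip, 0))
    s.close()
    print("%s -> %s (bindable)" % (fqdn, ip))
except socket.error as e:
    sys.exit("%s -> %s is not a local address: %s" % (fqdn, ip, e))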
Created 03-06-2015 01:13 PM
How did you solve the DNS issue?
When I run the Python command the FQDN is correct, but I get a 198.x IP. I don't know where this is coming from.
I have a 3-node cluster with the hosts file defined correctly on all 3 nodes. The Python command returns an incorrect IP on all 3 nodes.
python -c 'import socket; print socket.getfqdn(), socket.gethostbyname(socket.getfqdn())'
node2.hadoopdomain 198.105.254.228
while the node is actually on IP 192.168.1.6.
Any ideas?
Thank You,
Pranay Vyas
Created 08-11-2016 08:05 AM
Just saved me an hour of debugging. Thanks!
Created 03-06-2015 03:56 PM
Sounds like your nsswitch is wrong. It should be "files dns", not "dns files".
I would definitely check it out and verify you don't have it set up wrong.
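For reference, the hosts line in /etc/nsswitch.conf should look something like:
hosts: files dns
With files first, the resolver consults /etc/hosts before DNS, so a correct hosts entry wins even when DNS is returning garbage.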
Created on 03-06-2015 06:24 PM - edited 03-06-2015 06:55 PM
Thanks for your response,
I changed the nsswitch hosts entry to files dns.
Now I am not able to ping any outside site.
google.com returns not found, but I get a response when I ping the IP directly.
Any idea what's causing it?
Regards,
Pranay Vyas
Created 03-06-2015 07:20 PM
Okay, I got past the DNS issue and changed the nsswitch on all nodes to files dns.
I uninstalled Cloudera Manager and started all over again.
It failed with the same error.
Installation failed. Failed to receive heartbeat from agent.
No socket could be created on ('base.hadoopdomain', 9000) -- [Errno 99] Cannot assign requested address
The Python socket command still gives an incorrect IP:
[root@base ~]# python -c 'import socket; print socket.getfqdn(), socket.gethostbyname(socket.getfqdn())'
base.hadoopdomain 198.105.254.228
It gives the same IP on all the nodes. The FQDN comes back correctly.
Regards,
Pranay Vyas
Created 03-06-2015 10:45 PM
Solved it.
The issue was with the /etc/hosts file.
For some reason Cloudera Manager was referring to localdomain, which was not part of my /etc/hosts file.
I had to add the localdomain entries to the /etc/hosts file to resolve this error; the full file now reads:
127.0.0.1 localhost.hadoopdomain localhost
::1 localhost.hadoopdomain localhost
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localdomain localhost
192.168.1.8 base.hadoopdomain.com base base.hadoopdomain
192.168.1.6 node1.hadoopdomain.com node1 node1.hadoopdomain
192.168.1.7 node2.hadoopdomain.com node2 node2.hadoopdomain
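With those entries in place, the resolution check should now return the local address, something like:
[root@base ~]# python -c 'import socket; print socket.getfqdn(), socket.gethostbyname(socket.getfqdn())'
base.hadoopdomain 192.168.1.8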
Regards,
Pranay Vyas
Created 03-09-2015 08:22 AM
Great job.
I try to keep the names as simple as possible so I can run thousands of scripts.
My hosts file is like:
127.0.0.1