1. Yes, it is working; otherwise the Cloudera automated installation would not proceed to the installation step, since you have to enter the credentials first. Is there any other way to double-check?
2. Ubuntu trusty/14.04
I killed any processes and verified that nothing is running on ports 9000 and 9001, and firewalls are disabled. I can telnet to the Cloudera server on port 7182 from all hosts.
After the installation failure, when I run netstat -tulnp|grep -w 9000, I can see that a python service is running (managed by Cloudera), so that port must be accessible to Cloudera. But nothing shows up for port 9001. Is it fair to assume that 9001 is not being opened by Cloudera?
The agent listens on port 9000
It will start up the supervisor on port 19001 and attempt to connect to it.
If nothing is listening on port 19001, then that indicates the supervisor was never started.
Check the agent log and see if there are any heartbeat errors.
The agent heartbeats to Cloudera Manager on port 7182.
If there are no heartbeat errors in the log, that indicates heartbeats are likely not failing "normally".
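As a rough way to check the ports mentioned above in one go, here is a minimal sketch in Python 3. The function names and the host name cm.example.com are made up for illustration, and it only tests whether a TCP connection can be opened; it says nothing about whether the heartbeat protocol itself works.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def check_ports(targets, timeout=2.0):
    """Check a dict of {label: (host, port)} and return {label: bool}."""
    return {label: is_listening(host, port, timeout)
            for label, (host, port) in targets.items()}


# Example targets (hypothetical CM host name; ports are the ones from this thread):
# targets = {
#     "agent": ("127.0.0.1", 9000),
#     "supervisor": ("127.0.0.1", 19001),
#     "CM heartbeat": ("cm.example.com", 7182),
# }
# for label, ok in check_ports(targets).items():
#     print(label, "open" if ok else "NOT reachable")
```

Run it on the host being added. The supervisor only listens on 127.0.0.1, so that check has to be done locally.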
I would do the following:
At the point where you get the error message in the wizard, just go back to the Cloudera Manager home page and then click on the Hosts tab to view all your hosts.
See if they have received heartbeats within the last 15 seconds.
If not, then restart the agent on the hosts that are not heartbeating (service cloudera-scm-agent restart)
If CM does not receive a heartbeat within 10 seconds or so, do the following to generate a stack trace in the Cloudera Agent log:
# kill -SIGQUIT `cat /var/run/cloudera-scm-agent/cloudera-scm-agent.pid`
Check the threads in the stack trace and search them for the string "heartbeat". If you find the heartbeat thread, post it here.
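If the log is large, something like the following Python 3 sketch can pull out the lines that mention "heartbeat" along with a little surrounding context. The function name is made up, and I am not assuming anything about the exact layout of the stack dump; it is just a case-insensitive text search.

```python
def find_heartbeat_lines(log_text, needle="heartbeat", context=2):
    """Return (line_number, surrounding_lines) for each line containing needle.

    Line numbers are 1-based. The match is case-insensitive, and `context`
    lines before and after each hit are included.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if needle.lower() in line.lower():
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append((i + 1, lines[start:end]))
    return hits


# Usage (log path taken from this thread):
# with open("/var/log/cloudera-scm-agent/cloudera-scm-agent.log") as f:
#     for lineno, block in find_heartbeat_lines(f.read(), context=5):
#         print("--- match at line", lineno)
#         print("\n".join(block))
```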
That should be a start.
The supervisor was started:
/var/log/cloudera-scm-agent# netstat -tulnp|grep -w 19001
tcp 0 0 127.0.0.1:19001 0.0.0.0:* LISTEN 18435/python
As for the rest, you described what to do if there are no heartbeat errors, but that is not the case here, as I get:
"Installation failed. Failed to receive heartbeat from agent."
I am facing a similar issue. I have tried many of the approaches mentioned in the other posts, but the issue is still not resolved.
Have you found a solution for this issue?
I am afraid I cannot help you here... It was quite a mess for me as well. But, in the interest of time, I moved to HDP and set up the cluster using Apache Ambari. It is completely open source, and issues are clearer when they pop up, unlike with Cloudera Manager.
If you are having difficulty completing an installation or host addition, we will need log information along with other system and environmental information to assist you. The information we request is critical to problem isolation and issue resolution, even if you have already performed the steps or tests we ask you to complete. Since we do not have hands or eyes on your system, it is hard for us to arbitrarily eliminate possible root causes without this information. The Hadoop framework, and by extension its management software, is complex, as both interface with many different parts of a typical environment.
Be sure to provide the following information:
(If you believe something may compromise your security be sure to obscure it or do not post it here.)
1.) Log data from /var/log/cloudera-scm-agent/cloudera-scm-agent.log on the affected host being added.
2.) Log data from /var/log/cloudera-scm-server/cloudera-scm-server.log on the system which hosts Cloudera Manager.
3.) The output of netstat -tunap | egrep '7180|7182|7183' as root on the Cloudera Manager host.
4.) The output of netstat -tunap |egrep '9000|19001' as root on the host being added.
5.) The output of the following command on both the CM host and the host being added.
paste <(hostname; hostname -i;echo ) <(python -c "import socket; print '\t' + socket.getfqdn(); print '\t\t\t' + socket.gethostbyname(socket.getfqdn())")
6.) The output of the following command on the host being added.
egrep 'server_host|server_port|listening_hostname' /etc/cloudera-scm-agent/config.ini
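For item 6, a small Python 3 sketch like the one below reads the same three settings with the standard configparser module. The function name is made up, and I am assuming config.ini is a regular INI file with section headers such as [General]; the sketch scans every section rather than relying on that name. Unlike the egrep, it ignores commented-out lines, so a key that is only present as a comment will simply be missing from the result.

```python
import configparser

# The settings the egrep in item 6 looks for.
KEYS = ("server_host", "server_port", "listening_hostname")


def read_agent_settings(ini_text):
    """Return the active (uncommented) values of KEYS found in any section."""
    # interpolation=None so that a literal '%' in a value is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    found = {}
    for section in parser.sections():
        for key in KEYS:
            if parser.has_option(section, key):
                found[key] = parser.get(section, key)
    return found


# Usage (path taken from this thread):
# with open("/etc/cloudera-scm-agent/config.ini") as f:
#     print(read_agent_settings(f.read()))
```

The server_host value should resolve to the Cloudera Manager host from the machine being added, and server_port should match the port the agent heartbeats to (7182 in this thread).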