11-06-2013 07:09 AM
I'm doing a basic cluster installation using Cloudera Standard 4 on CentOS 6.3 (64-bit) and it does not work. Here is what I'm doing:
1. I run ./cloudera-manager-installer.bin and go to http://220.127.116.11:7180/ to continue the installation.
2. Then I follow the wizard and add the host 18.104.22.168 to the cluster installation.
3. The cluster installation on this host is successful but I find it strange that the IP is changed to 127.0.0.1.
4. But anyway, I continue and install the parcels.
5. And then at the hosts inspection I get this:
|Inspector failed on the following hosts... |
|The inspector failed to run on all hosts.|
|0 hosts are running CDH3 and 1 hosts are running CDH4.|
|All checked hosts are running the same version of components.|
|All managed hosts have consistent versions of Java.|
|All checked Cloudera Management Daemons versions are consistent with the server.|
|All checked Cloudera Management Agents versions are consistent with the server.|
And from the log file cloudera-scm-agent.out:
[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent INFO SCM Agent Version: 4.7.3
[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent INFO Using directory: /var/run/cloudera-scm-agent
[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent INFO Using supervisor binary path: /usr/lib64/cmf/agent/src/cmf/../../build/env/bin/supervisord
[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent WARNING Agent is running on 127.0.0.1 (localhost). This is a misconfiguration for multi-machine clusters. Check your hostname settings.
[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent INFO Adding env vars that start with CMF_AGENT_
[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent INFO Logging to /var/log/cloudera-scm-agent/cloudera-scm-agent.log
Why is it using 127.0.0.1? I always use the FQDN, and name resolution is done by DNS. Just to be sure, here is the content of my hosts file:
127.0.0.1 localhost.localdomain localhost
::1 homer.larim.polymtl.ca homer localhost6.localdomain6 localhost6
22.214.171.124 homer.larim.polymtl.ca homer
Anyway, I'm a bit baffled by the problem since I'm doing a vanilla installation with all the default values. Can anybody help me?
Thanks a lot...
11-06-2013 08:15 AM - edited 11-07-2013 06:10 AM
OK, I found the solution. I changed the hosts file to the following (removing the FQDN from the localhost entries):
126.96.36.199 homer.larim.polymtl.ca homer
Edit: Removed an entry to avoid any confusion.
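A minimal sketch of how the problem entry could be spotted automatically, assuming the symptom comes from the machine's own name being listed on a loopback line. The function name `loopback_aliases` is hypothetical, and `192.0.2.10` is a documentation placeholder standing in for the host's real address:

```python
import ipaddress


def loopback_aliases(hosts_text):
    """Return (address, name) pairs where a non-localhost name sits on a loopback line."""
    flagged = []
    for raw in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not a plain IP address in the first column, skip the line
        if not addr.is_loopback:
            continue
        for name in fields[1:]:
            # localhost, localhost.localdomain, localhost6, ... are expected here.
            if not name.startswith("localhost"):
                flagged.append((str(addr), name))
    return flagged


# Hosts file as posted at the top of the thread (real IP replaced by a placeholder).
original = """127.0.0.1 localhost.localdomain localhost
::1 homer.larim.polymtl.ca homer localhost6.localdomain6 localhost6
192.0.2.10 homer.larim.polymtl.ca homer
"""

for addr, name in loopback_aliases(original):
    print(f"{name} is mapped to loopback address {addr}")
```

With the original file, this flags `homer.larim.polymtl.ca` and `homer` on the `::1` line, which is the entry that was removed in the fix.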
11-06-2013 01:38 PM
@foxz88 this is odd as the FQDN for the "localhost" loopback address should not affect anything with your actual host's FQDN or IP. Can you take a look at what is returned from the "hostname" command? I suspect your host is configured to call itself "localhost.localdomain" and that's why your services were trying to bind to that hostname and also why removing that from /etc/hosts freed you up.
The file /etc/sysconfig/network should contain a "HOSTNAME=" tag, which in your case should be listed as "homer.larim.polymtl.ca" in order for your services to work properly. Note: it will require a reboot after you change that file in order for the new hostname to take effect.
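As a rough way to check the same thing from a script, the sketch below prints the name the host reports for itself and whether that name resolves to a loopback address. It uses only the Python standard library; whether the Cloudera agent determines its address the same way is an assumption, and `resolves_to_loopback` and `report` are hypothetical helper names:

```python
import ipaddress
import socket


def resolves_to_loopback(hostname):
    """True if any address the name resolves to is a loopback address."""
    infos = socket.getaddrinfo(hostname, None)
    for info in infos:
        # info[4][0] is the address string; strip any IPv6 zone suffix like "%eth0".
        address = info[4][0].split("%", 1)[0]
        if ipaddress.ip_address(address).is_loopback:
            return True
    return False


def report():
    """Print the local hostname, its FQDN, and whether it resolves to loopback."""
    fqdn = socket.getfqdn()
    print("hostname:", socket.gethostname())
    print("fqdn:", fqdn)
    try:
        print("resolves to loopback:", resolves_to_loopback(fqdn))
    except socket.gaierror as exc:
        print("could not resolve", fqdn, "->", exc)


if __name__ == "__main__":
    report()
```

If the last line reports a loopback address on a cluster node, the hostname or hosts file is worth a second look before reinstalling.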
11-07-2013 06:08 AM
[root@homer /]# hostname
[root@homer /]# cat /etc/sysconfig/network
That's why I was confused by the problem: everything was resolving properly. Anyway, since I was using CentOS, I installed Ubuntu in a virtual machine to see if I had the same problem. The installation completed successfully, so I compared the Ubuntu /etc/hosts file with the CentOS one and noticed the difference in the localhost entries.
I made the modification on all my CentOS hosts and everything installed correctly.
08-30-2015 10:53 PM
Clint and Fox88:
I am so glad to see this thread. I have the same problem, but it is still not resolved. Mine is on AWS/EC2 Ubuntu 12.04 with Cloudera Manager 5.02.
It is not clear to me how the IPs should be laid out in the hosts file. Below is all the DNS and IP info for my EC2 instance; please tell me what my hosts file should look like.
public ip 188.8.131.52
public DNS ec2-54-186-89-67.us-west-2.compute.amazonaws.com
private ip 172.31.9.3
private DNS ip-172-31-9-3.us-west-2.compute.internal
I believe the public IP should be the first column, then the public DNS name, and then what?
The Cloudera doc has the following example to describe /etc/hosts, but it never worked for me.
192.168.1.1 cluster-01.example.com cluster-01
I have struggled with this hosts file and have seen too many different ways to set it up; however, none of them has worked for me yet :(
Please help. Thank you so much.
08-31-2015 05:00 PM
Can anyone help with this issue? I am stuck on it: no matter how I modify the /etc/hosts file and then reboot my instances, I always get the same error.
I tried the following; it doesn't work.
184.108.40.206 ec2-52-88-118-48.us-west-2.compute.amazonaws.com ip-172-31-0-146.us-west-2.compute.internal
220.127.116.11 ec2-52-26-227-159.us-west-2.compute.amazonaws.com ip-172-31-0-147.us-west-2.compute.internal
18.104.22.168 ec2-52-88-118-48.us-west-2.compute.amazonaws.com ip-172-31-0-146
22.214.171.124 ec2-52-26-227-159.us-west-2.compute.amazonaws.com ip-172-31-0-147
or
127.0.0.1 localhost.localdomainlocalhost
126.96.36.199 ec2-52-88-118-48.us-west-2.compute.amazonaws.com ec2-52-88-118-48
188.8.131.52 ec2-52-88-118-48-159.us-west-2.compute.amazonaws.com ec2-52-88-118-48
None of them worked. What is wrong with these hosts files?
Can anyone help?
thank you so much.
08-31-2015 11:35 PM
1. I am still installing Cloudera Manager 5 on EC2 Ubuntu 12.04 LTS, but my /etc/sysconfig/network file doesn't have any tag like 'HOSTNAME='.
Do I have to modify this file to fix the 'Failed to receive heartbeat from agent' error?
2. I noticed there is no way to set selinux=disabled, as I could not find the /etc/sysctl.conf file on my Ubuntu. I wonder whether it matters, because my security group is set to 'anywhere' for all access. Please correct me if I am wrong.
3. There is no way to check iptables start/stop/status on these EC2 Ubuntu instances. Is this normal for Ubuntu?
The troubleshooting doc from Cloudera doesn't have much info. I am very disappointed, as I could not find any answer to my questions so far.
I could not find a simple sample of an /etc/hosts file either. Please help.
09-01-2015 09:22 AM
Can you clarify what problem you are seeing? If CM cannot communicate with the agent, verify that you can use curl or a similar tool to connect to port 9000 on the remote node, using the hostname for that host shown in Cloudera Manager. Cloudera Manager must be able to do a heartbeat request to that agent's host on port 9000 in order for that host to be considered healthy.
If the agent cannot heartbeat to Cloudera Manager, then verify that the "server_host" setting in /etc/cloudera-scm-agent/config.ini shows Cloudera Manager's hostname and that you can connect from the agent's host to Cloudera Manager's host on port 7182 (this is the port that the Agent uses to send heartbeats to CM).
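The two port checks described above can also be scripted. This is only a sketch: it tests whether a TCP connection can be opened, not whether the heartbeat itself succeeds, and `can_connect` is a hypothetical helper name. The hostnames in the commented usage are placeholders to replace with the names shown in Cloudera Manager and in the agent's `server_host` setting:

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage (placeholder hostnames, uncomment and adjust):
# Run on the Cloudera Manager host, toward the agent:
#   print(can_connect("agent-host.example.com", 9000))
# Run on the agent host, toward Cloudera Manager:
#   print(can_connect("cm-host.example.com", 7182))
```

A `False` result in either direction points at a firewall, security group, or name-resolution problem rather than at the agent configuration itself.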
In general, the hosts file should appear with:
IP FQDN hostname
If you make any changes in the agent configuration or the hosts file, use this to restart:
service cloudera-scm-agent restart
If the above checks out and the hostnames being reported by your agents are accessible to Cloudera Manager, then the hostnames should be less relevant at this stage (until you get to Kerberos, SSL, etc.).
If you are having a problem, please make sure to include exactly what you are seeing and a snippet from the log file if possible.
Two very common factors that can block communication between CM and Agent are firewall (iptables) and selinux. As far as I know selinux is not installed by default on Ubuntu.
09-01-2015 03:57 PM
Thank you so much Ben for your help.
1. I reinstalled 2 EC2 Ubuntu instances with the m3.large type.
. iptables disabled by sudo ufw disable
. passwordless login works between the instances
. modified /etc/sysctl.conf to disable IPv6, as well as in /etc/hosts
. downloaded Cloudera Manager 5.2. My cluster is 2 nodes on EC2 Ubuntu 12.04 LTS
. my hosts file looks like the following:
184.108.40.206 ec2-52-88-149-30.us-west-2.compute.amazonaws.com 172.31.33.55 ip-172-31-33-55
220.127.116.11 ec2-52-88-152-243.us-west-2.compute.amazonaws.com 172.31.33.56 ip-172-31-33-56
2. Installed CM and never made it this far. It always failed to receive a heartbeat from the agent.
I have checked the cloudera-scm-server and cloudera-scm-agent log files as you suggested. I copied and pasted the error parts of these files below.
-----in the cloudera-scm-agent.log we got a timeout
[01/Sep/2015 22:36:41 +0000] 2532 HTTPServer Thread-2 _cplogging INFO [01/Sep/2015:22:36:41] ENGINE Stopped thread '_TimeoutMonitor'.
[01/Sep/2015 22:36:41 +0000] 2532 HTTPServer Thread-2 _cplogging INFO [01/Sep/2015:22:36:41] ENGINE Bus STOPPED
[01/Sep/2015 22:36:41 +0000] 2532 HTTPServer Thread-2 _cplogging INFO [01/Sep/2015:22:36:41] ENGINE Bus EXITING
[01/Sep/2015 22:36:41 +0000] 2532 HTTPServer Thread-2 _cplogging INFO [01/Sep/2015:22:36:41] ENGINE Bus EXITED
------in the supervisord.log, this part repeats many times in the log file
2015-09-01 19:43:29,698 CRIT Supervisor running as root (no user in config file)
2015-09-01 19:43:29,734 INFO RPC interface 'supervisor' initialized
2015-09-01 19:43:29,735 INFO RPC interface 'supervisor' initialized
2015-09-01 19:43:29,738 INFO daemonizing the supervisord process
2015-09-01 19:43:29,739 INFO supervisord started with pid 7173
2015-09-01 19:43:30,746 INFO spawned: 'cmflistener' with pid 7174
2015-09-01 19:43:31,881 INFO success: cmflistener entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2015-09-01 19:49:01,744 INFO exited: cmflistener (terminated by SIGTERM; not expected)
2015-09-01 19:49:01,744 WARN received SIGTERM indicating exit request
-------in the cmf-listener.log file
[01/Sep/2015 22:19:11 +0000] 942 MainThread supervisor_listener INFO Starting event listener as pid 942
[01/Sep/2015 22:19:12 +0000] 942 MainThread supervisor_listener INFO Cannot open agent FIFO (agent probably dead), dropping event
I know everyone is very busy these days, and I appreciate your precious time. Please take a look and give me some clue to finish this Cloudera Manager installation.
Thank you very, very much.
12-13-2015 06:26 AM
I resolved my issue by switching off iptables (or 'firewalld' on CentOS 7).
PS: I know it's way too late for any reply on this thread, but I thought it might help someone else, because I don't see any relevant solution on this forum or on Google.