Hello,
I'm doing a basic cluster installation using Cloudera Standard 4 on CentOS 6.3 64-bit and it does not work. Here is what I'm doing:
1. I run ./cloudera-manager-installer.bin and go to http://132.207.67.11:7180/ to continue the installation.
2. Then I follow the wizard and add the host 132.207.67.11 to the cluster installation.
3. The cluster installation on this host is successful but I find it strange that the IP is changed to 127.0.0.1.
4. But anyway, I continue and install the parcels.
5. And then at the hosts inspection I get this:
Cluster Installation
Inspector failed on the following hosts...
The inspector failed to run on all hosts.
0 hosts are running CDH3 and 1 hosts are running CDH4.
All checked hosts are running the same version of components.
All managed hosts have consistent versions of Java.
All checked Cloudera Management Daemons versions are consistent with the server.
All checked Cloudera Management Agents versions are consistent with the server.
And from the log file cloudera-scm-agent.out:
[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent INFO SCM Agent Version: 4.7.3
[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent INFO Using directory: /var/run/cloudera-scm-agent
[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent INFO Using supervisor binary path: /usr/lib64/cmf/agent/src/cmf/../../build/env/bin/supervisord
[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent WARNING Agent is running on 127.0.0.1 (localhost). This is a misconfiguration for multi-machine clusters. Check your hostname settings.
[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent INFO Adding env vars that start with CMF_AGENT_
[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent INFO Logging to /var/log/cloudera-scm-agent/cloudera-scm-agent.log
Why is it using 127.0.0.1? I'm always using the FQDN, and name resolution is done by DNS. Just to be sure, here is the content of my hosts file:
127.0.0.1 localhost.localdomain localhost
::1 homer.larim.polymtl.ca homer localhost6.localdomain6 localhost6
132.207.67.11 homer.larim.polymtl.ca homer
Anyway, I'm a bit baffled by the problem since I'm doing a vanilla installation with all the default values. Can anybody help me?
Thanks a lot...
Created on 11-06-2013 08:15 AM - edited 11-07-2013 06:10 AM
OK, I found the solution. I modified the hosts file as follows (removed the FQDN for localhost):
127.0.0.1 localhost
::1 localhost6
132.207.67.11 homer.larim.polymtl.ca homer
Edit: Removed an entry to avoid any confusion.
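Comparing the two hosts files, the original `::1` line also carried the names homer.larim.polymtl.ca and homer, so the machine's own FQDN mapped to a loopback address. The thread does not confirm that this was the root cause, but it is easy to check for. The following is a minimal sketch, assuming a hosts file in the usual "address name name..." layout; the function name `loopback_name_conflicts` is made up for illustration.

```python
import ipaddress


def loopback_name_conflicts(hosts_text):
    """Return (address, name) pairs where a loopback address carries a non-localhost name."""
    conflicts = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first field is not a valid IP address; skip the line
        if not is_loopback:
            continue
        for name in names:
            # Assumption: any name not starting with "localhost" is a real host name.
            if not name.startswith("localhost"):
                conflicts.append((address, name))
    return conflicts


# The two hosts files quoted in this thread.
original = """\
127.0.0.1 localhost.localdomain localhost
::1 homer.larim.polymtl.ca homer localhost6.localdomain6 localhost6
132.207.67.11 homer.larim.polymtl.ca homer
"""
fixed = """\
127.0.0.1 localhost
::1 localhost6
132.207.67.11 homer.larim.polymtl.ca homer
"""

print(loopback_name_conflicts(original))
print(loopback_name_conflicts(fixed))
```

On the original file the sketch reports the two names attached to `::1`; on the modified file it reports nothing.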
Created 11-06-2013 01:38 PM
@foxz88 this is odd as the FQDN for the "localhost" loopback address should not affect anything with your actual host's FQDN or IP. Can you take a look at what is returned from the "hostname" command? I suspect your host is configured to call itself "localhost.localdomain" and that's why your services were trying to bind to that hostname and also why removing that from /etc/hosts freed you up.
The file /etc/sysconfig/network should contain a "HOSTNAME=" tag, which in your case should be listed as "homer.larim.polymtl.ca" in order for your services to work properly. Note: it will require a reboot after you change that file in order for the new hostname to take effect.
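Besides the `hostname` command, it can help to see what the host's own name resolves to. The sketch below uses only the Python standard library and assumes nothing about how the Cloudera agent itself picks its address (the thread does not say); `resolves_to_loopback` is a hypothetical helper name.

```python
import ipaddress
import socket


def resolves_to_loopback(name):
    """Return True if any address the name resolves to is a loopback address."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return False  # the name does not resolve at all
    for info in infos:
        address = info[4][0].split("%", 1)[0]  # strip an IPv6 scope id if present
        if ipaddress.ip_address(address).is_loopback:
            return True
    return False


if __name__ == "__main__":
    name = socket.gethostname()
    print("hostname:", name)
    print("fqdn:", socket.getfqdn())
    print("resolves to loopback:", resolves_to_loopback(name))
```

If the last line prints True on a cluster node, the host's name is mapped to 127.0.0.1 or ::1 somewhere in the resolution chain, typically in /etc/hosts.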
Created 11-07-2013 06:08 AM
Hi Clint,
[root@homer /]# hostname
homer.larim.polymtl.ca
[root@homer /]# cat /etc/sysconfig/network
HOSTNAME=homer.larim.polymtl.ca
NETWORKING=yes
That's why I was confused by the problem, because everything was resolving properly. Anyway, since I was using CentOS, I installed Ubuntu in a virtual machine to see if I had the same problem. The installation completed successfully, so I compared the Ubuntu /etc/hosts file with the CentOS one and noticed the difference in the localhost entries.
I made the modification on all my CentOS hosts and everything installed correctly...
Created 08-30-2015 10:53 PM
Clint and Fox88:
I am so glad to see this thread. I had the same problem but it is still not resolved. Mine is on AWS/EC2 Ubuntu 12.04 and Cloudera Manager 5.02.
It is not clear to me how the IPs should be laid out in the hosts file. Here is all the DNS and IP info for my EC2 instance; please tell me what my hosts file should look like.
public ip 54.186.89.67
public DNS ec2-54-186-89-67.us-west-2.compute.amazonaws.com
private ip 172.31.9.3
private DNS ip-172-31-9-3.us-west-2.compute.internal
I believe the public IP should be the first column, then the public DNS, then what?
The Cloudera doc has an example describing /etc/hosts as follows, but it never worked for me.
127.0.0.1 localhost.localdomain localhost
192.168.1.1 cluster-01.example.com cluster-01
.....
I have struggled with this hosts file and seen too many different ways to set it up, but none of them has worked for me yet :(
Please help. thank you so much.
Robin
Created 08-31-2015 05:00 PM
Can anyone help with this issue? I am stuck on it: no matter how I modify the /etc/hosts file and then reboot my instances, I always get the same error.
I tried the following, and it doesn't work.
127.0.0.1 localhost.localdomainlocalhost
52.88.118.48 ec2-52-88-118-48.us-west-2.compute.amazonaws.com ip-172-31-0-146.us-west-2.compute.internal
52.26.227.159 ec2-52-26-227-159.us-west-2.compute.amazonaws.com ip-172-31-0-147.us-west-2.compute.internal
or
127.0.0.1 localhost.localdomainlocalhost
52.88.118.48 ec2-52-88-118-48.us-west-2.compute.amazonaws.com ip-172-31-0-146
52.26.227.159 ec2-52-26-227-159.us-west-2.compute.amazonaws.com ip-172-31-0-147
or
127.0.0.1 localhost.localdomainlocalhost
52.88.118.48 ec2-52-88-118-48.us-west-2.compute.amazonaws.com ec2-52-88-118-48
52.26.227.159 ec2-52-88-118-48-159.us-west-2.compute.amazonaws.com ec2-52-88-118-48
None of them worked. What is wrong with these hosts files?
Can anyone help?
thank you so much.
Robin
Created 08-31-2015 11:35 PM
Hi Clint,
1. I am still installing Cloudera Manager 5 on EC2 Ubuntu 12.04 LTS, but my /etc/sysconfig/network file just doesn't have any tag like 'HOSTNAME='.
Do I have to modify this file to fix the 'Failed to receive heartbeat from agent' error?
2. I noticed there is no way to set selinux=disabled, as I could not find the /etc/sysctl.conf file on my Ubuntu. I wonder whether this matters, because my security group is set to 'anywhere' for all access. Please correct me if I am wrong.
3. There is no way to check iptables start/stop/status on these EC2 Ubuntu instances. Is this normal for Ubuntu?
The troubleshooting doc from Cloudera doesn't have much info. I am very disappointed, as I could not find any answers to my questions so far.
I could not find a simple sample of an /etc/hosts file either. Please help.
thanks,
Robin
Created 09-01-2015 09:22 AM
Can you clarify what problem you are seeing? If CM cannot communicate with the agent, verify that you can use curl or a similar tool, with the hostname for that host shown in Cloudera Manager, to connect to port 9000 on the remote node. Cloudera Manager must be able to do a heartbeat request to that agent's host on port 9000 in order for that host to be considered healthy.
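Where curl is not at hand, a plain TCP connection attempt gives the same basic answer. This is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper name, and a successful connection only shows that the port is reachable, not that the agent behind it is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and names that do not resolve.
        return False
```

For example, `can_connect("homer.larim.polymtl.ca", 9000)` run from the Cloudera Manager host would test the agent port on the host from the first post. A False result points at name resolution, a firewall, or the agent not listening.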
If the agent cannot heartbeat to Cloudera Manager, then verify that the "server_host" setting in /etc/cloudera-scm-agent/config.ini shows Cloudera Manager's hostname and that you can connect from the agent's host to Cloudera Manager's host on port 7182 (this is the port that the Agent uses to send heartbeats to CM).
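The "server_host" value can also be read programmatically. The sketch below assumes that config.ini is a standard INI file and that "server_host" lives in a section named `[General]`; that section name is an assumption, so check it against your own file. The sample text and the host name cm.example.com are made up for illustration.

```python
import configparser


def read_server_host(config_text):
    """Return the server_host value from agent config text, or None if absent."""
    # interpolation=None so that a literal '%' in the file is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # Assumption: the setting lives in a [General] section.
    return parser.get("General", "server_host", fallback=None)


# Hypothetical file content, not copied from a real installation.
sample = """\
[General]
# Hostname of the Cloudera Manager server
server_host=cm.example.com
server_port=7182
"""

print(read_server_host(sample))
```

On a real agent host the text would come from `open("/etc/cloudera-scm-agent/config.ini").read()`. A value of localhost on a node that is not the Cloudera Manager host would explain missing heartbeats.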
In general, the hosts file should appear with:
IP FQDN hostname
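As a small illustration of that layout, the sketch below builds an entry from an IP and an FQDN and checks whether an existing line follows the pattern. Both function names are hypothetical, and the check is deliberately loose: it only requires an IP first, a dotted name second, and no further IP addresses among the names.

```python
import ipaddress


def is_ip(text):
    """Return True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def hosts_entry(ip, fqdn):
    """Build an 'IP FQDN hostname' line; the short name is the first FQDN label."""
    short_name = fqdn.split(".", 1)[0]
    return f"{ip} {fqdn} {short_name}"


def follows_layout(line):
    """True if the line is 'IP FQDN hostname...' with no stray IP among the names."""
    fields = line.split()
    if len(fields) < 3 or not is_ip(fields[0]):
        return False
    names = fields[1:]
    return "." in names[0] and not any(is_ip(name) for name in names)


print(hosts_entry("132.207.67.11", "homer.larim.polymtl.ca"))
# Hypothetical malformed line: a second IP address sits where a name belongs.
print(follows_layout("192.168.1.1 cluster-01.example.com 10.0.0.1 cluster-01"))
```

The first call reproduces the working entry from earlier in the thread; the second returns False because everything after the first column is treated as a name, so an IP address there is almost certainly a mistake.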
If you make any changes in the agent configuration or the hosts file, use this to restart:
service cloudera-scm-agent restart
If the above check out and the hostnames being reported by your agents are accessible to Cloudera Manager, then the hostnames should be less relevant at this stage (until you get to Kerberos, SSL, etc.).
If you are having a problem, please make sure to include exactly what you are seeing and a snippet from the log file if possible:
CM: /var/log/cloudera-scm-server/cloudera-scm-server.log
Agent: /var/log/cloudera-scm-agent/cloudera-scm-agent.log
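To pull a useful snippet out of a long agent log, one option is to keep only the lines above INFO level. This is a rough sketch based on the line format visible in this thread ("[timestamp] pid thread module LEVEL message"); the set of level names is an assumption taken from Python's standard logging levels, and `problem_lines` is a hypothetical helper name.

```python
import re

# "[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent WARNING message..."
LINE = re.compile(r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<pid>\d+)\s+(?P<rest>.*)$")
LEVEL = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")


def problem_lines(log_text, levels=("WARNING", "ERROR", "CRITICAL")):
    """Return (timestamp, level, message) for log lines at the given levels."""
    found = []
    for line in log_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # continuation lines such as tracebacks are skipped
        rest = match.group("rest")
        level = LEVEL.search(rest)  # first level keyword after the pid
        if level and level.group(1) in levels:
            message = rest[level.end():].strip()
            found.append((match.group("timestamp"), level.group(1), message))
    return found


# Two lines copied from the agent log quoted in the first post.
sample = """\
[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent INFO SCM Agent Version: 4.7.3
[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent WARNING Agent is running on 127.0.0.1 (localhost). This is a misconfiguration for multi-machine clusters. Check your hostname settings.
"""

for timestamp, level, message in problem_lines(sample):
    print(timestamp, level, message)
```

Because the level is found by keyword rather than by column, thread names containing spaces (for example "HTTPServer Thread-2") do not break the parse.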
Two very common factors that can block communication between CM and Agent are firewall (iptables) and selinux. As far as I know selinux is not installed by default on Ubuntu.
Ben
Created 09-01-2015 03:57 PM
Thank you so much Ben for your help.
1. I reinstalled 2 EC2 Ubuntu instances of type m3.large.
. iptables disabled with sudo ufw disable
. passwordless login works between the instances
. modified /etc/sysctl.conf to disable IPv6, as well as in /etc/hosts
. downloaded Cloudera Manager 5.2. My cluster is 2 nodes on EC2 Ubuntu 12.04 LTS
. my hosts file looks like the following:
127.0.0.1 localhost
52.88.149.30 ec2-52-88-149-30.us-west-2.compute.amazonaws.com 172.31.33.55 ip-172-31-33-55
52.88.152.243 ec2-52-88-152-243.us-west-2.compute.amazonaws.com 172.31.33.56 ip-172-31-33-56
2. I install CM but never get any further; it always fails to receive a heartbeat from the agent.
I have checked the cloudera-scm-server and cloudera-scm-agent log files as you suggested. I copied and pasted the error parts of these files below.
-----in cloudera-scm-agent.log we got a timeout
[01/Sep/2015 22:36:41 +0000] 2532 HTTPServer Thread-2 _cplogging INFO [01/Sep/2015:22:36:41] ENGINE Stopped thread '_TimeoutMonitor'.
[01/Sep/2015 22:36:41 +0000] 2532 HTTPServer Thread-2 _cplogging INFO [01/Sep/2015:22:36:41] ENGINE Bus STOPPED
[01/Sep/2015 22:36:41 +0000] 2532 HTTPServer Thread-2 _cplogging INFO [01/Sep/2015:22:36:41] ENGINE Bus EXITING
[01/Sep/2015 22:36:41 +0000] 2532 HTTPServer Thread-2 _cplogging INFO [01/Sep/2015:22:36:41] ENGINE Bus EXITED
------in supervisord.log, this part repeats many times in the log file
2015-09-01 19:43:29,698 CRIT Supervisor running as root (no user in config file)
2015-09-01 19:43:29,734 INFO RPC interface 'supervisor' initialized
2015-09-01 19:43:29,735 INFO RPC interface 'supervisor' initialized
2015-09-01 19:43:29,738 INFO daemonizing the supervisord process
2015-09-01 19:43:29,739 INFO supervisord started with pid 7173
2015-09-01 19:43:30,746 INFO spawned: 'cmflistener' with pid 7174
2015-09-01 19:43:31,881 INFO success: cmflistener entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2015-09-01 19:49:01,744 INFO exited: cmflistener (terminated by SIGTERM; not expected)
2015-09-01 19:49:01,744 WARN received SIGTERM indicating exit request
-------in the cmf-listener.log file
[01/Sep/2015 22:19:11 +0000] 942 MainThread supervisor_listener INFO Starting event listener as pid 942
[01/Sep/2015 22:19:12 +0000] 942 MainThread supervisor_listener INFO Cannot open agent FIFO (agent probably dead), dropping event
I know everyone is very busy these days, and I appreciate your precious time. Please take a look and give me some clue so I can finish this Cloudera Manager installation.
Thank you very, very much.
Robin
Created 12-13-2015 06:26 AM
I resolved my issue by switching off iptables (or 'firewalld' if it's CentOS 7).
PS: I know it's way too late for a reply on this thread, but I thought it might help someone else, because I don't see any relevant solution on this forum or on Google.