Cluster installation - The inspector failed to run on all hosts.

New Contributor

Hello,

 

I'm doing a basic cluster installation using Cloudera Standard 4 on CentOS 6.3 (64-bit), and it does not work.  Here is what I'm doing:

 

1. I run ./cloudera-manager-installer.bin and go to http://132.207.67.11:7180/ to continue the installation.

2. Then I follow the wizard and add the host 132.207.67.11 to the cluster installation.

3. The cluster installation on this host is successful but I find it strange that the IP is changed to 127.0.0.1.

4. But anyway, I continue and install the parcels.

5. And then at the hosts inspection I get this:

 

Cluster Installation

Inspect hosts for correctness   Run Again
Validations
  Inspector failed on the following hosts... 
  • homer.larim.polymtl.ca: IOException thrown while collecting data from host: Connection refused
Inspector ran on 0 hosts.
  The inspector failed to run on all hosts.
  0 hosts are running CDH3 and 1 hosts are running CDH4.
  All checked hosts are running the same version of components.
  All managed hosts have consistent versions of Java.
  All checked Cloudera Management Daemons versions are consistent with the server.
  All checked Cloudera Management Agents versions are consistent with the server.

 

And from the log file cloudera-scm-agent.out:

 

[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent        INFO     SCM Agent Version: 4.7.3

[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent        INFO     Using directory: /var/run/cloudera-scm-agent

[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent        INFO     Using supervisor binary path: /usr/lib64/cmf/agent/src/cmf/../../build/env/bin/supervisord

[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent        WARNING  Agent is running on 127.0.0.1 (localhost). This is a misconfiguration for multi-machine clusters. Check your hostname settings.

[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent        INFO     Adding env vars that start with CMF_AGENT_

[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent        INFO     Logging to /var/log/cloudera-scm-agent/cloudera-scm-agent.log

 

Why is it using 127.0.0.1?  I always use the FQDN, and name resolution is done by DNS.  Just to be sure, here is the content of my hosts file:

 

127.0.0.1       localhost.localdomain   localhost

::1     homer.larim.polymtl.ca  homer   localhost6.localdomain6 localhost6

132.207.67.11   homer.larim.polymtl.ca  homer

 

Anyway, I'm a bit baffled by the problem since I'm doing a vanilla installation with all the default values.  Can anybody help me?

 

Thanks a lot...

 

1 ACCEPTED SOLUTION

New Contributor

Ok, I found the solution.  I modified the hosts file as follows (removed the FQDN from the localhost entries):

 

127.0.0.1       localhost

::1             localhost6

132.207.67.11   homer.larim.polymtl.ca  homer

 

Edit: Removed an entry to avoid any confusion.
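
For anyone comparing their own file: here is a minimal Python sketch that reports whether a host's FQDN is listed on a loopback line of a hosts file. It is only an illustration. The function names are made up for this example, and the assumption that a name on a loopback line is what leads the agent to report 127.0.0.1 comes from this thread, not from Cloudera documentation. The two sample files are the ones posted in the question and in this solution.

```python
import ipaddress


def names_on_loopback(hosts_text: str) -> dict:
    """Map each loopback address in a hosts file to the names listed on its line."""
    result = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        address, *names = line.split()
        try:
            if ipaddress.ip_address(address).is_loopback:
                result.setdefault(address, []).extend(names)
        except ValueError:
            continue  # first column is not an IP address; not handled here
    return result


def fqdn_on_loopback(hosts_text: str, fqdn: str) -> list:
    """Return the loopback addresses whose line also lists `fqdn`."""
    return [addr for addr, names in names_on_loopback(hosts_text).items() if fqdn in names]


# Hosts file from the question.
ORIGINAL = """\
127.0.0.1       localhost.localdomain   localhost
::1     homer.larim.polymtl.ca  homer   localhost6.localdomain6 localhost6
132.207.67.11   homer.larim.polymtl.ca  homer
"""

# Hosts file from the accepted solution.
FIXED = """\
127.0.0.1       localhost
::1             localhost6
132.207.67.11   homer.larim.polymtl.ca  homer
"""

if __name__ == "__main__":
    print(fqdn_on_loopback(ORIGINAL, "homer.larim.polymtl.ca"))  # ['::1']
    print(fqdn_on_loopback(FIXED, "homer.larim.polymtl.ca"))     # []
```

An empty list means the FQDN only appears next to the machine's real address.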


12 REPLIES

Guru

@foxz88 this is odd as the FQDN for the "localhost" loopback address should not affect anything with your actual host's FQDN or IP.  Can you take a look at what is returned from the "hostname" command?  I suspect your host is configured to call itself "localhost.localdomain" and that's why your services were trying to bind to that hostname and also why removing that from /etc/hosts freed you up.

 

The file /etc/sysconfig/network should contain a "HOSTNAME=" tag, which in your case should be listed as "homer.larim.polymtl.ca" in order for your services to work properly.  Note: it will require a reboot after you change that file in order for the new hostname to take effect.
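
If it helps, here is a small Python sketch that prints the same information in one go: the hostname the machine reports, the FQDN, and the addresses that hostname resolves to, flagging any loopback ones. It uses only the standard `socket` and `ipaddress` modules. Whether the Cloudera agent performs exactly these lookups is an assumption; the helper names are invented for this example.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """True for 127.0.0.0/8 and ::1; an IPv6 zone suffix such as %eth0 is ignored."""
    return ipaddress.ip_address(address.split("%", 1)[0]).is_loopback


def local_identity() -> dict:
    """Collect the names this host reports and the addresses its hostname resolves to."""
    hostname = socket.gethostname()
    try:
        addresses = sorted({info[4][0] for info in socket.getaddrinfo(hostname, None)})
    except socket.gaierror:
        addresses = []  # the hostname does not resolve at all
    return {
        "hostname": hostname,
        "fqdn": socket.getfqdn(),
        "addresses": addresses,
        "loopback": [a for a in addresses if is_loopback(a)],
    }


if __name__ == "__main__":
    for key, value in local_identity().items():
        print(f"{key}: {value}")
```

A non-empty "loopback" entry means the machine's own name resolves to 127.x.x.x or ::1, which matches the warning in the agent log above.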

New Contributor

Hi Clint,

 

[root@homer /]# hostname

homer.larim.polymtl.ca

 

[root@homer /]# cat /etc/sysconfig/network

HOSTNAME=homer.larim.polymtl.ca

NETWORKING=yes

 

That's why I was confused by the problem: everything was resolving properly.  Anyway, since I was using CentOS, I installed Ubuntu in a virtual machine to see if I had the same problem.  The installation completed successfully, so I compared the Ubuntu /etc/hosts file with the CentOS one and noticed the difference in the localhost entries.

I made the modification on all my CentOS hosts and everything installed correctly...

Expert Contributor

Clint and Fox88:

 

I am so glad to see this thread. I had the same problem, but it is still not resolved. Mine is on AWS EC2, Ubuntu 12.04, with Cloudera Manager 5.02.

It is not clear to me how the IPs should be laid out in the hosts file.  Here is all the DNS and IP info for my EC2 instance; please tell me what my hosts file should look like.

public ip 54.186.89.67

public DNS ec2-54-186-89-67.us-west-2.compute.amazonaws.com

private ip 172.31.9.3

private DNS ip-172-31-9-3.us-west-2.compute.internal

 

I believe the public IP should be the first column, then the public DNS name, and then what?

The Cloudera doc has an example describing /etc/hosts as follows, but it never worked for me.

127.0.0.1 localhost.localdomain localhost

192.168.1.1   cluster-01.example.com   cluster-01

.....

 

I have struggled with this hosts file and have seen too many different ways to set it up; however, none of them has worked for me yet :(

Please help. Thank you so much.

 

Robin

  

Expert Contributor

Can anyone help with this issue? I am stuck on it: no matter how I modify the /etc/hosts file and then reboot my instances, I always get the same error.

I tried the following; it doesn't work.

 

127.0.0.1 localhost.localdomainlocalhost
52.88.118.48 ec2-52-88-118-48.us-west-2.compute.amazonaws.com ip-172-31-0-146.us-west-2.compute.internal
52.26.227.159 ec2-52-26-227-159.us-west-2.compute.amazonaws.com ip-172-31-0-147.us-west-2.compute.internal

 

or 

 

127.0.0.1 localhost.localdomainlocalhost
52.88.118.48 ec2-52-88-118-48.us-west-2.compute.amazonaws.com ip-172-31-0-146
52.26.227.159 ec2-52-26-227-159.us-west-2.compute.amazonaws.com ip-172-31-0-147

 

or

127.0.0.1 localhost.localdomainlocalhost
52.88.118.48 ec2-52-88-118-48.us-west-2.compute.amazonaws.com ec2-52-88-118-48 
52.26.227.159 ec2-52-88-118-48-159.us-west-2.compute.amazonaws.com ec2-52-88-118-48

 

None of them worked. What is wrong with these hosts files?

Can anyone help?

Thank you so much.

 

Robin

Expert Contributor

Hi Clint,

1. I am still installing Cloudera Manager 5 on EC2 Ubuntu 12.04 LTS, but my /etc/sysconfig/network file does not have any tag like 'HOSTNAME='.

Do I have to modify this file to fix the 'Failed to receive heartbeat from agent' error?

2. I noticed there is no way to set selinux=disabled, as I could not find the /etc/sysctl.conf file on my Ubuntu. I wonder whether this matters, because my security group is set to 'anywhere' for all access. Please correct me if I am wrong.

3. There is no way to check iptables start/stop/status on these EC2 Ubuntu instances. Is this normal for Ubuntu?

The troubleshooting doc from Cloudera doesn't have much info. I am very disappointed, as I have not found an answer to my questions so far.

I could not find a simple sample /etc/hosts file either. Please help.

Thanks,

 

Robin

 

 

 

 

Master Guru

Can you clarify what problem you are seeing?  If CM cannot communicate with the agent, verify that you can use curl or a similar tool, with the hostname shown for that host in Cloudera Manager, to connect to port 9000 on the remote node.  Cloudera Manager must be able to make a heartbeat request to that agent's host on port 9000 in order for that host to be considered healthy.

 

If the agent cannot heartbeat to Cloudera Manager, then verify that the "server_host" setting in /etc/cloudera-scm-agent/config.ini shows Cloudera Manager's hostname and that you can connect from the agent's host to Cloudera Manager's host on port 7182 (this is the port that the Agent uses to send heartbeats to CM).
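
As an alternative to curl, the two connectivity checks above can be sketched in Python. This only tests that a TCP connection opens; it does not speak the heartbeat protocol. The hostnames below are hypothetical placeholders, so substitute the names shown in Cloudera Manager and in config.ini. Ports 9000 and 7182 are the ones mentioned above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and resolution failures
        return False


if __name__ == "__main__":
    # Hypothetical placeholder names, not taken from any real cluster.
    AGENT_HOST = "agent-host.example.com"
    CM_HOST = "cm-host.example.com"

    # Run this one on the Cloudera Manager host.
    print("CM -> agent (9000):", can_connect(AGENT_HOST, 9000))
    # Run this one on the agent's host.
    print("agent -> CM (7182):", can_connect(CM_HOST, 7182))
```

A False result in either direction usually points at name resolution or a firewall rather than at Cloudera Manager itself.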

 

In general, the hosts file should appear with:

   IP   FQDN  hostname
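
To make that layout concrete, here is a rough Python sketch that checks a single hosts-file line against it. The rules are my reading of "IP FQDN hostname" and are assumptions, not an official validator: the first column must be an IP address, the remaining columns must be names (never another IP), and for non-loopback addresses the first name should be fully qualified. The function names are made up for this example.

```python
import ipaddress


def _is_ip(token: str) -> bool:
    """True if `token` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


def check_hosts_line(line: str) -> list:
    """Return a list of problems for one hosts-file line (empty list: looks fine)."""
    fields = line.split("#", 1)[0].split()
    if not fields:
        return []  # blank or comment-only line
    address, names = fields[0], fields[1:]
    problems = []
    if not _is_ip(address):
        problems.append(f"first column is not an IP address: {address}")
    if not names:
        problems.append("no hostname after the address")
    problems += [f"IP address in a name column: {n}" for n in names if _is_ip(n)]
    # Loopback lines such as "127.0.0.1 localhost" are exempt from the FQDN rule.
    loopback = _is_ip(address) and ipaddress.ip_address(address).is_loopback
    if names and not loopback and not _is_ip(names[0]) and "." not in names[0]:
        problems.append(f"first name is not fully qualified: {names[0]}")
    return problems


if __name__ == "__main__":
    print(check_hosts_line("192.168.1.1   cluster-01.example.com   cluster-01"))
    # []
    print(check_hosts_line("192.168.1.1 cluster-01.example.com 10.0.0.5 cluster-01"))
    # ['IP address in a name column: 10.0.0.5']
```

Each address gets its own line, so a second IP on the same line is reported as a problem.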

 

If you make any changes in the agent configuration or the hosts file, use this to restart:

 

   service cloudera-scm-agent restart

 

If the above checks out and the hostnames being reported by your agents are accessible to Cloudera Manager, then the hostnames should be less relevant at this stage (until you get to Kerberos, SSL, etc.).

If you are having a problem, please make sure to include exactly what you are seeing and a snippet from the log file if possible:

 

CM: /var/log/cloudera-scm-server/cloudera-scm-server.log

Agent: /var/log/cloudera-scm-agent/cloudera-scm-agent.log

 

Two very common factors that can block communication between CM and the Agent are the firewall (iptables) and SELinux.  As far as I know, SELinux is not installed by default on Ubuntu.

 

Ben

 

Expert Contributor

Thank you so much, Ben, for your help.

1. I reinstalled 2 EC2 Ubuntu instances of type m3.large.

    . iptables disabled with sudo ufw disable

    . passwordless login works between the instances

    . modified /etc/sysctl.conf to disable IPv6, and did the same in /etc/hosts

    . downloaded Cloudera Manager 5.2; my cluster is 2 nodes on EC2 Ubuntu 12.04 LTS

    . my hosts file looks like the following:

    127.0.0.1 localhost
    52.88.149.30 ec2-52-88-149-30.us-west-2.compute.amazonaws.com 172.31.33.55 ip-172-31-33-55
    52.88.152.243 ec2-52-88-152-243.us-west-2.compute.amazonaws.com 172.31.33.56 ip-172-31-33-56

 

2. Installing CM has never succeeded so far. It always fails to receive a heartbeat from the agent.

I have checked the cloudera-scm-server and cloudera-scm-agent log files as you suggested. I copied and pasted the error parts of these files below.

-----in cloudera-scm-agent.log we got a timeout
[01/Sep/2015 22:36:41 +0000] 2532 HTTPServer Thread-2 _cplogging INFO [01/Sep/2015:22:36:41] ENGINE Stopped thread '_TimeoutMonitor'.
[01/Sep/2015 22:36:41 +0000] 2532 HTTPServer Thread-2 _cplogging INFO [01/Sep/2015:22:36:41] ENGINE Bus STOPPED
[01/Sep/2015 22:36:41 +0000] 2532 HTTPServer Thread-2 _cplogging INFO [01/Sep/2015:22:36:41] ENGINE Bus EXITING
[01/Sep/2015 22:36:41 +0000] 2532 HTTPServer Thread-2 _cplogging INFO [01/Sep/2015:22:36:41] ENGINE Bus EXITED

 

------in supervisord.log, this part repeats many times
2015-09-01 19:43:29,698 CRIT Supervisor running as root (no user in config file)
2015-09-01 19:43:29,734 INFO RPC interface 'supervisor' initialized
2015-09-01 19:43:29,735 INFO RPC interface 'supervisor' initialized
2015-09-01 19:43:29,738 INFO daemonizing the supervisord process
2015-09-01 19:43:29,739 INFO supervisord started with pid 7173
2015-09-01 19:43:30,746 INFO spawned: 'cmflistener' with pid 7174
2015-09-01 19:43:31,881 INFO success: cmflistener entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2015-09-01 19:49:01,744 INFO exited: cmflistener (terminated by SIGTERM; not expected)
2015-09-01 19:49:01,744 WARN received SIGTERM indicating exit request


-------in the cmf-listener.log file
[01/Sep/2015 22:19:11 +0000] 942 MainThread supervisor_listener INFO Starting event listener as pid 942
[01/Sep/2015 22:19:12 +0000] 942 MainThread supervisor_listener INFO Cannot open agent FIFO (agent probably dead), dropping event

 

I know everyone is very busy these days, and I appreciate your precious time. Please take a look and give me some clue so I can finish this Cloudera Manager installation.

Thank you very, very much.

 

Robin

New Contributor

I resolved my issue by switching off iptables (or 'firewalld' if it's CentOS 7).

PS: I know it's way too late for a reply on this thread, but I thought it might help someone else, because I don't see any relevant solution on this forum or on Google.