Cluster installation - The inspector failed to run on all hosts.

New Contributor

Hello,

 

I'm doing a basic cluster installation using Cloudera Standard 4 on CentOS 6.3 (64-bit) and it does not work. Here is what I'm doing:

 

1. I run ./cloudera-manager-installer.bin and go to http://132.207.67.11:7180/ to continue the installation.

2. Then I follow the wizard and add the host 132.207.67.11 to the cluster installation.

3. The cluster installation on this host is successful but I find it strange that the IP is changed to 127.0.0.1.

4. But anyway, I continue and install the parcels.

5. And then at the hosts inspection I get this:

 

Cluster Installation

Inspect hosts for correctness   Run Again
Validations
  Inspector failed on the following hosts... 
  • homer.larim.polymtl.ca: IOException thrown while collecting data from host: Connection refused
Inspector ran on 0 hosts.
  The inspector failed to run on all hosts.
  0 hosts are running CDH3 and 1 hosts are running CDH4.
  All checked hosts are running the same version of components.
  All managed hosts have consistent versions of Java.
  All checked Cloudera Management Daemons versions are consistent with the server.
  All checked Cloudera Management Agents versions are consistent with the server.

 

And from the log file cloudera-scm-agent.out:

 

[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent        INFO     SCM Agent Version: 4.7.3

[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent        INFO     Using directory: /var/run/cloudera-scm-agent

[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent        INFO     Using supervisor binary path: /usr/lib64/cmf/agent/src/cmf/../../build/env/bin/supervisord

[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent        WARNING  Agent is running on 127.0.0.1 (localhost). This is a misconfiguration for multi-machine clusters. Check your hostname settings.

[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent        INFO     Adding env vars that start with CMF_AGENT_

[06/Nov/2013 09:45:45 +0000] 21416 MainThread agent        INFO     Logging to /var/log/cloudera-scm-agent/cloudera-scm-agent.log

 

Why is it using 127.0.0.1? I always use the FQDN, and name resolution is done by DNS. Just to be sure, here is the content of my hosts file:

 

127.0.0.1       localhost.localdomain   localhost

::1     homer.larim.polymtl.ca  homer   localhost6.localdomain6 localhost6

132.207.67.11   homer.larim.polymtl.ca  homer

 

Anyway, I'm a bit baffled by the problem since I'm doing a vanilla installation with all the default values. Can anybody help me?

 

Thanks a lot...

 

1 ACCEPTED SOLUTION

New Contributor

OK, I found the solution. I changed the hosts file to the following (removed the FQDN from the localhost entries):

 

127.0.0.1       localhost

::1             localhost6

132.207.67.11   homer.larim.polymtl.ca  homer

 

Edit: Removed an entry to avoid any confusion.
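
For anyone who wants to check their own file for the same pattern, here is a minimal Python 3 sketch. It only scans hosts-file text for loopback lines that also carry the machine's FQDN, which is what the change above removed. The function name `loopback_lines_naming` is made up for this example, and nothing here claims to be how the Cloudera agent picks its address.

```python
import ipaddress


def loopback_lines_naming(hosts_text, hostname):
    """Return loopback addresses whose hosts-file line also lists `hostname`."""
    offenders = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first field is not a plain IP address, skip the line
        if is_loopback and hostname in names:
            offenders.append(address)
    return offenders


# The two hosts files quoted in this thread.
BEFORE = """\
127.0.0.1       localhost.localdomain   localhost
::1     homer.larim.polymtl.ca  homer   localhost6.localdomain6 localhost6
132.207.67.11   homer.larim.polymtl.ca  homer
"""
AFTER = """\
127.0.0.1       localhost
::1             localhost6
132.207.67.11   homer.larim.polymtl.ca  homer
"""

print(loopback_lines_naming(BEFORE, "homer.larim.polymtl.ca"))  # ['::1']
print(loopback_lines_naming(AFTER, "homer.larim.polymtl.ca"))   # []
```

To check a live machine, pass the contents of /etc/hosts and the output of `hostname -f` instead of the sample strings.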


12 REPLIES

Explorer

Hi, 

 

I was interested in this thread since I encountered similar issues in a simple cluster config.

I'm using CM5 on CentOS 7.2

The host inspector gave :

  • master1.domain; worker[1-3].domain: IOException thrown while collecting data from host: Connection refused

The point is that each host has a public IP address on eth0 and a private IP address on eth1.

As you can guess, I want my cluster to use the internal IP only.

I tried several things, restarting cloudera-scm-agent after each step to make sure the change was picked up:

 

1 - I edited /etc/hosts to map the public FQDN to the public IP ==> FAIL

2 - I set listening_ip in /etc/cloudera-scm-agent/config.ini so the agent listens ONLY on the private IP ==> FAIL

3 - I set listening_hostname in /etc/cloudera-scm-agent/config.ini so the agent listens ONLY on the hostname associated with the private interface ==> FAIL

At this stage, I can say the Cloudera agent is listening only on the private interface (confirmed with lsof), but the inspector does not seem to take this into account.

4 - I shut down eth0 (the public interface) so the host no longer has multiple hostnames ==> SUCCESS

 

At this stage, I wondered why step 3 failed and step 4 succeeded. I think it is because the Python snippet below is used to detect the hostname, rather than the Cloudera config file:

 

python -c 'import socket; \
              print socket.getfqdn(), \
                    socket.gethostbyname(socket.getfqdn())'

 

This script seems to return the FQDN for eth0 first, so no luck for me.

I'm not sure this is the real solution, but the trick worked for me. It would make sense for Cloudera staff to review the inspector code and make sure the Python code honors the config file.
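
The one-liner above uses the Python 2 print statement. If only Python 3 is at hand, the same lookup can be sketched roughly as below. It makes the same two standard-library calls and additionally labels the result as loopback or not. The function name `describe_resolution` and its `resolver` parameter are invented for this example (the parameter just lets you try it without touching real DNS), and this is not the inspector's actual code.

```python
import ipaddress
import socket


def describe_resolution(fqdn=None, resolver=socket.gethostbyname):
    """Resolve the host's FQDN and say whether it lands on a loopback address."""
    name = fqdn or socket.getfqdn()
    try:
        addr = resolver(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return name, None, "unresolvable"
    kind = "loopback" if ipaddress.ip_address(addr).is_loopback else "non-loopback"
    return name, addr, kind


if __name__ == "__main__":
    # Output depends on the machine's hostname, /etc/hosts and DNS setup.
    print(describe_resolution())
```

On a host with both a public and a private interface, comparing the printed address against the interface you expect the cluster to use shows quickly which name wins.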

 

 

 

 

Contributor

This /etc/hosts layout fixed my problem:

 

#internal.ip local.hostname
192.168.1.2 testserver
192.168.1.3 testnode

 

#public.ip public.hostname

#https://www.whatismyip.com/reverse-dns-lookup/
x.x.x.x testserver.wherever.com

 

hope this helps someone else.

Explorer

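Try opening the port in firewalld on the host; this solved my problem (CentOS 7):

$ firewall-cmd --zone=public --add-port=9000/tcp

Add --permanent and then run firewall-cmd --reload if the rule should survive a restart.

Or stop and disable the firewall altogether:

$ systemctl stop firewalld
$ systemctl disable firewalld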

Hope this may help
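
To see whether the port is reachable before and after changing the firewall, a small Python 3 probe like the one below can be run from the Cloudera Manager server against each host. This is only a sketch: port 9000 is taken from the firewall-cmd line above, and the function name `probe_port` is invented for this example.

```python
import socket


def probe_port(host, port, timeout=3.0):
    """Try a TCP connection and report roughly what happened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"   # the host answered, but nothing accepts on that port
    except socket.timeout:
        return "timeout"   # no answer at all, often a packet filter dropping traffic
    except OSError as exc:
        return "error: %s" % exc


if __name__ == "__main__":
    # Replace 127.0.0.1 with the agent host you want to test.
    print(probe_port("127.0.0.1", 9000))
```

A "refused" result matches the inspector's "Connection refused" message and points at the agent's listening address or at nothing listening on that port, while "timeout" or another error is more typical of a firewall in the way.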