
CDH installation failed on all machines except cloudera manager host


New Contributor

On all machines except the CM host the SCM agent is trying to heartbeat to the wrong IP address.

 

The logs show:

ERROR    Heartbeating to 172.21.8.84:7182 failed

 

The agent's config.ini shows the correct IP address for the SCM server:

# Hostname of Cloudera SCM Server

server_host=192.168.56.103

 

# Port that server is listening on

server_port=7182

 

The network is set up correctly.  I've run hostname -f on all the hosts and the /etc/hosts file is configured properly on all hosts.

 

I can't figure out why it's trying to heartbeat to the wrong IP address.  Any ideas? 
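To make the mismatch described above concrete, here is a minimal Python sketch that pulls the heartbeat target out of the agent log line and compares it with server_host and server_port from config.ini. The helper names are hypothetical, the parsing is a plain key=value reader rather than anything the agent itself uses, and the 7182 fallback is only an assumption taken from the config shown above.

```python
import re

def heartbeat_target(log_line):
    """Extract (host, port) from a 'Heartbeating to HOST:PORT failed' log line."""
    match = re.search(r"Heartbeating to (\S+):(\d+) failed", log_line)
    return (match.group(1), int(match.group(2))) if match else None

def configured_server(config_text):
    """Read server_host and server_port from config.ini-style key=value text."""
    settings = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments and section headers; keep key=value pairs.
        if line and not line.startswith(("#", "[")) and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    # Assumption: fall back to 7182 (the port shown above) if server_port is absent.
    return settings.get("server_host"), int(settings.get("server_port", 7182))

log_line = "ERROR    Heartbeating to 172.21.8.84:7182 failed"
config_text = "server_host=192.168.56.103\nserver_port=7182\n"

actual = heartbeat_target(log_line)
expected = configured_server(config_text)
if actual != expected:
    print("agent heartbeats to", actual, "but config.ini says", expected)
```

A mismatch like this only confirms the symptom. It does not say where the other address comes from.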


Re: CDH installation failed on all machines except cloudera manager host

New Contributor
I noticed that I get this message in the log file:
>>[28/Mar/2014 05:10:54 +0000] 1870 Monitor-HostMonitor throttling_logger WARNING (359 skipped) hostname elephant.dynsight.local differs from the canonical name elephant

Not sure if that provides a clue.

Re: CDH installation failed on all machines except cloudera manager host

Master Collaborator

Can you do an nslookup (or a ping) on both hostnames and see what IP is used (e.g., elephant.dynsight.local as well as elephant)?  Also, what do you have in the HOSTNAME property in /etc/sysconfig/network?

 

Things get a bit tricky in multi-homed environments and it seems that the installer somehow picked up the wrong interface.  Also, you might want to see if the Host Inspector will give you a report.  In CM, on the "Hosts" page, you can run the Host Inspector.  Not sure if it will work, though, if your agents are not connecting.

 

Finally, I would do a reverse lookup on the IP 192.168.56.103 to see if it resolves back to your expected hostname:

 

dig -x 192.168.56.103

 

 

Re: CDH installation failed on all machines except cloudera manager host

New Contributor

/etc/sysconfig/network looks like this on each host:

NETWORKING=yes

HOSTNAME=tiger.dynsight.local

 

When I run dig from tiger I get this response.  It looks like it's querying 192.168.1.1 instead of 192.168.56.1.  Any thoughts on that?

 

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.6 <<>> -x 192.168.56.103

;; global options: +cmd

;; Got answer:

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20919

;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

 

;; QUESTION SECTION:

;103.56.168.192.in-addr.arpa. IN PTR

 

;; AUTHORITY SECTION:

103.56.168.192.in-addr.arpa. 10800 IN SOA localhost. nobody.invalid. 1 600 1200 604800 10800

 

;; Query time: 19 msec

;; SERVER: 192.168.1.1#53(192.168.1.1)

;; WHEN: Fri Mar 28 16:19:08 2014

;; MSG SIZE  rcvd: 104

Re: CDH installation failed on all machines except cloudera manager host

Master Collaborator

That's probably because you have 192.168.1.1 defined as your "nameserver" in /etc/resolv.conf, or it's your default gateway and you have no DNS servers defined, so dig is trying to ask your router.  There is no ANSWER section in the dig output, which means dig was not able to resolve that IP address.  What do you have in /etc/resolv.conf and /etc/nsswitch.conf?
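If it helps, a quick way to see which servers a host is configured to query is to list the nameserver lines from /etc/resolv.conf. This is only a rough sketch: the function name is made up, and the sample content is hypothetical (the 192.168.1.1 address simply mirrors the SERVER line in the dig output above).

```python
def nameservers(resolv_conf_text):
    """Return the addresses from 'nameserver' lines, in file order."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        # Drop comments, which resolv.conf marks with '#' or ';'.
        line = raw.split("#", 1)[0].split(";", 1)[0]
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers

# Hypothetical file content, for illustration only.
sample = """\
# generated by the network scripts
search dynsight.local
nameserver 192.168.1.1
"""

print(nameservers(sample))
```

On a real host you would pass in open("/etc/resolv.conf").read() instead of the sample text. An empty list would mean no DNS servers are defined there.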

Re: CDH installation failed on all machines except cloudera manager host

Super Collaborator

The following is a one-liner that will check forward and reverse lookup on a node:

 

python -c "import socket; print(socket.getfqdn()); print(socket.gethostbyname(socket.getfqdn()))"
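A slightly longer version of the same idea, as a sketch, makes the reverse step explicit and reports whether the name you get back matches the one you started with. The function name and the injectable forward/reverse parameters are my own additions (they are there so the logic can be exercised without touching real DNS). The socket calls themselves are standard library.

```python
import socket

def check_name_resolution(fqdn=None,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve fqdn to an IP, resolve that IP back, and report whether they agree."""
    fqdn = fqdn or socket.getfqdn()
    ip = forward(fqdn)              # forward lookup: name -> IP
    reverse_name = reverse(ip)[0]   # reverse lookup: IP -> primary hostname
    return {
        "fqdn": fqdn,
        "ip": ip,
        "reverse": reverse_name,
        "consistent": reverse_name.lower() == fqdn.lower(),
    }
```

Calling check_name_resolution() with no arguments uses the local machine's own name. A failed lookup raises socket.gaierror or socket.herror, both of which are OSError subclasses, so wrap the call in try/except OSError if you run it across many hosts.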

 

Hosts file layout is important as well (regardless of DNS configuration).  Review the discussion here:

 

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager-Instal...

Re: CDH installation failed on all machines except cloudera manager host

New Contributor
I figured out the problem. My hosts file had two entries for the same IP address: one for the FQDN and one for just the canonical name. It's working fine now that I've fixed that.
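For anyone who lands here with the same symptom, a rough Python sketch for spotting this kind of hosts-file problem follows. It flags any IP address that appears on more than one line. The function name is made up, and the sample entries are illustrative only (the thread never shows the actual file, and the 192.168.56.104 address is hypothetical).

```python
from collections import defaultdict

def find_duplicate_ips(hosts_text):
    """Return {ip: [line numbers]} for IPs that appear on more than one line."""
    lines_by_ip = defaultdict(list)
    for lineno, raw in enumerate(hosts_text.splitlines(), start=1):
        entry = raw.split("#", 1)[0].split()   # drop comments, split on whitespace
        if len(entry) >= 2:                    # need an address plus at least one name
            lines_by_ip[entry[0]].append(lineno)
    return {ip: nums for ip, nums in lines_by_ip.items() if len(nums) > 1}

# Illustrative content only, not the poster's real file.
sample = """\
127.0.0.1       localhost
192.168.56.103  elephant.dynsight.local
192.168.56.103  elephant
192.168.56.104  tiger.dynsight.local tiger
"""

print(find_duplicate_ips(sample))   # {'192.168.56.103': [2, 3]}
```

To check a real host, pass in open("/etc/hosts").read(). The usual fix is to merge the duplicates into a single line with the IP first, then the FQDN, then the short name.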

Thanks for all your help.