
HDP 2.6.2 - ambari infra

Explorer

Yesterday I installed HDP 2.6.2 using the Ambari method on a 3-node Ubuntu 16.10 x86_64 environment (or rather, tried to). I'm using JDK 8. I installed all components, pretty much with defaults.

All the required ports should be open between nodes (I'm using Cloud Foundry).
The install failed towards the end, while starting some services. Checking Ambari, things seem to be deployed, though I see that:
* ZooKeeper is starting OK
* Ambari Infra is failing to start:

java.util.concurrent.TimeoutException: Could not connect to ZooKeeper hdp262-1.novalocal:2181,hdp262-2.novalocal:2181,hdp262-3.novalocal:2181 within 15000 ms
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper hdp262-1.novalocal:2181,hdp262-2.novalocal:2181,hdp262-3.novalocal:2181 within 15000 ms
Return code: 1. Sleeping for 5 sec(s)
2017-10-05 07:27:28,881 - Execute['ambari-sudo.sh JAVA_HOME=/usr/jdk64/jdk1.8.0_112 /usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string hdp262-1.novalocal:2181,hdp262-2.novalocal:2181,hdp262-3.novalocal:2181 --znode /infra-solr --create-znode --retry 30 --interval 5'] {}

I AM able to connect via TCP to those ports on the nodes, for example:

cloudusr@hdp262-1:/var/log/ambari-agent$ telnet hdp262-1.novalocal 2181
Trying 127.0.1.1...
Connected to hdp262-1.novalocal.
Escape character is '^]'.
^]quit
telnet> quit
Connection closed.

Meanwhile, in the full Ambari log I see:

Waiting for client to connect to ZooKeeper
Opening socket connection to server hdp262-1.novalocal/9.20.65.115:2181. Will not attempt to authenticate using SASL (unknown error)
Socket connection established to hdp262-1.novalocal/9.20.65.115:2181, initiating session
Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect

I'm not familiar enough with the stack (HDP/Ambari/ZooKeeper)... any tips?
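The log lines above show the TCP connection being established and then closed by the server before a session is created, so a successful telnet does not prove that ZooKeeper is actually serving requests. A protocol-level check is ZooKeeper's `srvr` four-letter command: a serving node reports `Mode: leader`, `follower` or `standalone`, while a node that has not joined a quorum typically replies that it is not currently serving requests. A minimal Python 3 sketch, assuming four-letter commands are enabled (the default on ZooKeeper 3.4.x; 3.5.x and later require them to be listed in `4lw.commands.whitelist`); `zk_four_letter` and `zk_mode` are hypothetical names:

```python
import socket
from typing import Optional


def zk_four_letter(host: str, command: str = "srvr", port: int = 2181,
                   timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter command and return the full text reply.

    The server writes its answer and then closes the connection, so the
    reply is read until EOF.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # EOF: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def zk_mode(srvr_reply: str) -> Optional[str]:
    """Return the 'Mode:' value (leader, follower, standalone) from a
    `srvr` reply, or None if the server did not report a serving mode."""
    for line in srvr_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

For example, `zk_mode(zk_four_letter("hdp262-2.novalocal"))` should return one of the three modes on every node of a healthy ensemble; `None` on a node suggests it has not formed a quorum with its peers, and an exception means the host could not be resolved or reached.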

Super Mentor

@Nigel Jones

I suspect that the problem is here:

cloudusr@hdp262-1:/var/log/ambari-agent$ telnet hdp262-1.novalocal 2181
     Trying 127.0.1.1...
     Connected to hdp262-1.novalocal.
     Escape character is '^]'.
     ^]quit

When you run telnet, the address "hdp262-1.novalocal" is being translated to "127.0.1.1", a loopback address. Are you sure that the hostname-to-IP-address mapping is correct at your end?

Please check the following outputs:

# cat /etc/hosts
# hostname -f 
# python -c 'import socket;print socket.getfqdn()'
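The three checks above can be combined into one script that also shows which address the FQDN resolves to locally and flags a loopback result. Note that `127.0.1.1`, like `127.0.0.1`, is inside the `127.0.0.0/8` loopback range, so other nodes cannot reach a service through it. A minimal Python 3 sketch; `is_loopback` and `describe_local_host` are hypothetical names:

```python
import ipaddress
import socket


def is_loopback(addr: str) -> bool:
    """True for any address in 127.0.0.0/8 (including 127.0.1.1) or ::1."""
    return ipaddress.ip_address(addr).is_loopback


def describe_local_host() -> dict:
    """Collect roughly what `hostname -f` and socket.getfqdn() report, plus
    the IPv4 address the FQDN resolves to on this machine."""
    fqdn = socket.getfqdn()
    addr = socket.gethostbyname(fqdn)  # goes through the system resolver (/etc/hosts, DNS)
    return {
        "hostname": socket.gethostname(),
        "fqdn": fqdn,
        "resolves_to": addr,
        "loopback": is_loopback(addr),
    }
```

Running `print(describe_local_host())` on each node should show `'loopback': False` and the node's cluster-facing address.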

To verify whether ZooKeeper is listening only on the 127.0.0.1 address or is bound to all addresses, please run the following command on the host where ZooKeeper is running:

# netstat -tnlpa | grep 2181

Also check whether you are getting the correct IP address mapping for the hostname "hdp262-1.novalocal".
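When reading the `netstat` output, the key field is the local-address column of the `LISTEN` line: `0.0.0.0` or `::` means ZooKeeper accepts connections on all interfaces, while a `127.x.x.x` address means only local clients can connect. A minimal Python 3 sketch that classifies one line of that output, assuming the standard Linux `netstat -tnlpa` column order; `listen_scope` is a hypothetical name:

```python
import ipaddress
from typing import Optional


def listen_scope(line: str, port: int = 2181) -> Optional[str]:
    """Classify one `netstat -tnlpa` line that is LISTENing on `port`.

    Returns 'all' for a wildcard bind (0.0.0.0 or ::), 'loopback' for
    127.x.x.x or ::1, 'specific' for any other address, or None if the
    line is not a LISTEN socket on that port.
    """
    fields = line.split()
    # Columns: proto, recv-q, send-q, local address, foreign address, state, ...
    if len(fields) < 6 or fields[5] != "LISTEN":
        return None
    host, _, local_port = fields[3].rpartition(":")
    if local_port != str(port):
        return None
    addr = ipaddress.ip_address(host)
    if addr.is_unspecified:
        return "all"
    if addr.is_loopback:
        return "loopback"
    return "specific"
```

For example, `listen_scope("tcp6 0 0 :::2181 :::* LISTEN -")` returns `'all'`, while `TIME_WAIT` lines are ignored.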

Explorer

🙂 In this particular case, hdp262-1.novalocal is indeed the system where the script is running, so it happens to be localhost too. I see the same errors when it tries to reach the other systems.

My /etc/hosts contains:

127.0.1.1 hdp262-1.novalocal hdp262-1
127.0.0.1 localhost
9.20.65.115 hdp262-1.novalocal hdp262-1
9.20.65.135 hdp262-2.novalocal hdp262-2
9.20.65.175 hdp262-3.novalocal hdp262-3

This is common to each machine (well, the last 3 lines are). This is a developer CF environment and those hostnames don't resolve over DNS; not ideal, but I figured this was a quick workaround (I'd need to work a little more to set something up for a local domain).
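With this file, `hdp262-1.novalocal` is mapped to two addresses, and the resolver typically returns the first matching line (here `127.0.1.1`), which is consistent with telnet reporting `Trying 127.0.1.1...`. A minimal Python 3 sketch that flags such names in any hosts-format text; `names_with_multiple_addresses` is a hypothetical helper:

```python
from collections import defaultdict
from typing import Dict, List


def names_with_multiple_addresses(hosts_text: str) -> Dict[str, List[str]]:
    """Parse /etc/hosts-style text and return every hostname mapped to more
    than one address, with the addresses in file order."""
    addrs_by_name: Dict[str, List[str]] = defaultdict(list)
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            if addr not in addrs_by_name[name]:
                addrs_by_name[name].append(addr)
    return {name: addrs for name, addrs in addrs_by_name.items() if len(addrs) > 1}
```

On the entries above it returns `hdp262-1.novalocal` and `hdp262-1`, each mapped to `['127.0.1.1', '9.20.65.115']`; with the `127.0.1.1` line removed or commented out it returns an empty dict.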

cloudusr@hdp262-1:~$ hostname -f
hdp262-1.novalocal
cloudusr@hdp262-1:~$ python -c 'import socket;print socket.getfqdn()'
hdp262-1.novalocal

Mentor

@Nigel Jones

Can you remove or comment out the first entry of your /etc/hosts so that it looks like the following:

# 127.0.1.1 hdp262-1.novalocal hdp262-1
127.0.0.1 localhost
9.20.65.115 hdp262-1.novalocal hdp262-1
9.20.65.135 hdp262-2.novalocal hdp262-2
9.20.65.175 hdp262-3.novalocal hdp262-3

Then retry; it should be okay.

Explorer

cloudusr@hdp262-1:~$ netstat -tnlpa | grep 2181
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp6 0 0 :::2181 :::* LISTEN -
tcp6 0 0 127.0.1.1:2181 127.0.0.1:53344 TIME_WAIT

Super Mentor

@Nigel Jones

We should not edit the first two lines of the "/etc/hosts" file, so it should ideally look like the following:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
9.20.65.115 hdp262-1.novalocal hdp262-1
9.20.65.135 hdp262-2.novalocal hdp262-2
9.20.65.175 hdp262-3.novalocal hdp262-3

Please see the Note in the following doc: https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.2.0/bk_ambari-installation/content/edit_the_hos...

Which says:

Do not remove the following two lines from your hosts file. Removing or editing the following lines may cause various programs that require network functionality to fail.

127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6

Super Mentor

@Nigel Jones

Did it work?

Explorer

Trying the /etc/hosts change...

Explorer

Sadly not, so I need to dig deeper, but I ran out of time and have to prioritize something else for a few days. I'll return to the cluster later this week. In the interim I'm trying the 2.6.1 Docker image. Thanks for the tips. I'll update when I can clarify more (or retry the install from a clean base, with the appropriate hosts entries in place or replaced by DNS).

Explorer

Can you telnet to the other ZooKeeper nodes on port 2181? It looks like you only tested telnet against the local node, but you need to check the other hosts too. I faced exactly the same error message and noticed that I was missing a hosts-file entry for one of the ZooKeeper nodes.
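The connect string that Ambari Infra uses lists all three ZooKeeper servers, so each one must resolve and accept connections from every node. A minimal Python 3 sketch that runs both checks for every server in a connect string and reports which step fails; `parse_connect_string` and `check_server` are hypothetical names, and a successful TCP connect is only a first step that does not prove the server has joined a quorum:

```python
import socket
from typing import List, Tuple


def parse_connect_string(connect: str) -> List[Tuple[str, int]]:
    """Split a ZooKeeper connect string such as 'h1:2181,h2:2181/chroot'
    into (host, port) pairs; a missing port defaults to 2181."""
    servers = connect.split("/", 1)[0]  # drop an optional chroot suffix
    pairs = []
    for item in servers.split(","):
        host, sep, port = item.strip().rpartition(":")
        if not sep:  # no ':' present, so rpartition put the whole item in `port`
            host, port = port, "2181"
        pairs.append((host, int(port)))
    return pairs


def check_server(host: str, port: int, timeout: float = 3.0) -> str:
    """Resolve `host`, then try a TCP connect, reporting which step failed."""
    try:
        addr = socket.gethostbyname(host)
    except socket.gaierror as exc:
        return f"{host}: does not resolve ({exc}); check /etc/hosts or DNS"
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return f"{host} -> {addr}:{port} reachable"
    except OSError as exc:
        return f"{host} -> {addr}:{port} not reachable ({exc})"
```

Run on each node, looping `print(check_server(host, port))` over `parse_connect_string("hdp262-1.novalocal:2181,hdp262-2.novalocal:2181,hdp262-3.novalocal:2181")` should report all three servers as reachable at their cluster-facing addresses; a "does not resolve" line points at a missing hosts entry.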