Ambari automatic registration failed (Step 3: Confirm Hosts)

Explorer

Log attached: ambari-server.txt

I am trying to install a 3-node cluster using Ambari.

Machine 1: Master Node + Local repository + Ambari Server

Machines 2 and 3: Data Nodes

The local repository is on the same system as the master node and ambari-server.

I am using Ambari Server 2.4.2 to install HDP 2.5.

The repolist command lists:

1) HDP-2.5

2) HDP-UTILS-1.1.0.21

3) Updates-ambari-2.4.2.0

I am using passwordless SSH for the user hdpuser. This is the user I am logged in as on the system, and it is also a sudo user. I have confirmed that passwordless SSH works for this user.

After a lot of install failures, I am trying to install ONLY the master node to understand the problem.

I cannot get past the auto-registration step.

I am getting a lot of messages like the following. I am not sure whether they are "normal" or the cause of the problem:

24 May 2017 15:43:14,842 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named HDP

24 May 2017 15:17:17,424 INFO [ambari-client-thread-25] RepoUtil:156 - Found 0 service repos: []

Registration failure logs:

24 May 2017 15:22:33,895 INFO [ambari-client-thread-25] BootStrapImpl:108 - BootStrapping hosts machine1.domain.net:

24 May 2017 15:22:33,901 INFO [Thread-44] BSRunner:190 - Kicking off the scheduler for polling on logs in /var/run/ambari-server/bootstrap/1

24 May 2017 15:22:33,901 INFO [Thread-44] BSRunner:257 - Host= machine1.domain.net bs=/usr/lib/python2.6/site-packages/ambari_server/bootstrap.py requestDir=/var/run/ambari-server/bootstrap/1 user=hdpuser sshPort=22 keyfile=/var/run/ambari-server/bootstrap/1/sshKey passwordFile null server=machine1.domain.net version=2.4.2.0 serverPort=8080 userRunAs=root timeout=300

24 May 2017 15:22:33,903 INFO [pool-16-thread-1] BSHostStatusCollector:55 - Request directory /var/run/ambari-server/bootstrap/1

24 May 2017 15:22:33,903 INFO [pool-16-thread-1] BSHostStatusCollector:62 - HostList for polling on [machine1.domain.net]

24 May 2017 15:22:33,906 INFO [Thread-44] BSRunner:285 - Bootstrap output, log=/var/run/ambari-server/bootstrap/1/bootstrap.err /var/run/ambari-server/bootstrap/1/bootstrap.out at machine1.domain.net

24 May 2017 15:22:43,906 INFO [pool-16-thread-1] BSHostStatusCollector:55 - Request directory /var/run/ambari-server/bootstrap/1

24 May 2017 15:26:53,930 INFO [pool-16-thread-1] BSHostStatusCollector:62 - HostList for polling on [machine1.domain.net]

24 May 2017 15:27:34,092 WARN [Thread-44] BSRunner:292 - Bootstrap process timed out. It will be destroyed.

24 May 2017 15:27:34,093 INFO [Thread-44] BSRunner:309 - Script log Mesg

INFO:root:BootStrapping hosts ['machine1.domain.net'] using /usr/lib/python2.6/site-packages/ambari_server cluster primary OS: redhat7 with user 'hdpuser'with ssh Port '22' sshKey File /var/run/ambari-server/bootstrap/1/sshKey password File null using tmp dir /var/run/ambari-server/bootstrap/1 ambari: machine1.domain.net; server_port: 8080; ambari version: 2.4.2.0; user_run_as: root

INFO:root:Executing parallel bootstrap

Bootstrap process timed out. It was destroyed.
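The gap between the bootstrap kick-off and the "timed out" warning can be checked against the timeout=300 value in the BSRunner line. The following is a minimal Python sketch of that arithmetic; the helper names parse_log_time and elapsed_seconds are hypothetical, and the timestamp format is assumed from the log lines quoted above.

```python
from datetime import datetime

# Timestamp layout assumed from the quoted ambari-server log lines,
# e.g. "24 May 2017 15:22:33,901" (milliseconds after the comma).
LOG_TIME_FORMAT = "%d %b %Y %H:%M:%S,%f"


def parse_log_time(line):
    """Parse the leading timestamp (first four fields) of a log line."""
    stamp = " ".join(line.split()[:4])
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


def elapsed_seconds(start_line, end_line):
    """Return the number of seconds between two log lines."""
    delta = parse_log_time(end_line) - parse_log_time(start_line)
    return delta.total_seconds()


started = ("24 May 2017 15:22:33,901 INFO [Thread-44] BSRunner:190 - "
           "Kicking off the scheduler for polling on logs in "
           "/var/run/ambari-server/bootstrap/1")
timed_out = ("24 May 2017 15:27:34,092 WARN [Thread-44] BSRunner:292 - "
             "Bootstrap process timed out. It will be destroyed.")

print(elapsed_seconds(started, timed_out))
```

The result is roughly 300 seconds, which suggests the bootstrap script produced no result at all and simply ran into the configured timeout, rather than failing early with an error.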

1 ACCEPTED SOLUTION

Explorer

Finally solved! The problem was that although SSH was set up without a password, the sudo user ('hdpuser') still needed a password for sudo (that is, I was prompted for a password on every sudo command).

I modified the sudoers entry to make sudo passwordless for this user on all cluster machines. That did it!
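For anyone hitting the same timeout, here is a minimal Python sketch of how this could be checked and fixed. It is an illustration under assumptions, not the exact change made in this thread: the helper names are hypothetical, only the user name hdpuser comes from the post, and the rule shown grants passwordless sudo for all commands, which may be broader than your security policy allows. The check relies on sudo's -n (non-interactive) option, which makes sudo exit with an error instead of prompting.

```python
import subprocess


def sudoers_nopasswd_line(user):
    """Return a sudoers rule letting `user` run any command without a password."""
    return f"{user} ALL=(ALL) NOPASSWD: ALL"


def sudo_is_passwordless(run=subprocess.run):
    """Return True if the current user can use sudo without being prompted.

    `sudo -n true` never prompts: it exits non-zero when a password
    would be required. `run` is injectable so the check can be tested
    without invoking sudo.
    """
    try:
        result = run(["sudo", "-n", "true"],
                     stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        # sudo is not installed on this machine.
        return False
    return result.returncode == 0
```

Running sudo_is_passwordless() as hdpuser on each cluster machine should return True before retrying registration. The line produced by sudoers_nopasswd_line("hdpuser") would go into the sudoers configuration, edited with visudo so that a syntax error cannot lock you out.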


11 REPLIES

Explorer

Can you check whether you can SSH from the Ambari Server to the agents?

You might have DNS issues, or the server may not be able to SSH to them.

Explorer

I am able to SSH from the Ambari Server system (system 1) to the data node systems (systems 2 and 3).

I have added all three hosts to /etc/hosts on all three systems.

"hostname -f" gives me the FQDN on all three systems.
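The /etc/hosts check described here can be scripted. The sketch below is only an illustration: the helper names are hypothetical, machine1.domain.net is taken from the log above, and any other host names or IP addresses you pass in are your own.

```python
import socket


def missing_from_hosts(hosts_text, required_names):
    """Return the names in `required_names` with no entry in hosts-file text."""
    known = set()
    for line in hosts_text.splitlines():
        # Drop comments, then split "IP name [alias ...]" into fields.
        fields = line.split("#", 1)[0].split()
        known.update(fields[1:])  # first field is the IP address
    return [name for name in required_names if name not in known]


def resolves(name):
    """Return True if `name` resolves to an address on this machine."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
```

On each node, missing_from_hosts(open("/etc/hosts").read(), [...]) should return an empty list for the cluster's FQDNs, and resolves() should return True for each of them.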

I am wondering whether this could be a proxy server issue. I have to bypass the proxy in Firefox to access localhost.

I have set proxy=_none_ in the .repo files to skip the proxy for yum.

I am not sure whether I have to do the same for curl.
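On the curl question: curl does not read yum's .repo settings, but it does honor the no_proxy / NO_PROXY environment variables. The following Python sketch builds such an environment for a child process; the function name is hypothetical and the host list is whatever your cluster uses.

```python
import os


def env_without_proxy_for(hosts, base_env=None):
    """Return a copy of the environment whose no_proxy list covers `hosts`."""
    env = dict(os.environ if base_env is None else base_env)
    bypass = [h for h in env.get("no_proxy", "").split(",") if h]
    for host in hosts:
        if host not in bypass:
            bypass.append(host)
    # Set both spellings, since tools differ in which one they read.
    env["no_proxy"] = ",".join(bypass)
    env["NO_PROXY"] = env["no_proxy"]
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when invoking curl against the local repository, to see whether the DNS or connection error disappears once the proxy is bypassed.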

Explorer

Can you install the Ambari agent on those machines? Or any other package from your repo?

Explorer

Yes. I can manually install the agents on all three systems. (As I said, I have disabled the proxy in the repo files. Without that it wasn't working.)

Explorer

If I do "curl <localrepository>", I get a DNS error. I am wondering whether this is the cause of the problem.

Explorer

I fixed the proxy settings for curl, wget, ambari-server and yum. I am still getting exactly the same problem.

Expert Contributor

@MB If you're getting a DNS error, that needs to be resolved either by configuring DNS for the hosts or by manually adding the host info to /etc/hosts on each node before you retry the cluster installation. Same goes for the repositories if you're using local repos.

Explorer

I fixed the DNS and proxy settings. I am still getting exactly the same error: the bootstrap times out.
